OCR OpenCV in FORMS and RECEIPTS | Text Detection 2020 p.1/2

Video Statistics and Information

Captions
Hey everyone, welcome to my channel. In this video we are going to run OCR with OpenCV, so let me show you what we will be doing and what the final result will be. Here I have the blog about this project, which I haven't completed yet, but soon I will, and I will post this as well. The idea is that we have this "Awesomeness Form" that you have to register with to be awesome: you have to give your name and your phone number, and you have to acknowledge that I am awesome and you cannot deny it. The idea behind this was that you should be able to work with both text inputs and checkbox inputs. Here we have text inputs for the name, phone number, awesome ID and so on, and then we also have checkboxes, where we will be able to detect whether they have been checked or not. We will be using feature detection to align the images: here you can see that we are detecting all the features and matching them. Then here are some of the filled forms that are our input; we are going to extract each form and align it so that it matches the original shape. By that I mean like this: all of these are aligned and now in a bird's-eye view. I did this in one of my previous videos, where you had to have the whole region of the form inside the image, but in this case we don't. You can see on the right-hand side and the left-hand side that this is a more advanced technique that does not require all of the form to be in the picture: wherever part of it is missing, it puts this black area in and it still works, so it's much more reliable and robust. Once we have that, we will extract all the information: you can see we have the name, the number, the contacts, the ID, and of course the country and the city. And yes, I am from Mars, that is a true story. Then we will
display the results overlaid on the original image. You can see here we have Sarah, we have the phone number, we have a 1 if a box is checked and a 0 if it's not, and then we have all of the other information as well. At the end of the day we will extract all of this information as text and save it to a file, a CSV file, which we can then open with, say, Excel. The conclusion is empty because, as I said, I am still writing the blog; I will fill that in so that once you are done with the video and want to get to a specific part, rather than going through the whole video you can come to the blog and check out that specific topic. Okay, that was the introduction and what we are going to do, so let's dive straight in and get started with this project; it's very exciting and very useful. So without further ado, let's get started. Here we are in the PyCharm environment. All of the packages we need are pretty much available here and can be installed in the settings, but one of them needs an external installation, and that is pytesseract. This is the Tesseract documentation; if you are not familiar with Tesseract, it is one of the most popular libraries for text recognition, which is OCR, optical character recognition. The reason it's quite popular is that it's free and very reliable when it comes to printed fonts. Here we are going to use the Windows version, which we can install from here; I will put this link in the description and on my website where you can download it. After the installation is complete, go to Program Files on the C: drive and then Tesseract-OCR. What we need here is tesseract.exe; this is the file that we will link to, so just remember this path. We are going to copy it and paste it into our script; we can comment it out for now and use it later. So first we are going to
go to File, then Settings, and over here we are going to add all the libraries. What we need is opencv-python (I have already installed all of these, so it's showing blue; otherwise you would click here and click Install), then numpy, and then pytesseract. Install all of these, hit OK, and come back to the editor. We are going to import cv2, then numpy as np, then pytesseract, and then os, which we will need for the directories. Then we are going to link our file, the one we were mentioning: we write pytesseract.pytesseract.tesseract_cmd equals the path we copied, except that instead of a single backslash we write a double backslash, and at the end we write tesseract.exe (I hope the spelling is right). Then we are going to import our image. Before we do that, let me show you the files we have. This is our OCR forms project: we have the user forms and we have the original forms. We are not interested in the original forms, those are the ones that are already filled, and we don't want the actual source image either. What we need are the test images, the photos I have taken with my mobile phone; these are the images we are going to detect our text from. Then we have the query image: the blank form that we will use as our template to find in the test images. We need to import this first, so we write cv2.imread on our query image, which is query.png, and save it in imgQ, our query image. To display it we write cv2.imshow with the window name "Output" and imgQ, then cv2.waitKey. Let's write 0 for now, because later, when we
loop, we can change it to a delay of 1 or whatever we need. Okay, let's run this and see if we run into any errors; we don't, and here is our original image. We are going to display images quite often, so we need to scale this down just for viewing purposes. We can write imgQ = cv2.resize with imgQ and then, inside a bracket, w divided by 3 and h divided by 3. But what are these w and h? They are the width and height of our image, which we can get from imgQ.shape as height, width and channels. Let's run that, and there you go: now we have a much smaller image to look at. Again, at the end of the day we can remove this resize and it will give better results. Next we are going to create our detector, which will be ORB. We are using ORB because it's free to use; the other ones, like SIFT and SURF, are not free, you have to get a license and so on. So we simply write orb = cv2.ORB_create. I have done another tutorial in which I explain in more detail what ORB is and what these features are, so have a look at that if you want to go into more detail. Inside this we only have to define the number of features; by default it's 500, but let's put a thousand, and later we can see how that affects our features. Then we write kp1, which is our key points, and then the descriptors. Key points are the unique points or unique elements of your image, and descriptors are representations of those key points that make it easier for the computer to understand and differentiate between them. So we write orb.detectAndCompute (this is supposed to be a dot), pass in imgQ, and set the mask to None.
Let's view this so that it's easier to understand what is going on. We can write impKp1 = cv2.drawKeypoints, send in our image imgQ, then kp1 as our key points, and since we don't have an output image we pass None, because we are storing the result in impKp1. Then we can simply display this: copy the imshow line, paste it, and pass impKp1. Let's run this; oh, I didn't change the window name, so let's call it "KeyPointsQuery", and there we have it. Now you can see these are the key points we are getting from our query image. Let me show you what happens if I put the feature count to 5000: there you go. Earlier you saw that it was not detecting any key points at the bottom, but now it's detecting key points everywhere. The thing is, we don't need this many key points to find the transformation matrix for the image, so we can simply write 1000, and if that's not enough we can change it later. We can remove the display now, because we don't want to show that any further. Next we are going to bring in all the test images, so that we can find their features and compare them with the features of the query image. We write path equals the name of our folder, which is UserForms, and then we get the list of our images: myPicList = os.listdir(path). Let's print this out and see what happens. Here you can see that we have the list of the names of all our images: test3.jpg, test2, test1. We can use this information to import all of these images. To do that we can loop: for j, y in enumerate(myPicList). We are writing
enumerate because we want the index value as well; we want to know how many times the loop has run. The actual value from myPicList will be in y, each item one by one. So we write img = cv2.imread of our path, plus a slash, plus y; this is the complete path to our image so that we can find and import it. Then we could run the detector on this image, but let's view it first so that we are sure it's working. We display it with imshow, passing y as the window name, because that will be a unique name every time, so we will see each of these. It works. Again, these images are huge, so we can scale them down: copy the resize line, and instead of imgQ write img. Let's rerun, and there we have it; these are our images, and all of them are importing properly. They are a little bit squeezed, I think, which is why they look a little weird, but it should be fine. Now we can comment this out again and find the descriptors. We write kp2 and des2 equals the same call as before, detectAndCompute, this time on our test image. We don't need to see its key points, but what would be interesting to see is the matching, so we can view that later on. To find the matching we have to use a matcher. We write bf = cv2.BFMatcher, our brute-force matcher, with cv2.NORM_HAMMING; you can read the details in its documentation. For now we are just going to use match: matches = bf.match, passing in
des2 and then des1. This will match them, and matches will hold all the matches. We could print it out and see how the matching works at the back end, but let's sort it first. We are going to take only the good matches, display them, and then use them later on. We write matches.sort with key=lambda x: x.distance. A lambda is a single-line function; all it does here is let us sort all of the match values based on their distance. The lower the distance, the better the match, so we will have all the good matches at the very beginning and all the bad matches at the very end. Now we can extract however many good matches we need: good = matches sliced from the very start up to, say, 20 matches, or 50, something like that. But what I want to do is use a percentage: we take the length of matches, multiply it by a percentage divided by 100, and make it an integer so we don't have decimal places. That gives us a number, and we can set the percentage to, say, 25, which should give us the best 25 percent of the matches. Once we have these matches, we can draw them: imgMatch = cv2.drawMatches, and we give our image img, then kp2,
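The sort-and-slice step can be shown in isolation. A tiny namedtuple stands in for cv2.DMatch here (an assumption, so the snippet runs without OpenCV); only the .distance attribute matters for the sort.

```python
from collections import namedtuple

# Stand-in for cv2.DMatch objects -- only .distance is needed for sorting.
DMatch = namedtuple("DMatch", ["distance"])
matches = [DMatch(d) for d in [90.0, 10.0, 55.0, 20.0, 70.0, 5.0, 40.0, 60.0]]

# Lower distance = better match, so sorting ascending puts the good ones first.
matches = sorted(matches, key=lambda x: x.distance)

per = 25  # keep the best 25 percent, as in the video
good = matches[: int(len(matches) * per / 100)]
print([m.distance for m in good])  # -> [5.0, 10.0]
```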
then our query image imgQ, then kp1, then all the good matches, then the output image (we don't want one, so None), and then the flag, which we'll set to 2. Then we display this: copy the imshow line, use y again for a unique name, and pass imgMatch. Let's run this and see what happens. Now we can see some interesting results: there you go, you can see all the matches we are getting. Actually, it's drawing all of the matches; I think we have a lot, which is why it looks so busy. Let's say we only want the 20 best matches: now you can see these are the best matches, and it's not that bad. Here it's able to detect well; here it's really bad. Again, if we put, say, a hundred (this is, by the way, only for display purposes), we can see that we have a good amount of matches at the top. This is what we saw at the very beginning: it was detecting all the key points at the very top, around the "text extraction and OCR tutorial" heading, so when we remove that heading it will actually give us better results. What we can do is copy the resize line and, just before it's about to display, resize only imgMatch; that will give a better view, and there we have it. It's a little bit harder to see now because we have scaled it down, but you can still see that most of the matches are in the upper area of the form, and that is because we saw all of the key points being detected there. The next step is to find all the source points and the destination points, so that we can find the relationship
between these two, the relationship between the query image and the test image. Once we have that relationship, which is basically our matrix, we can align the image based on it. We write srcPoints = np.float32, and inside that a list comprehension: kp2 at m.queryIdx, taking the point, for m in good. So we are going through all the matches in our good list and getting the actual coordinates of their key points. Then we reshape with reshape(-1, 1, 2); this is something you have to do to make it work, there's not much intuition there. We copy this for the destination, dstPoints, and here we write kp1 and trainIdx, and that should be it. Now that we have both of these, we can find the matrix M (the second output we don't need) with cv2.findHomography; finding this relationship is called homography. We pass in our source points, then our destination points, then cv2.RANSAC, and then 5.0. These parameters I have taken from the documentation, so we are going to keep them at their defaults. Now we can use this matrix to align our form. To do that we write imgScan = cv2.warpPerspective, then img, then the matrix M, and then we have to give the width and the height, which we already have as w and h, and that should be enough. Let's comment out the previous display, copy it, paste it here, and pass imgScan. Let's see if that works; oh, I forgot to rescale it, so let's move the resize down here and we
will do it for imgScan, and there we have it. Now you can see that all of our images have been aligned; let me show you. This was our image before, and now it looks like this, with the bird's-eye view. The next step is to extract the regions of interest, send them to pytesseract, and find out whether each checkbox is ticked or not. All of this we will do in the next video, where we will see what techniques we can use to extract our information and save it to a file. So stay tuned for that, and if you liked this video, give it a thumbs up, and I will see you in the next one.
Info
Channel: Murtaza's Workshop - Robotics and AI
Views: 32,743
Keywords: opencv, text detection, image processing, ocr, optical character recognition, text recognition, ocr api, opencv text recognition, opencv ocr, Text Detection with OpenCV, OCR using Tesseract, Tesseract (2020), Text Detection with OpenCV in Python, image to text, how to convert image to text, recognize text, detect text opencv, extract character, character recognition, opencv tesseract, TESSERACT PYTHON, ocr opencv, ocr python, ocr opencv python, ocr 2020, ocr forms
Id: W9oRTI6mLnU
Length: 27min 23sec (1643 seconds)
Published: Fri Sep 11 2020