OCR OpenCV in FORMS and RECEIPTS | Text Detection p.2/2

Video Statistics and Information

Captions
Hey everyone, welcome to my channel. This is part two of the OCR with OpenCV tutorial. The idea is that we are going to take a form and detect the individual text information in it. We have this awesomeness form, which you basically have to fill in to register for being awesome. So far, in part one, we extracted this form and got a bird's-eye view of it using feature detection. In this part we are going to extract the regions of interest, send them to pytesseract to give us the final answers, and then overlay the results on the original image. So without further ado, let's get started.

Here we are in the PyCharm environment, and you can see that we have the code from our previous video. We are importing the packages and the image, and then we created the detector and found the key points and the descriptors. Then we imported all of the images that we had in the user forms and, one by one, found their key points and descriptors, and we used the brute-force matcher to match these descriptors. We took the best 25 of the matches and drew them out, and at the end of the day we used these matches to find the source points and the destination points. Then we used the matrix to find the warp perspective, or the bird's-eye view, of the form.

Now that we have all of this information and the form is pretty much straightened out, we need to extract the regions of interest. So what we will do is write a list here. We will have a list called roi, and this will contain two elements. The first one will be the x, y and the width and height, so this will be the bounding box information. For example, our text box is present at pixel 50 and then pixel 60, its width is, let's say, 300, and its height is, let's say, 15.
Now, 15 is too low; let's say 50. So this is the idea. Once we have that (we need commas in between), we will have another attribute, which is basically whether it's a text input or a checkbox. We can write 'text' if it's text, and we can write, for example, 'box' if it's a checkbox. And we can add another one, which will be the name, so we can write whatever the name of the element is; for example, this is the phone number, or let's say this is the checkbox for sign. So this is the basic structure of our region of interest: we need the bounding box, we need the type, and then we need the name.

So how can we find all of this information? If I go to my image, let me open it in, let's say, Paint, so right-click and edit. If you look at the space over here, it shows me the pixel number. If I move here, let's say this is my point, it's showing me that it is at 372 by 1432 pixels, so this is the x, y position. What we have to do is find the first position, which is this, and then the second position, which is this, the diagonal position. We need to know this for each field: for example, for our phone number we need to know this point and then this point, and for our checkbox we need it from here till here.

All of this is very painful to do if you are doing it manually. And what if you change something in the form, or what if you have a new form? That will be very annoying. So what I've done is I've created a script that will generate all of this information for us. Let me close that. This script is nothing fancy; it just uses mouse points to save these values, and what you have to do as a user is input the name and the type. Let me show you a demo, so let's right-click and run this. Here we have this, and what I have to do is click on this; this is the start point.
It doesn't have to be perfect. And then this is the end point. Here it will ask me what the type is; I will say it is text. And what is the name of it? I will say it's Name. Then it will keep running, and I can write here, for example, that this one is text again, and then we have Phone. Then this one is the checkbox, so we will write box and then, let's say, Sign. And then we have Allergic. Now, you can make a mistake here as well; it's fine, it's not a big deal. For example, this is a box but we write it as text; it's fine. And let's say this was Allergic. I will show you how to fix the error later on. Then we have the email, so we will write text and then Email. Again, you can make spelling mistakes as well; it's fine, we can change them later on. Then we have the awesome ID; we will write text and then ID. Then we will go back and mark the city. Again, I have put it really high up here and then over here; it doesn't matter that much, it should just be in the ballpark. So it is the city: we will write text and then City. And the last one is our country, so we will write text and Country.

Now this part is very, very important. I know I will be receiving a lot of questions about this, so I'm explaining it very slowly. Once you are done, you have to click on the top part of the window, where it says Original Image, and then press S on the keyboard. Once you do that, it will generate all the information for you, and you can double-click it and copy it back to the main script. So here, instead of roi, you can just paste this, and if you want you can give it some spacing so that we can see it properly. Let me bring this down; where were we? OK, we were here. And there you go.

So now this is our roi. This is the x, y position, and this is the width and the height at the end. Well, this is not actually the width and the height; this is the width plus the x and the height plus the y. So yeah, and as I mentioned
before, if you make a mistake, for example Allergic is marked as text but is actually a box, you can come here and simply change it to box. It doesn't matter if you made a spelling mistake either; for example, if you wrote Email like this, you can remove it and simply type it out.

Once we have all of this information, what we will do next is loop through our roi. You have to think of it like this: when we are running this, we have one of the images, and for this one image we have to loop through all of the regions of interest. So we will say for x, r in enumerate(roi). We need the index as well, so we are using x too, and r will be the region of interest. Basically, each time we loop we get one of these: the first one here, then the second one, then the third one. Where is the comma for this? There should be a comma. OK, so that is the idea.

Then we will go down here, and before we actually use these, let's just draw all of this on our original image. What we will do is create an image that is the same size as our original image. So we will write imgShow; this will be the final image. We are going to write imgScan.copy(), so we are going to copy this image, and then we are going to create a mask, imgMask. Now this will be a little bit confusing, but once we do it you will understand what is happening. We will write zeros; basically we are creating a mask of the same size. So we will pass in imgScan, or no, imgShow. It's going to be the same thing, but we will write imgShow. This will create the mask, and what we will do is draw rectangles on this mask and then overlay it on our original image. We will write cv2.rectangle, and in that we are going to say that we want to draw it on our
image mask, and then we are going to give it our information. The first element will be the x, so it will be r[0][0]. Basically we are accessing the first element, which is this one, and inside that we are getting... OK, I think I explained this a little bit wrong. We have one bracket, which is one list, and within that list we have four elements rather than three: r1, r2, r3 and r4. Because we are starting from 0, the first one will be r[0], so we are getting r[0] at index 0, that is r[0][0]. This will be our x, and then our y will be r[0][1]. This is our initial point. Then we will give the diagonal point, which will be r[1], so let me just copy this. I think I'm making a mistake in the brackets; no, this is fine. So we will write 1 here and 1 here: r[1][0] and r[1][1]. These are our x and y, and this is our diagonal point. Then we are going to write the color; let's say we will put green, so 0, 255 and 0. And then we can write the font, or no, it's the thickness; we are not actually writing text. Sometimes I forget stuff. So we will write cv2.FILLED; this is the thickness, and that should be enough.

Then we will write imgShow = cv2.addWeighted. We have the first image, which is imgShow, and we will give it 0.99, and then we have the image mask, which we will give 0.1. You can change these values if you want. And we will give the gamma as zero. Then we can simply display it, so let's show imgShow after the for loop and see what it looks like. Oh, this is the other script; we will run the main one. Ah, it did not work. Why didn't it work? Let me see: cv2, and it is y, and imgShow...

OK, so I am back and I have found the issue, which is quite funny. It is the resizing: because it is resizing, it's not putting it in
the right position. So what we have to do is cut this from over here. Again, this resize is just for viewing purposes, and I forgot about that; that is why it happened. So we will paste it here and change it to imgShow and imgShow, and that should be fine. If I run this now, there you go: you can see that these are the regions where we are getting our information from.

Next, we have to crop these regions and then send them to pytesseract. The reason we highlighted them was to see that we are detecting the correct areas. Now that this is done, I will comment this out, just in case. Then we are going to move on to the cropping. For cropping we are going to write imgCrop = imgScan, and we are just going to use slicing on our array. We want to start at r[0][1], because first we have the height, and we want to go till r[1][1]. Then we have the width: for the width we start at r[0][0], and then we go till r[1][0].
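As a rough illustration of the ROI list and the slicing-based crop described above, here is a minimal sketch. The coordinates, the three field entries, and the blank stand-in image are made up for illustration; in the video the image is the straightened form and the ROI values come from the region-selector script. The name `img_scan` is just a placeholder for that image.

```python
import numpy as np

# Each ROI entry: [(x1, y1), (x2, y2), type, name].
# (x1, y1) is the first clicked corner and (x2, y2) the diagonal corner,
# i.e. x + width and y + height. These coordinates are hypothetical.
roi = [
    [(50, 60), (350, 110), "text", "Name"],
    [(50, 130), (350, 180), "text", "Phone"],
    [(400, 60), (430, 90), "box", "Sign"],
]

# Blank stand-in for the straightened form: height 200, width 500, 3 channels.
img_scan = np.zeros((200, 500, 3), dtype=np.uint8)

crops = []
for x, r in enumerate(roi):
    (x1, y1), (x2, y2) = r[0], r[1]
    # Image arrays are indexed [row, column], i.e. [y, x],
    # so the height range comes first and the width range second.
    img_crop = img_scan[y1:y2, x1:x2]
    crops.append(img_crop)
    print(x, r[3], img_crop.shape)
```

The first entry matches the example from the talk (x 50, y 60, width 300, height 50), stored as two corner points rather than as width and height.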
So this should crop the image. What we can do is display this, and if we want we can show each of these crops. So we can write here, instead of y, x, and we can remove this 2 and write imgCrop; hopefully we don't need to resize it. And we got an error. Oh yeah, this error is because of the string: we have to make sure it is a string when we are displaying it. So let's run that, and there we go. Each of these fields is now being cropped properly, and we can see that we are finding the relevant information, even the checkboxes and all of the text. So that is good.

Next we will check whether our region of interest is a checkbox or text. So we will write if r[2]... r[2] is basically this value: this is r[0], r[1], r[2]. So if r[2] equals 'text', then we are going to send it to pytesseract. We will write myData.append... where should we put it? Not here; I think we can put it above here. Yeah, let's put it here: myData equals an empty list. This will be the data for each of our images, so for each form all of the data will be stored here. Then we are going to append to it: myData.append, and we are going to append pytesseract.image_to_string, and then we will pass imgCrop. That should append the data. If you want to print it, we can print this out as well. So let's write print, and we want to print a string: here we can write r[3], since r[3] is the name, and then this value. So that should be good. Let's print it out and see what happens. There you go, we are getting some information: we are getting the name properly, then the phone number, the contacts... but we don't know which
form we are working with, because here one form finishes and the next one starts. So I think it will be better if we print it out here: above our for loop we can write print, 'extracting data from form', and then the form number. The form number is basically the value of j, so we will write j here, and that should do it. Let's run this again. It says extracting data; let's let it run a little bit more, and let's stop this. So here it says extracting data from form 0, then extracting data from form 1. Still, it's not very clear, so we can write it, let's say, like this, so it's a little more clear. There you go, now it is quite visible.

Next, let's bring it down here. We can remove this because we don't need to print it out, and we'll keep this here. Then we are going to write that if r[2] equals 'box', we are going to perform some other actions. First of all, we are going to convert the crop to gray: imgGray = cv2.cvtColor, our source image is our crop image, imgCrop, and then cv2.COLOR_BGR2GRAY. There you go. Then we are going to use image thresholding: imgThresh = cv2.threshold. We have to give it our source image, which is imgGray, then our threshold, let's say 170, and then the maximum value, let's say 255. So this is understood: it's 255.
170 is the threshold above which a pixel will be detected as one and below which it will be detected as zero. Then we will write cv2.THRESH_BINARY_INV, the inverse binary threshold, and we are going to take the second element of the result.

Now we need to find the total number of pixels that we have. Basically we are inverting it, so after the inversion the dark regions will give us one and the bright regions will give us zero. We need to find how many pixels were dark and were converted to bright, or one. So totalPixels = cv2.countNonZero, and we will give it our imgThresh. And there we have it. What we can do is print out this value, so print totalPixels, and let's see what we get. Let it run, and there you go.

In the first form, you can see that the first box is filled and the second one is empty. So for the first one we have two thousand something, and the second one is zero pixels. For the second image it's the same. For the third one, both of them are filled: if we go to the last one, you can see here, for John Williams, both of them are filled, so we have something above 2000. So we can say that anything above 500, maybe, can be considered as checked. We will go here and say that if the total number of pixels is greater than our pixel threshold, then totalPixels equals 1, else totalPixels equals 0. Now we can define the pixel threshold, so let's go up here and write, let's say, 500. If we run this again... actually, we didn't print it out. We need to print and we need to append, so we have to do both things. We will copy this and paste it here, and
then for r[3] we are going to print out the total number of pixels, and we are going to append to our data as well, so we will append totalPixels. Let's run it and see what happens. And there you go: now we have Sign as 1 and Allergic as 0, then Sign 1 and Allergic 0, then Sign 1 and Allergic 1. So these are our checkboxes. But why do we have this? What is this? Let's see.

OK, so I am back and I found the problem. The reason it was displaying all of that is not anything we wrote wrong in the code; it was just the version. I wrote this code maybe three or four months ago, and at that time I was using a different version. If I go to File and then Settings, for pytesseract I was using 0.3.4 and right now I was using 0.3.6. It seems that there is some change in that, and it's not detecting properly; I don't know why. Here it was detecting, I think, an extra line or something like that. So I reverted it back to 0.3.4: you just double-click and specify the version, so here I chose 0.3.4, and now it works perfectly. If I run this now, you will see that we have the name without any spacing or new line, and all of the information. This is the data that we are saving at the end of our loop. Then it goes to the next one, detects all the text, saves it to myData, and then goes to the next one.

But the thing is that our data has not actually been saved; it has just been appended to myData. So what we can do at the end is just save it in a file. We can go here, right-click, and create a new file, and we can call it dataOutput.csv. Inside that we are going to write our headings, so let me write them from here: we had name, then phone, then sign, then allergic, then email, then ID, and city, and then country. These are the parameters that we have to save, and then we can hit save.
This is a CSV file, comma-separated values, so that's why we are putting commas here. That should be fine; let's press Enter and save. Let's go back to our code, and once this loop is done we are going to write with open, and we will write the file name, dataOutput.csv. Then... what is happening? Why would it do this? That's just horrible; why would you block the view? Anyway, the mode is 'a+', which means we want to append, and then we are going to write as f. We will write for data in myData, and we are going to f.write the string value of our data, and we want to add a comma to each one of these, so those will be our comma-separated values. Then we will write f.write again, because we need to go to the next line, so we will write a newline here. Every time it runs it should save to a new line, and that should do it.

Let's run it and see what happens. OK, it's done; we can close this and go to the data output file, and there we have it. Now we have the name, the phone number, sign, allergic, and then we have the ID, then the city, and then the country. So we have all of the information, and it is in the correct format. If I open this up in, let's say, Excel, you can see that we are getting the correct format, and now we have all of this information in our Excel sheet. So that is pretty good. Here we can expand the ID column; yeah, that looks pretty good. As we keep adding more, it will append. If I run it again, it will append again. So if I go back and run it again... permission denied, because we have the file open in Excel. Let me close that and run it again. It's running, and you can see we already have two more, and now it's done with the third one. If we close and open it again, we have the first three and then these three. So it will keep appending as you keep adding more.
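The append-to-CSV step described above can be sketched as follows. This is one possible version, not the exact code from the video: it uses Python's `csv` module instead of writing the commas by hand, opens the file in mode "a" rather than "a+", and writes the header automatically when the file is new. The helper name `append_row`, the temporary file location, and the sample values are all assumptions for illustration.

```python
import csv
import os
import tempfile

# Column order taken from the headings listed in the talk.
HEADER = ["Name", "Phone", "Sign", "Allergic", "Email", "ID", "City", "Country"]

def append_row(path, my_data):
    """Append one form's values as a single CSV row, writing the header once."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # "a" opens the file for appending; newline="" lets csv handle line endings.
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(HEADER)
        writer.writerow(my_data)

# Hypothetical results for two forms, in the same order as HEADER.
path = os.path.join(tempfile.mkdtemp(), "dataOutput.csv")
append_row(path, ["John Williams", "555-0100", 1, 0, "john@example.com", "A123", "Springfield", "Exampleland"])
append_row(path, ["Jane Doe", "555-0101", 1, 1, "jane@example.com", "B456", "Shelbyville", "Exampleland"])

with open(path, newline="") as f:
    rows = list(csv.reader(f))
print(len(rows))
```

One reason to prefer `csv.writer` here is that it quotes values that themselves contain commas, which plain string concatenation would not handle.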
This is good, but what we want to see as the final result is the actual image with the detected text written on it, so that is what we are going to do now. For that, all we need to do is, after every iteration over our regions, whenever we find a region, display whatever we found on top of it. So we are going to write cv2.putText. We give it our image, which is imgShow; this is the image for the result. Then we write a string, and the string is myData[x], whatever data I have at this index. Then we have the position from the region again: r[0][0] and then r[0][1]. Then we pick a font, cv2.FONT... which one should we pick? Let's pick plain. Where is plain? Yeah, this one, FONT_HERSHEY_PLAIN. Then we will write, let's say, 2.5, then we will put the color as red, so 0, 0, 255, and then we will put 4. That should be good. Let's run it again and see. Again it's going to give us the original image, and you can see it's giving the name, right, John Williams, and it's writing it there. But again it's too huge to view, so we are finally going to resize it before we show it. Let's see how it works.

OK, so there we have it. These are the three images, our final results. You can see all of the names are detected properly, and then we have the checkboxes as well, and then we have the ID and the email address; even the @ is detected properly. Then we have the city and the country. So all of these are being detected properly, and you can see that even though there are some gaps here and there, we are still able to detect everything without any issues.

You can apply this method to any form or receipt. All you have to do is go to your region selector and make sure you change the image name. This is the query image, so maybe I will put a
path variable here, let's say like this, and we can write here query.png, and we can replace this with path over here. So all you have to do is replace this name, and then you can select all of the points, bring them to the main file, and paste them here. Once you do that, it will automatically do all the calculations for you and show you the output result.

In my opinion the results are very good, but again, this is for printed fonts. For handwriting you have to add a different technique, or you have to add your own data to actually detect the text properly. You can use this code later on as well: if you train a better neural network model, you can add that model to this code and run it to read the forms. All you have to do is change the part where we are using pytesseract; you could use your own model to run the detection on the image crop.

So this is it for today's video. I hope you have learned something new. If you have any suggestions or comments, leave them in the comments below, and if you like the video, give it a thumbs up. If you haven't subscribed, do subscribe, and I will see you in the next one.
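To make the closing point about swapping out the recognizer concrete, here is a minimal sketch of the per-field logic with the text recognizer passed in as a function. It is an assumption-laden illustration, not the code from the video: the helper names `box_is_checked` and `read_field` are invented, the crops are assumed to be grayscale already, and the checkbox test uses NumPy in place of the cv2.threshold and cv2.countNonZero calls named in the talk so that it runs without OpenCV installed. In the video the recognizer role is played by pytesseract.image_to_string; here a placeholder function stands in for it.

```python
import numpy as np

PIXEL_THRESHOLD = 500  # more dark pixels than this counts as a checked box

def box_is_checked(img_gray, thresh=170, pixel_threshold=PIXEL_THRESHOLD):
    """Return 1 if the grayscale crop contains enough dark pixels, else 0.

    Counting pixels at or below `thresh` corresponds to an inverse binary
    threshold followed by a non-zero count, done here with NumPy only.
    """
    total_pixels = int(np.count_nonzero(img_gray <= thresh))
    return 1 if total_pixels > pixel_threshold else 0

def read_field(img_crop_gray, field_type, recognize_text):
    """Dispatch on the ROI type: recognizer for 'text', pixel count for 'box'."""
    if field_type == "text":
        return recognize_text(img_crop_gray)
    if field_type == "box":
        return box_is_checked(img_crop_gray)
    raise ValueError(f"unknown field type: {field_type!r}")

def placeholder_recognizer(img):
    """Stand-in for an OCR call or a custom model; returns a fixed string."""
    return "placeholder"

# Synthetic 60x60 crops: an empty white box and one with a dark 40x40 mark.
empty_box = np.full((60, 60), 255, dtype=np.uint8)
ticked_box = empty_box.copy()
ticked_box[10:50, 10:50] = 0  # 1600 dark pixels

print(read_field(empty_box, "box", placeholder_recognizer))
print(read_field(ticked_box, "box", placeholder_recognizer))
print(read_field(empty_box, "text", placeholder_recognizer))
```

Because the recognizer is just an argument, replacing the OCR engine with a trained model would only mean passing a different function, which is the kind of change the talk suggests for handwritten forms.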
Info
Channel: Murtaza's Workshop - Robotics and AI
Views: 30,578
Keywords: opencv, text detection, image processing, ocr, optical character recognition, text recognition, ocr api, opencv text recognition, opencv ocr, Text Detection with OpenCV, OCR using Tesseract, Tesseract (2020), Text Detection with OpenCV in Python, image to text, how to convert image to text, recognize text, detect text opencv, extract character, character recognition, opencv tesseract, TESSERACT PYTHON, ocr opencv, ocr python, ocr opencv python, ocr 2020, ocr forms
Id: cUOcY9ZpKxw
Length: 37min 43sec (2263 seconds)
Published: Sun Sep 13 2020