Text Detection with OpenCV in Python | OCR using Tesseract (2020)

Captions
Hey everyone, welcome to my channel. In this video we are going to learn how to detect text in images. We will learn how to detect individual characters and words and how to place bounding boxes around them. We'll also have a look at how to detect only digits. I upload videos on a weekly basis, so don't forget to subscribe and give this video a thumbs up if you found it useful. So let's get started. [Music]

So first we are going to head over to the Tesseract documentation, and here we can find the different installation methods. We have Linux, we have Debian, we have Raspbian, so there are a lot of different operating systems here. For macOS you can use Homebrew to install. Now for Windows we are going to go to the downloads page, and there we are going to go for the binaries, so over here we will click on the Windows installer. Once the download completes we will just install it, and the only thing you should know is where you installed it. Most probably it is going to be in C, Program Files, and the folder name would be Tesseract-OCR. So here I am in C, Program Files, and we can locate Tesseract over here, and you can see that we have our executable somewhere around here. Where is it? There you go, so we have tesseract.exe. This is the file that we will refer to in our Python code, so we have to copy this path.

So now in PyCharm we are going to first go into File, Settings, and then to the interpreter, and we will add pytesseract, so write pytesseract and install it. In the same way we are going to install OpenCV as well, so we will write opencv, and there you go, we will install that as well. Now while it's installing, let me show you the image that we will be using. Here we have our main folder, and we have 1.png within our main directory. If I just open it up, it's just an image with my channel's name, which says Workshop Robotics and AI, and then we have a few numbers, 1 2 3 4 5 up to 15, so we can see how it
detects. Now I think the installations are done, so we can move on to importing. First we will import the OpenCV library, so import cv2, and then we are going to import pytesseract, import pytesseract. Now we are going to refer to our executable file, so we will say pytesseract.pytesseract.tesseract_cmd equals, and here we will paste the path we copied. Let me put this here first, and then over here we are going to write tesseract.exe. Let me close this so you can see more clearly. So there we have the path to the Tesseract executable file, and then we are going to change all of these to double backslashes, and there you go. At this point we just want to run it to see that everything is fine, and there we go, no errors so far.

So now we are going to read our image. We will write img equals cv2.imread, and here we have to give the path, and our path is 1.png. The thing with pytesseract is that it only accepts RGB values and OpenCV is in BGR, so we have to convert the image before we can actually send it to the pytesseract library. So here we will write img equals cv2.cvtColor, and we pass img and cv2.COLOR_BGR2RGB. Now that our image is converted, we can display it, so we can write cv2.imshow, and we will write the window name, for example Result, and then img. Now I know that we just converted it into RGB, and if you convert it back to BGR for display, that's fine; just for demonstration purposes this should work as well. Then we will write cv2.waitKey, and we are going to delay it for infinity, and there you go, we have our image and it's displayed.

So how do we get the information or the text out of this? It's very simple: all we have to do is use one of the functions in the pytesseract library. So how can we know which functions we are talking about? We can just go to pytesseract, and over here we can
see that we have a lot of different functions. If you're not familiar with how to get there, you can press Ctrl plus the left mouse button and it will take you to the main script. Here you can see that we have image_to_boxes, image_to_data, image_to_osd. We don't know what each one of these does, so what we can do is click on it again, Ctrl plus left click, and it will take us to where the function is written. Here we have the docstring that actually tells us what this function does.

We have a few of them, and the first one we will be using is image_to_string. We will just send it an image, and all we need is the text. So let's go back, and right around here we will write print, and inside it pytesseract.image_to_string, and then we input our image. If we run that, there you go, we have our result, and you can see that it has detected each and every character properly. Even the space here is detected, and that's why you can see there is a gap over here. Then we have the numbers, the digits, all of them in the correct order, and even though they are closely placed to each other, the program is able to detect each one of them correctly. Okay, so this is how you can simply get the raw information.

Now how can we actually know the location of each character, so that we can create a box around it and see directly what was actually detected? In order to do that, if we go back to the functions, we have image_to_boxes. This will give us the output in terms of the bounding box and text. So if we go back, let me just copy this and paste it here, and over here we will change it to image_to_boxes, and let's print that out. There we go. As you can see, now we are getting each character, and corresponding to that we have the bounding box information. So we have the x point and the y point and then
we have what should be the width and the height of our bounding box. So for each of these characters we have this bounding box information, and you can see it goes all the way down. So how can we place these boxes on our image? Let's do that. Let's label this part as detecting characters. First we are going to take the size of our image, so we will write the image height, the image width and then, third, the channels, which we are not so concerned about, so we will leave that as it is, equals img.shape.

Next we are going to add a loop. Or should we write this first? Let's write this first. We will remove that, bring it down here, and store all of the information, so we will say boxes equals all of the information being received. Then we will say for b in boxes.splitlines(), and we are going to print b. Let's run that, and there you go, we are getting the same information. But the thing is, we need each one of these as a list so we can actually pinpoint which value we are referring to; it's easier to use the information that way. So what we can do is say b equals b.split, and then we define the split using the space. We can see that we have a space in between each one of these, so we split each value based on the space. We need to print it again so we can see what changed, so let's print that out, and there we have it. Now each one of these has been transformed into a list and we can access each element. The first one is the text, the second one is x, then y, then width and height.

So now we can get this information first. We will say x, y, w and h equals b[1], b[2], then b[3] and then b[4]. All of these are basically strings, so we have to change them to integers so we can actually use them. So we will write int, and then int, I should have copy-pasted, and then int. So now we can use these
values to create a rectangle and to put the text as well. Let's first create the rectangles. We will write cv2.rectangle, and we have to pass our image and then the points. First we are going to define x and then y, and then the width and the height, but for the width and the height we have to add the value of x and the value of y, so x plus width and y plus height. And then what do we have to define? Color, okay. Let's say red, so 0, 0, 255, and then the thickness as 1, and that should do it. We can comment these out, and let's run that. There you go, now we have the boxes, but something seems to be wrong. Something seems to be fundamentally wrong. So what is it?

Let's go back and look at our values. Let me remove the comment from one of these. Over here we can see that we have these four values, but it seems that they are not in the order that we expected them to be. The x is fine, but the height is actually the opposite way around. So what we need to do is subtract y from our image height, so image height minus y. And then we do not need to add the x and the y. Normally we have to, but in this case they are already giving us the direct information, so we don't have to add this, and we can just do image height minus h. Let's try that again, and there you go. The thickness is not right, let me just add a little bit, and there you go. Now we can see that it is detecting properly and we are able to get our bounding box for each of these characters.

Next we can label these characters around their boxes so we can see whether they were detected properly or not. We can write cv2.putText, and we pass the image, and then, as you know, the character is the first element in our list, so we will say b[0]. Then we have the x and the image height minus y, and then we are going to write the font, cv2.FONT,
let's grab any one of these, and then we are going to write 1 as the scale. For the color we can put red again, so 50, 50 and 255, just to vary it a little bit, and then we can write the thickness as 2. Let's run that, and there you go. But it's not very visible, is it? We can bring it down a little bit, so we can say here plus 25, let's say, and there you go. Now the characters are visible and we can see the bounding boxes around them, so that's great.

Next we are going to look at how we can detect words rather than characters. Sometimes you don't care about the individual characters, you care about the words. So how can we do that? Let me just copy all of this, comment this out, go down, and over here we will write detecting words. For that we have something known as image_to_data. Let's just run this and see what happens. We are getting an error. Now why is that? Let's remove all of this and print out the boxes, so print boxes. Let's run that, and there you go, we have an output, but this time around we have a lot of columns. How many did we have before? I think five, and this time we have a lot more. We have level, page number, block number, paragraph number, line number, word number, then left, top, width, height, confidence and text. The last one is our actual text, so we are detecting words, and we can see that even this is detected as a single word because it's quite congested.

Now we need to redefine our for loop so that it can handle the formatting of this data. Let's go back and comment this out. First of all, we can see that our first row is actually the heading, so we don't want to use the information from the first row. So the very first thing we will introduce is a counter. In order to add the counter we can use enumerate. You can add a variable at the top, write count equals zero and then increment the count, that is fine as well, but
this is a slightly more efficient way to do it. Now every time it loops it will put the value of the count in x, so we can say that if x is not equal to 0, then we are going to perform all of these actions. Let me remove this b, and let's run it and see what happens. Again we have an error, because our data is not in the correct format. That would be fine as long as we were getting b properly, and we are not: if we look at our b, it is not in the correct format. To fix this we can remove our space from the split and let it decide by itself. If we run that, there you go, now we have the data in the proper format.

How many columns do they have? 1 2 3 4 5 6 7 8 9 10 11, so the first four have 11, and the rest of them, the ones that actually have words, have 12. This is the information that we can use to print out only the ones that have a length of, not at least, exactly 12. So we can write here, if the length of b equals 12, only then are we going to perform these actions.

Okay, but if you remember, in the previous one we only had, I think, five values, and this time around we have 12 columns. So we need to define which of these columns refer to the bounding box values. Over here we can see that it is 0 1 2 3 4 5, and then the sixth one starts our bounding box, so it's 6, 7, 8 and 9. We have to use those, so we can write here 6, 7, 8 and 9. If we just run this, I hope it works, let's see. Okay, now we are not getting any errors, but the location is definitely not right. Now this is the funny part: here the information is given in a different format, and I have no idea why they did this. Here they are actually using the usual format, so we can write simply y, and over here we can write width
plus x, and over here we can write height plus y. Let's run that, and there you go, now we are getting the correct bounding boxes. The last thing we can do is put our text. Here b[0] is not the text, b[11] is the text, so we will write 11 here, and then again we have to change the starting point, which is x and y. Then we have the font, and everything else is pretty much the same. If you run that, there you go, now we are getting each of the bounding boxes and the word that it corresponds to.

Next we are going to look at how we can detect only digits. If we want to detect only numbers, what can we do? Let me copy this, comment this out as well, and paste it at the bottom. Now we can add configurations to our pytesseract function, and based on that it will filter the data for us. What are these configurations? Let's find out. We can write our configuration string equals r, and then we will write --oem 3, then --psm 6, and then outputbase digits. These are the settings that we will enter here, but why these? What is the significance of this OEM 3 and PSM 6? Let me just put this in before I explain. Is it the second argument? No, the second one is the language, so we need to state that we are defining the configuration, so config equals our configuration string. Let me just run this and see if it works, and there you go, now we are only detecting the digits and not the letters.

We can do the same thing with our characters. This code was for the characters, so if we comment this out, add the configuration string and the config argument here, and run that, you will see that it works.

So coming back to our configurations, we saw OEM and PSM. What are those? This image I have taken from the documentation, and the first one we are referring to is
OEM. Here we have the numbers, and OEM represents the engine mode. You can select each of these engine modes; this is mostly the back-end stuff that it runs. You can see here that the first runs Tesseract only, which is the fastest, and the next is Cube only, better for accuracy but slower. Then we have number 2, which combines both, and then we have number 3. What we have done is put 3, so it will use the default.

The next configuration is PSM. Again this is from the documentation. PSM is the page segmentation mode, which represents the possible modes for page layout analysis. Here we can see that we have orientation and script detection only, then automatic page segmentation, fully automatic, and you can read all of these. What we selected was number 6, assume a single uniform block of text, and this is a sensible general-purpose choice, which is why we use it. If you want any specific ones, you can read through them and use whichever one you want. We'll put these images in the description so that you can refer to them and change the settings according to your project.

So this is it for today's video. I hope you have learned something new. If you liked the video, give it a thumbs up and don't forget to subscribe, and I will see you in the next video.
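
The character-box step described in the captions can be sketched in plain Python. This is a minimal sketch, not the code typed in the video: `parse_char_boxes` is a hypothetical helper, and the sample text is made up to mimic the shape of `image_to_boxes` output. It assumes each line reads `char x1 y1 x2 y2 page` with the origin at the bottom-left of the image, which is why the values are corner coordinates and the y values have to be subtracted from the image height.

```python
def parse_char_boxes(boxes_text, img_height):
    """Turn box-style text lines into OpenCV-style rectangle corners.

    Assumes each line looks like 'char x1 y1 x2 y2 page' with the origin
    at the bottom-left of the image (hypothetical helper, for illustration).
    """
    results = []
    for line in boxes_text.splitlines():
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        char = parts[0]
        x1, y1, x2, y2 = (int(value) for value in parts[1:5])
        # OpenCV puts the origin at the top-left, so flip each y value.
        top_left = (x1, img_height - y2)
        bottom_right = (x2, img_height - y1)
        results.append((char, top_left, bottom_right))
    return results


# Made-up sample in the same shape as the output, not real OCR results.
sample = "W 10 60 40 90 0\no 45 60 60 80 0"
for char, top_left, bottom_right in parse_char_boxes(sample, img_height=100):
    print(char, top_left, bottom_right)
```

With the real libraries, the text would come from `pytesseract.image_to_boxes(img)` and each pair of corners could be passed to `cv2.rectangle`. The video passes `(x, hImg - y)` and `(w, hImg - h)` instead, which are the other two opposite corners of the same rectangle.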
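
The word-level loop from the captions (skip the header row, keep only rows with 12 fields, read the box from columns 6 to 9 and the text from column 11) might look like the sketch below. `parse_word_boxes` is a hypothetical helper and the sample rows are invented to mimic the tab-separated `image_to_data` output. Here the coordinates are assumed to be left, top, width and height with a top-left origin, so no flip is needed.

```python
def parse_word_boxes(data_text):
    """Extract (text, top_left, bottom_right) for rows that contain a word.

    Hypothetical helper: assumes tab-separated rows whose columns are
    level, page_num, block_num, par_num, line_num, word_num,
    left, top, width, height, conf, text.
    """
    words = []
    for index, line in enumerate(data_text.splitlines()):
        if index == 0:
            continue  # the first row is the column header
        fields = line.split()
        if len(fields) != 12:
            continue  # rows without recognised text have only 11 fields
        x, y, w, h = (int(value) for value in fields[6:10])
        words.append((fields[11], (x, y), (x + w, y + h)))
    return words


# Made-up sample rows in the same shape as the output, not real OCR results.
header = "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\tleft\ttop\twidth\theight\tconf\ttext"
sample = "\n".join([
    header,
    "1\t1\t0\t0\t0\t0\t0\t0\t640\t480\t-1\t",
    "5\t1\t1\t1\t1\t1\t30\t40\t120\t50\t96\tWorkshop",
])
print(parse_word_boxes(sample))  # [('Workshop', (30, 40), (150, 90))]
```

Splitting on any whitespace rather than a single space is what makes the empty text column drop out, which is the 11 versus 12 field difference the captions rely on.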
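
The configuration string discussed in the captions is plain text made of command-line style options. As a small illustration, the hypothetical helper below (not part of pytesseract) assembles the same string from an engine mode, a page segmentation mode, and an optional digits-only switch.

```python
def build_tesseract_config(oem=3, psm=6, digits_only=False):
    """Assemble a Tesseract option string (hypothetical helper).

    oem: OCR engine mode, psm: page segmentation mode.
    digits_only appends the 'outputbase digits' setting used in the video.
    """
    parts = [f"--oem {oem}", f"--psm {psm}"]
    if digits_only:
        parts.append("outputbase digits")
    return " ".join(parts)


print(build_tesseract_config(digits_only=True))  # --oem 3 --psm 6 outputbase digits
```

The resulting string would be passed through the `config` keyword argument, for example `pytesseract.image_to_data(img, config=...)` or `pytesseract.image_to_boxes(img, config=...)`.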
Info
Channel: Murtaza's Workshop - Robotics and AI
Views: 404,008
Keywords: python, opencv, machine learning, ai, text detection, neural networks, image processing, ocr, optical character recognition, text recognition, ocr api, python api, opencv text recognition, opencv ocr, Text Detection with OpenCV, OCR using Tesseract, Tesseract (2020), Text Detection with OpenCV in Python, image to text, how to convert image to text, recognize text, detect text opencv, extract character, character recognition, opencv tesseract, TESSERACT PYTHON
Id: 6DjFscX4I_c
Length: 28min 36sec (1716 seconds)
Published: Thu Apr 23 2020