Text recognition (OCR) with Tesseract and Python

Captions
Hi, welcome. Today we're going to see how to use Tesseract with Python and OpenCV to read text from an image. First, we'll see how to install Tesseract and pytesseract so that it works with Python. Second, we'll use Tesseract on a really simple image where the text is very clear, to see how quickly we can read text from an image. Third, we'll use Tesseract on an image where the text is not so clear, so that we can learn how to tune the OCR to work with different types of images. If you're ready, let's start.

Let's begin with the installation. Here is the official Tesseract documentation, with installation instructions for Linux (for several distributions: Ubuntu, Debian, Raspbian and others), for macOS, and for Windows. In this video I'll cover only Windows. If you want to install it on macOS or Ubuntu, you can either follow the official documentation, which is very clear, or check my blog, where I'll put a simplified version of both.

Let's start from this page, where we can find the download file for Windows. There is a 32-bit version and a 64-bit version. Most new computers support 64-bit, so let's go with the 64-bit one; if you run into problems, you can download the 32-bit version instead. While it's downloading, let me quickly explain how the installation works. There are two main steps. The first is to install Tesseract, which is the OCR engine. But installing the engine is not enough when we work with Python and OpenCV: once the engine is installed, we need to link it with Python. So first we install the engine. We click Yes, the setup loads, we choose English (or whatever language you want), click I Agree, and install for anyone using this computer. Then, for the language data, we
leave everything as it is, unless you want to add more languages: in that case, expand Additional language data and choose the languages you want. English is already included by default, so I'll just add Chinese as an extra. Then comes the Tesseract-OCR install location. We'll leave the default path; you can change it, but I suggest keeping the default, because it makes it easier to follow later, when we connect pytesseract to that path. Next, and then Finish: the first step is done.

Now I go to PyCharm, where I have Python and OpenCV. If you don't have Python and OpenCV yet, I have a few tutorials on how to install them, which I'll link on my blog page. Next we have to install pytesseract. In PyCharm I go to Settings, then Project, then Project Interpreter, which lists all the modules installed for my Python 3.7. I add a new library, search for pytesseract, and click Install Package. So now we have installed Tesseract, which is the engine, and pytesseract, the Python binding that lets us call Tesseract from Python code. The installation went smoothly and was really quick, so now we can move on to the detection.

First of all, let's import cv2, then import numpy as np. As a first step, let's just load the image and nothing else: img = cv2.imread(), with the file name of your image inside the parentheses. I have a .jpg image of a page from The Big Sleep. Then let's show it with cv2.imshow("Img", img), followed by cv2.waitKey(0) to keep the window open. That's all we need to show the image. Here it is, with the text of The Big Sleep. For now it's just an image; our goal is to read the text from it. So first of all, let's import pytesseract.
We need to tell pytesseract where Tesseract is installed on our computer. We do this by assigning the path of the Tesseract engine to pytesseract.pytesseract.tesseract_cmd. If you left everything at its default during installation, the path is the same for everyone: under Program Files there is the Tesseract-OCR folder, and inside it tesseract.exe. So I copy the folder path and append \tesseract.exe, which gives C:\Program Files\Tesseract-OCR\tesseract.exe. You don't need to look for this path yourself: it's the same for everyone who downloaded the 64-bit version and installed everything with the default options.

Now we need to read the text from the image. text = pytesseract.image_to_string(img): we simply pass the image img, and then we print the text. Let's run this. Here is the image, and here is the text. Let's see how it performed by reading it: "The Big Sleep", so it recognized the capital letters really well. This part also seems well detected: "It was about 11 o'clock in the morning, mid October, with the sun not shining...". Notice also that the formatting of the text matches the original. We do see a few pieces that are not in the real text, so it isn't perfect, but in general it works really well. And apart from the Tesseract configuration, it took just two lines: one to take the image and get the text from it, and one to print the text. Of course, this was easy because we used a really clear image. Let's instead see how it works with other images, where the text is not so clear.
So: images that are not clean scans but just photos. Let's take the image; just wait, I'm looking for it. Here it is. This is a photo I took of a page of a book. What can we notice in this picture? At the top, the text is not on a perfectly horizontal line: here it follows the curve of the page, so it goes up a bit. The lighting is also uneven: here we have a lot of light, and here there is less. And below there is an area with no text at all, which we don't want to be recognized as text. Let's see how the OCR performs with this type of image.

I load this image, the photo of the book page (a .jpg file), into OpenCV and print the text. The image is quite big: at its original size we can't even see half of it in the window. And this is our text. As you can see, the result is terrible; we can say it almost didn't work at all.

What can we do to improve Tesseract's results? We can approach this problem in two different ways. The first is to preprocess the image, applying some filters so that the OCR engine works with a clean image, without all the unnecessary things that are in the image right now. The second is to change the settings of the OCR. We'll use both, but let's start with image preprocessing, which is the tool that will give the best results. Let's run this again and see how to approach it. First of all, I'm going to shrink the image a bit, so that we can see all of it and the OCR works with smaller text. So img is going to be equal to cv2.resize, where we resize img. We don't
specify a size in pixels here; instead we specify it as a scale factor: fx=0.5 and fy=0.5, which multiply both the width and the height of the image by 0.5, halving them. Let's run this. Already this small change, making the image smaller, gives an interesting result: we can read the title of this chapter, "Design before you implement", and some of the text below it. We can't read this first part, but then: "If the project involves designing a product or service, ensure you have the best possible answer". The last part, however, is not detected correctly. What else can we see in this image? The change in lighting is affecting the detection: in this part the text is not as clear as in this one.

So the first preprocessing step was to shrink the image. The next thing we'll do is convert the image to black and white, or, more precisely, to grayscale; it's not black and white yet, just grayscale. gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY): we convert img from BGR (blue, green, red), which is OpenCV's default format, to gray. Now, instead of reading the text from the original image, we read it from the gray image. Let's run this. Okay, another improvement: "Design before implement", then "If the project involves designing a product or service, ensure you have the best possible answer in the design phase, before you start implementation". We still can't read this part, where the page curves, but then we can read "the best possible answer in the design phase before you start", and here, of course, we cannot
read it. Then "An old 80/20 rule says that...", and it goes on like this. So some parts are recognized well, but others not so well. Here at the end we see the last few words, "in some cases retooling", where it should be "in this case is retooling", followed by some extra words that aren't in the image at all: Tesseract is trying to read words also in the part of the image where there is no text.

How can we approach this problem? There are a few ways, but I'll go with the simplest one. So far we converted the image to grayscale; now we are going to convert it to actual black and white. First, let me show you the grayscale image, so that we understand better what it is: cv2.imshow("gray", gray). Let's run this. Okay, this is the grayscale image, with different shades of gray. Our goal now is to set a threshold and say: pixels darker than this value are text, and pixels brighter than it are background. Since this image has different levels of lighting, we're going to use an adaptive threshold. I'll explain quickly how it works; if you want to understand it in depth, I have a few videos about thresholding, covering both the basic threshold and the adaptive threshold. So, from gray: adaptive_threshold = cv2.adaptiveThreshold(...). We apply the threshold on gray with a maximum value of 255: pixels above the threshold get this value and become white, and everything else becomes black. For the method we use the Gaussian adaptive threshold, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, with a block size of 91 and a constant of 11, which I'll explain in a moment. Let's look at the result with cv2.imshow("adaptive threshold", adaptive_threshold) and run this. It complains that a required argument is missing, and yes,
I forgot the threshold type, cv2.THRESH_BINARY. Let's run it again. The adaptive threshold is really useful for removing the background when the background is not homogeneous, because it looks at small areas and automatically finds the best threshold value for each one, turning each pixel either black or white. We're working with blocks of 91 by 91 pixels, which is why we passed 91. Let me show you what changes with different values. Let's try 11: here we see that some parts of the letters are also considered background. That's why I use a really big area. The bigger you make it, the more noise we get, but the text also becomes clearer. We can even go bigger than 91; let's try a bigger block size. Hmm, maybe now there's too much noise, so let's go back down a bit. This seems a good value: the text is really clear, and at the same time we removed a big part of the background.

Second, we have another value, 11, which is a constant subtracted from the computed threshold. In simple words: the smaller this number, the more noise we have; the bigger it is, the less noise we have, but with a big number there's a chance of removing the text too. For example, with a larger value it deletes some parts of the text, and with a really small number like 3 we have too much noise. That's why I found these numbers to work well.

Now that I've explained this, instead of looking for text in the gray image, let's look for it in the adaptive threshold image. Let's run this and see the result: "Design before you implement". The text is a bit better. Of course, there are still errors: I'd say about 80% of the text is fine, but 20% is not, especially on the left side, where there is this
wave in the page, and it's not detected correctly. That was an example of image preprocessing. There are a few more preprocessing steps you can apply: for example, a Gaussian blur to remove some of the noise, or morphological transformations, which also remove noise. I won't go into them here, so as not to make this video too long.

Instead, let's look at the settings of the OCR. Tesseract has several page segmentation modes (PSM). I won't go into detail about them, because my goal right now is just to give beginners a basic understanding of how to use the OCR, but you need to keep them in mind. For example, if you want to read just a single word instead of a full text, such as the text on a license plate, you would use mode 8, "Treat the image as a single word". If you have a single line of text rather than a full page, mode 7 will perform better. Right now we're using mode 3, fully automatic page segmentation, which is the default. To change the mode, add a line config = "--psm N", where N is the mode number (current Tesseract versions list modes 0 to 13). Let's try another mode, for example 4, "Assume a single column of text of variable sizes". When we call pytesseract.image_to_string, we also need to pass config=config. Let's run this. I see that this works much worse than the default mode 3, at least in this case: there are numbers that are not in the text, which we don't get with the default value.

Now the last thing for this tutorial: I want to show how to read text in different languages. I'm going to load an image with Chinese text, so img becomes the Chinese text image. I run this, it fails, and, okay:
the file is a .png; I had the right name but the wrong extension. Let's run it again. Here is the original text, and here it is after the adaptive threshold: not a big difference. And this is our result, which is wrong, because by default the OCR is set to English. Instead, we need to pass lang="chi_sim", for Chinese Simplified. Language codes are usually just three letters, and in this case we also add the "sim" part for Simplified: for example, English is eng and Italian is ita. Let's use this and see how it works. I'm not an expert in Chinese, I don't know it at all, so I'll just check whether the characters look the same. This one... no, it looks wrong. This one might be the same... no, even this one looks wrong. This one looks fine, and this one seems correct. Overall it didn't perform very well, so in this case too let's try making the image even smaller: 0.4 instead of 0.5. I've noticed that the OCR doesn't work so well on really big images with really big text.

I included this example, which doesn't work very well, to show you that even with image preprocessing, tuning the OCR really well is not a simple operation. I've noticed that some other services, like the OCR API I covered in another video, work much better than Tesseract on different types of images. So if you want to use an API, a paid service, to get good results, I'll put some links in the description and on my blog.

I hope this was useful for you all. I also want to announce, for everyone interested in computer vision, that I'm slowly building an Academy, which will hopefully be available in a few months. I'm putting a lot of work into it, recording the lessons. Here on YouTube I will keep publishing simple tutorials, but for everyone
who wants a much deeper understanding of computer vision and to learn more advanced topics, including deep learning, the Academy is something you might want to check out. That's all for this tutorial. See you!
Info
Channel: Pysource
Views: 45,587
Rating: 4.9727893 out of 5
Keywords: tesseract, pytesseract, text recognition, ocr tesseract, ocr opencv, python ocr, python text recognition
Id: JkzFjj2hjtw
Length: 31min 32sec (1892 seconds)
Published: Thu Apr 23 2020