Detect Text in Images with Python - pytesseract vs. easyocr vs. keras_ocr

Video Statistics and Information

Captions
Hey YouTube, my name is Rob, and in today's video I'm going to show you how you can use Python to extract text from images. There are a few different Python libraries out there for doing this, and we're going to compare three of the main ones: pytesseract, easyocr, and keras_ocr. As always, I'll be working in a Kaggle notebook; the link is in the description. We're going to work with a dataset called TextOCR, which has over a million annotations for text in images, so it's a perfect dataset for testing these libraries. If you enjoy this video, make sure you like and subscribe so you'll be notified of any future videos I make. With that, let's get to the tutorial.

Okay, before we actually start coding, let's take a look at the dataset we'll be using. It's called TextOCR, and it has a lot of images along with annotations of the text that appears in them. In the files section there are train and validation image folders with a bunch of different images, and there are also a few CSV and parquet files. The annotations file has all of the words that appear in these images and the bounding boxes for those words, and there's some information about the images themselves in the image file.

Here we are in a Kaggle notebook; like I said before, the link to this notebook should be in the description below, so if you have a Kaggle account you can copy your own version of it and run the code. This image just shows some example images from our dataset with their annotations, and over on the right side you can see I've attached the TextOCR dataset that we'll be reading from.

Before we get too far, let's do some imports. We're going to import pandas as pd and import numpy, since I think we'll need it. For interacting with files we'll import glob from the glob module, and since we might need to loop over things and track progress, we'll import tqdm (I do have a video on tqdm if you're interested in how it works). For visualization we'll import pyplot from matplotlib, and from the PIL package we'll import Image for viewing images.

Now, a quick outline of what we're going to do in this video. First we'll take a look at the data we have available. After that we'll extract text from images using three different methods: pytesseract first, then easyocr, and lastly keras_ocr. Once that's done, we'll run all three on a few examples and compare the results.

As I mentioned, we have a few different CSVs, but let's actually read the parquet versions. We'll read in a few parquet files from our input folder. The first thing to load is our annotations, which we'll call annot. If I run a head command on it, we can see it contains a unique ID for each annotation, the image it's associated with, the bounding box and text that appears in that bounding box, and some more detailed points for the area of that text. We'll also read in the image parquet file; running head on it shows each image ID, the width and height of the image, and the file name. Lastly, I'll run glob on the directory that contains all of the images we have labels for. Glob lets us use a star, which is just a wildcard, on the whole directory, and it returns the full paths of all the images in there. We'll call this image_fns, for image file names.

Now let's plot some example images. If I run matplotlib's imread I can read one of these image files (let's just read the first one), and it returns a numpy array that I can display with imshow. There's our first image. In order to associate each file with its annotations, we need to pull out just the image ID part of the path: I'll run a split command, take the last element (which is just the file name), split that on the period, and take the first value, which is the image ID. We'll call this image_id. Let's make the image bigger as an example and show it down here. Now if we pull the annotations for just this image, we can see quite clearly that we have a bounding box for each of the words: "extra" and "golden" are here, along with the text from below, "okay" and "system".

Let's go ahead and display the first 25 images. I wrote some code that I won't go over in detail, but it basically loops over the first 25 examples and plots them into different subplots, with the number of annotations for each image shown in the subplot title. Let's use a style sheet so the text is visible against the background. There we go: now we can see each of these example images, and some of them have around 30 text annotations, this one has 74, and some only have a few, like this one here with just five.

Okay, let's start with the first method of extracting text from these images: pytesseract. If you're running pytesseract on your local machine there are a few steps you'd have to walk through to get it fully installed, but luckily it's already installed in Kaggle notebooks, so we can simply import pytesseract. To show an example call: if I type pytesseract and hit tab, we can see all the different methods we can run on our images, and we're just going to use image_to_string. We give it an image file name (let's do index 11 from our list of image file names) and a language, which we'll say is English.
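The setup steps above can be sketched like this; it's a minimal sketch in which the dataset paths are hypothetical placeholders (adjust them to wherever your copy of TextOCR lives), while the image-ID split is exactly the string operation described:

```python
import pandas as pd
from glob import glob

# Hypothetical Kaggle-style paths -- adjust to your copy of the TextOCR dataset:
# annot = pd.read_parquet('/kaggle/input/textocr/annot.parquet')
# img = pd.read_parquet('/kaggle/input/textocr/img.parquet')
# image_fns = glob('/kaggle/input/textocr/train_val_images/*')

def image_id_from_path(path):
    """Pull the image id out of a full file path:
    '.../train_images/abc123.jpg' -> 'abc123'"""
    return path.split('/')[-1].split('.')[0]

print(image_id_from_path('/kaggle/input/train_images/abc123.jpg'))  # abc123
```

The helper is the same split-on-slash, split-on-period trick used in the notebook to join each file back to its annotations.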
Now let's print what this returns. It has pulled some text from that image, but if we display the image like we did above, we can see that the extraction isn't too great. The thing I've found about pytesseract is that it seems to work best on things like documents, so for this sort of dataset it doesn't seem to be a great option.

That brings us to the second option, easyocr. I believe easyocr uses some deep learning models under the hood to run the text detection. It's a little slower to run, but we're going to see how the results look. To use it we just import easyocr (again, this is already installed in the Kaggle notebook environment, so we don't have to install anything) and create a reader. If we did have a GPU we could set gpu to True and it would make things faster; you can turn one on over on the right side under Accelerator, but let's set it to False to start out with.

Now that we have this reader object, all we have to do is run readtext on our image. Let's give it the same image as above, index 11, and store the output as results. Once that's done running, let's take a look at what the results actually look like: they come back as a list of tuples, and if we look at the first one, we can see it has the bounding box location, the text it identified, and a confidence value for how sure it is about that text. If we go into the documentation, readtext actually has a lot of parameters we could mess with: we could change the beam width or the batch size, and we could adjust thresholds for the contrast it needs between the text and the background, as well as text detection thresholds for how confident it needs to be before including text in the output. We're just going to stick with the default parameters, but do know that if you hit Shift-Tab here you can read about all the parameters you could change to get a more ideal result for your dataset, depending on what it looks like. Just to make this a little easier to read, I'm going to put it into a pandas DataFrame; once we name the columns, we can see all the different text it was able to pull from this image.

Now we move on to method three, keras_ocr. keras_ocr is not pre-installed here, but that's okay, we can easily install it with pip: we run "!pip install keras-ocr -q", with the -q flag to keep it quiet so it doesn't print all the install output. Once that's installed we import keras_ocr. If we go to the keras_ocr documentation, we can see it has two core components, a detector and a recognizer, but we're not going to use them separately; we're going to use the pipeline they provide. We create the main pipeline with keras_ocr.pipeline.Pipeline(), which we'll then run on our images. You can see we could swap out the detector and recognizer here, and there are also a few parameters for the scale and max size. When we create the pipeline, it automatically downloads some pre-trained weights that it will use to make the predictions. Now that we have our pipeline, we can call recognize on it; looking at the documentation again, it takes an image or a list of images, so since it requires a list we wrap our image in one and let it make the prediction.
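The two result shapes described here can be sketched with made-up values (the bounding boxes, words, and confidences below are purely illustrative, not real model output); the tuple layouts match what the transcript reports reader.readtext and pipeline.recognize returning:

```python
import pandas as pd

# easyocr's reader.readtext() returns (bounding box, text, confidence) tuples
easy_results = [
    ([[31, 16], [99, 16], [99, 42], [31, 42]], 'EXTRA', 0.98),
    ([[30, 48], [108, 48], [108, 74], [30, 74]], 'GOLDEN', 0.95),
]
easy_df = pd.DataFrame(easy_results, columns=['bbox', 'text', 'conf'])

# keras_ocr's pipeline.recognize([img]) returns, per image, a list of
# (word, box) pairs -- note there is no confidence score
keras_results = [('extra', [[31, 16], [99, 16], [99, 42], [31, 42]])]
keras_df = pd.DataFrame(keras_results, columns=['text', 'bbox'])

print(easy_df['text'].tolist(), keras_df['text'].tolist())
```

Naming the columns like this is the same trick used in the notebook to make the raw tuple lists readable.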
Now that this is done running, we can see that keras_ocr's results are a little bit different. We again get a list of results, but instead of a confidence we're just given the word that appears and its bounding box. I'll store this as results and, like we did before, wrap it in a DataFrame and name the columns, text and bounding box. At least in this example, keras_ocr detected more of the text, but we're going to run this on a handful of selected images and compare them side by side.

One of the nice things about keras_ocr is that it comes with a built-in annotation tool: we can run keras_ocr.tools.drawAnnotations, providing an image like before and the results from the prediction. We do need to give it the actual read-in image here, and make sure we pass the first set of results. Let's make the plot a little bigger so it's easier to read. Now we can actually see on the image where it's finding all the different text, and it's doing a pretty good job.

Because I found that easyocr and keras_ocr work the best on this type of image, let's run a comparison between the two on a handful of images: the first 25 examples in our dataset. You could run this on more images, or on your own images, but we'll stick to those 25 for both keras_ocr and easyocr.

To do that we just need a loop. First, easyocr: we create our reader again and loop through the first 25 images, wrapping the loop in tqdm so we can see how long it takes to run. For each image we get the result with reader.readtext, and we also pull out the image ID by splitting the file name on the extension. We make a per-image results DataFrame like we did above, with the bounding box, text, and confidence columns, give it an image ID column with the ID we extracted, and append it to a list of DataFrames. Then the easyocr DataFrame is just a concatenation of all of these results. (A couple of small fixes along the way: the loop variable needed changing, and this call needed to be the reader.) This is now done; it took about 2 minutes and 47 seconds to complete, so it did take a little while.

Looking at the easyocr DataFrame, though, I actually messed something up: to get the image ID I needed to split the path, grab the end of it, split again, and grab the first piece. Let me run that in another cell to check; yes, that gives me the correct image ID. I wasn't storing it correctly before, so we'll just rerun this, which takes a few more minutes. Now that I've fixed that bug and checked the results, we have the correct image ID for each of the labels for the 25 different images we ran.

Now let's run keras_ocr. Remember, we can use the pipeline we set up before, so let's make another pipeline just like we did and give it the list of the first 25 images, since recognize can actually take a whole list of images. We'll store this as results and see how long it takes using the time command. It won't have to download the model weights again, because it already downloaded them when we ran it before, so it'll run a little faster this time. Well, it turns out that running that many images at once actually overloads what we're able to do, and it restarted the kernel. So let's start over, re-run these first few lines, and, just to make things fair, loop through the images one at a time.
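The per-image accumulation loop described here can be sketched as follows. To keep the sketch self-contained, a hypothetical fake_readtext stub stands in for easyocr's reader.readtext, and the file names are made up; in the real notebook you would swap the stub for the actual reader call and loop over image_fns:

```python
import pandas as pd

def fake_readtext(image_fn):
    # Hypothetical stand-in for easyocr's reader.readtext(image_fn),
    # which returns (bounding box, text, confidence) tuples.
    return [([[0, 0], [10, 0], [10, 5], [0, 5]], 'word', 0.9)]

image_fns = ['/input/train/aaa111.jpg', '/input/train/bbb222.jpg']  # made up

dfs = []
for image_fn in image_fns:                    # wrap in tqdm(...) to track progress
    result = fake_readtext(image_fn)          # reader.readtext(image_fn) for real
    img_id = image_fn.split('/')[-1].split('.')[0]   # the bug fix from the video
    img_df = pd.DataFrame(result, columns=['bbox', 'text', 'conf'])
    img_df['img_id'] = img_id
    dfs.append(img_df)

easyocr_df = pd.concat(dfs)
print(easyocr_df['img_id'].tolist())  # ['aaa111', 'bbb222']
```

The image-ID line is the corrected version: split the path, grab the end, split on the period, and keep the first piece.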
Using the same loop we wrote for the easyocr results, let's just copy most of it and replace the readtext line with a call to keras_ocr's pipeline. This should loop over each image, and the result, I believe, is the text and the bounding box; we'll collect these into our keras_ocr DataFrame. A few things I also needed to fix: the result we want is just the first element of the pipeline's output, since we're only running a single image at a time. And because this is taking a while, I'm going to turn on the GPU accelerator on this notebook and see what kind of run times we get with the GPU enabled, rerunning with gpu set to True.

It does look like turning on the GPU sped things up a lot, so let's rerun easyocr and see how long it takes. Wow, that's much faster: with the GPU turned on, easyocr took five seconds instead of the two-plus minutes from before. Now let's run the keras_ocr pipeline. With the GPU it was a good bit slower: it took about a minute to run. I think I could have gotten it to run a little faster by leveraging the fact that it can take a batch of images, instead of looping through and running image by image, so that might be something you want to test later to see if you can speed things up.

Now let's actually plot these results. We're going to use that keras_ocr tool that lets us visualize the results, but run it for each image one by one. We create a figure and two axes and set a figure size. The thing is, our results need to be in the right format: a tuple of the word and then the bounding box. I'll take my easyocr results for just one example image ID, and I think converting each row into a tuple with a list comprehension will get us into the right format. There we go: I took the easyocr results for a given example and plotted them here on the left side, with the title "easyocr results". On the right plot I just want to show the keras_ocr results, so I need to point this at the other axis, index 1, which is the right side, give it a different title, and call plt.show. Very cool. We can see that easyocr got some of the text in the top left and got "extra golden" in the bottom text, while keras_ocr extracted each word separately, but it did a pretty good job as well.

Since we have results for 25 different examples, let's go ahead and plot them all. We'll make this into a function called plot_compare that takes both of our results DataFrames. Then, for each image file name in the first 25 image file names, we run that code block and loop over the results. Now we see a different plot for each image, comparing easyocr with keras_ocr. They both look pretty good; keras_ocr does seem to split up the text a little more, and this one must be really hard to predict. Again, there are a lot of parameters we could have tuned in each of these, but this does give us a good example of what each of them gets and misses.

I'll keep scrolling through some of these examples so we can see them; they look pretty interesting, and you can already see how some of them do better than others, with a lot of differences between the two. One thing that's clear is that they both work, their results do vary, and it will come down to what you're trying to extract: choose whichever one works best for your use case. Thanks so much for taking the time to watch this
tutorial! I hope you found it helpful. We went through a few different ways to extract text from images and compared the two main ones that worked well, keras_ocr and easyocr. I actually made this video because people were asking for this specific example in one of my previous videos, so if you have a similar request, please let me know in the comments below, and feel free to check out some of the streams I also do on Twitch. Thanks again for watching, and I'll see you all next time.
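As a footnote to the comparison section, the reshaping step that feeds the side-by-side plots can be sketched like this. The DataFrame values are made up for illustration; the list comprehension is the one from the video, producing the (word, box) pairs that, I believe, keras_ocr.tools.drawAnnotations expects:

```python
import pandas as pd

# Made-up easyocr-style rows for one image (bbox, text, confidence, image id)
easyocr_df = pd.DataFrame(
    [([[0, 0], [10, 0], [10, 5], [0, 5]], 'EXTRA', 0.98, 'abc123'),
     ([[0, 6], [12, 6], [12, 11], [0, 11]], 'GOLDEN', 0.95, 'abc123')],
    columns=['bbox', 'text', 'conf', 'img_id'])

# drawAnnotations wants (word, box) pairs, so reshape one image's rows
# with a list comprehension before plotting:
example = easyocr_df.query('img_id == "abc123"')
easy_preds = [(row['text'], row['bbox']) for _, row in example.iterrows()]
print([word for word, _ in easy_preds])  # ['EXTRA', 'GOLDEN']
```

In the notebook, easy_preds would then go to drawAnnotations on the left axis while the keras_ocr predictions (already in that shape) go to the right axis.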
Info
Channel: Rob Mulla
Views: 70,915
Keywords: optical character recognition, text recognition, python ocr, image to text, python tesseract, keras_ocr, keras ocr, easyocr, rob mulla, optical character recognition using python, text recognition python, how to extract text from image using python, optical character recognition python, python tesseract ocr, image to text with python, computer vision projects, image processing python, text detection, text detection in image python, text detection python
Id: oyqNdcbKhew
Length: 22min 21sec (1341 seconds)
Published: Tue Jul 12 2022