So this is exactly the project we will be working on in today's tutorial. Today we're going to work with text detection and
I'm going to show you how to work with three super powerful OCR Technologies which are tesseract,
EasyOCR and AWS Textract. I'm going to show you how to work with them and I'm also going to show you how
to compare their performances, and now let's get started. Now let's get back to Google colab this
is a notebook I have created in my Google colab and this is exactly where we will be working on
today's tutorial, you can see that this notebook comprises 1, 2, 3, 4, 5 steps, in only five
steps we will have completed this process and you can also see that the first three steps are ready
and the only thing we're going to do is to execute each one of these cells then I'm going to show you
how to do all the coding in this cell over here and also this other cell over here, which is where
we are going to do the text detection and where we're going to compare the performances, so
this is the notebook in which we will be working today and in this tutorial we're going to work
in Google colab but we're also going to work in Google Drive so please make sure you create a
directory in your Google Drive in order to work on this project, now let me show you the data we are
going to use in this tutorial, this is the data I have prepared for this project and you can see
that these are a lot of images, all of which contain text, right, all of them have one or two or
a few words of text and this is exactly the data we will be using today and you can see that for
each one of these images the filename is exactly the text that's contained within
this image right for example in this case it says collect moments not things and if you look at
the file name it's exactly the same right collect moments not things.jpg and if I show you another
example for example this image over here it says color is not a crime and if you see the filename
is exactly color is not a crime.jpg and if I show you another example for example this one... come
on in we are open pull handle and in this case the filename is exactly come on in we are open pull
handle.jpg so this is exactly the data we will be using in this tutorial this is a dataset I
created in order to work on this project and if you want to follow along this tutorial you can
just go ahead and use exactly the same dataset as I am going to use or you can just use any other
text detection dataset you want, as long as you use images with some text on them and as long as you
have the annotations for the images you are using you can just go ahead and use any other dataset
you want, let me give you an example, if I go back to my browser, this is a very popular and amazing
dataset for working on a text detection project, which is COCO-Text, please take a look at
this website because this is an amazing dataset and
this is for example one of the datasets you could use in this project, I have been reviewing the
images in this dataset, I have been reviewing a few of these images and I noticed that some of these
images have a lot of annotations for example this image we are currently looking at you can see
that we have all this text over here, we also have the license plate, we have more text over
here, we have all this text over here, we have this logo over here, we have a lot of text in this
image and if I show you a couple of other examples this is another example you can see that we have
a lot of text, we have the annotations for all this text, and some of this text is even in a different
orientation so this is an amazing dataset and there are many pictures which are like this right
with a lot of annotations now remember that in this tutorial I'm going to show you how to use all
of these different technologies but I'm also going to show you how to compare their performances
and in order to compare their performances we are going to use a similarity metric which I'm going
to show you later on and in order to use this similarity metric and in order to make sense of
the results I think it's going to be much better if we use much simpler images like these ones
over here which I have selected right it's going to be much better if we use an image with only a
few words with only one sentence with only maybe a couple of sentences if we use much simpler data
like all these images over here it's going to be much better considering we are going to compare
the performance of all these different algorithms so it's going to be much better and you're going
to see why later on you're going to see why once I show you the similarity metric we are going
to use today and remember this is a dataset I created myself I just downloaded all these images
and I just annotated all these images right the filename of each one of these images is the text
that's written on them so I created this dataset myself and if you want to use exactly the same
data I am going to use in this tutorial you have two different options, you can either download
all this data yourself I got all these images from pexels so you can just download all these
images yourself searching something like 'text' you're going to see many of the pictures I'm going
to use in this tutorial you're going to find many many many images with text and then you can just
go ahead and annotate these images the same way I did with all these images over here right the only
thing you need to do is to change its filename to the text these images contain right and that's
pretty much all you need to do in order to create this dataset so that's one of the options in order
to use exactly the same data I am going to use today and the other option is just to download
this data which is going to be available in my Patreon so it's going to be available to all my
Patreon supporters so these are the two options you have in order to use exactly the same data I
am going to use today, you can either create this dataset yourself or you can just download this
dataset which is going to be available for my Patreon supporters. Now let's continue and obviously remember
you can also use any other dataset you want, if you want to use a much more complex dataset then go
ahead and do it I think it's going to be much better to use much simpler images for what we are
going to be doing today but if you want to use a much more complex dataset that's also fine right
that's also okay you can just do whatever you want, now let's continue let's get back to this
Jupyter notebook so we can get started on this tutorial, the first thing you should do obviously
is to put the data here, in the directory you have created in your Google Drive you can see
that this is my directory containing all the images and the only thing I did is to create a
zip file right, I compressed this directory and I created this other file which is data zip and
then I just uploaded this file into my Google Drive so that's obviously the first step in this
process to upload the data into your Google Drive now let's get back to my Google colab and the
only thing I'm going to do is to execute these three cells over here and then we are going to
move to these other cells which are going to be the most fun part of this tutorial right this
is where we're going to do the text detection and this is where we're going to compare their
performances but let's take it one step at a time I'm going to execute this cell over here which
is going to mount my google drive into my google colab this is very important because remember we want
to access this file which is in my Google Drive so we definitely need to execute this cell over here
and this is going to be very easy the only thing you need to do is to execute this cell and then
just click here, then click on allow, and that's going to be pretty much all so now we can continue
with the next step in this process which will be getting this file this data.zip file from our Google
drive into this Google colab environment so let me show you the only thing I'm going to do is to
press enter over here and this is going to copy the file and then this is going to unzip the file
into this google colab environment and that's going to be pretty much all, the only thing you
have to do when you execute this notebook is to make sure you change this value over here to the
directory you have created in your Google Drive so please make sure to change this value over here
in my case you can see it says tesseract vs easyocr vs AWS textract and this is a directory I have
created in my google Drive in the root directory of my google drive so this is why this is the directory,
this is the path I have created over here but please make sure you just change this value to whatever
location you have uploaded your data into right that's the only thing you need to do...
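In Python terms, what this copy-and-unzip cell does is roughly the following sketch; note that the actual notebook cell likely uses shell commands (!cp, !unzip) instead, and the Drive path shown in the comment is only an illustration, not the notebook's real value:

```python
import os
import shutil
import zipfile

def fetch_and_unzip(src_zip, work_dir):
    # copy the archive from the mounted Google Drive into the working environment...
    local_zip = shutil.copy(src_zip, os.path.join(work_dir, 'data.zip'))
    # ...and extract it into a 'data' directory next to it
    data_dir = os.path.join(work_dir, 'data')
    with zipfile.ZipFile(local_zip) as zf:
        zf.extractall(data_dir)
    return sorted(os.listdir(data_dir))

# In Colab this would be called with something like (example path, use your own):
# fetch_and_unzip('/content/drive/MyDrive/my_project_dir/data.zip', '/content')
```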
and then the only thing you need to do is to press enter and you can see that... this
is all the data we have copied into our google colab environment right, these are all the images we
have over here, now all these images are in our google colab so we can just continue with the next step
in this process and the next step in this process will be installing all the dependencies, all the
python packages we are going to use in order to work on this project and we also need to install
a few dependencies which are related to tesseract right, the only thing we're going to do is to press
enter and this is going to take care of installing all these dependencies, okay and now that we have
installed all these python packages and all these dependencies over here now we can continue with
the next step in this process which will be to do the text detection, this is where I'm going to
show you how to use tesseract, easyocr and Amazon textract in order to detect text on images so let me
show you we're going to start with tesseract, with this technology over here and in order to use
tesseract we are going to use pytesseract which is a python wrapper around tesseract right, under the hood we
are going to be using tesseract, but we are going to do it through this python library we have over
here and this is going to be very straightforward I'm going to import pytesseract and I'm also going to
import pillow... or actually I'm going to import a very specific module from pillow which is
from PIL import Image and this is how we're going to do it, I'm going to choose one of the images
we have copied over here I'm going to choose maybe... it doesn't really matter but let's choose this one
over here 'it was all a dream', this is the image I'm going to use in order to show you how to use each
one of these technologies, so I'm just going to copy and paste this image path over here... and I'm
going to do something like this, image path... image path... okay, and then I'm going to call
pytesseract dot image to string... I'm going to call Image.open, image path, and then this
is very important I need to tell tesseract that the text is English right so this is a
very important parameter, and I'm going to call the output something like
text... okay and now I'm going to print text... and let's see what happens... and you can see
that this is what we got, right, which doesn't make any sense whatsoever, let me show you the
image again let me show you the image we are using now which is this one over here, it was all
a dream, and if we look at the output it doesn't make any sense whatsoever, so I have been doing
many tests while I was preparing this tutorial and I noticed that pytesseract, which remember uses
tesseract under the hood, doesn't perform very well with this type of data, and I have been doing
some research and I noticed that tesseract works better with other types of data, for example
with documents and also it's very important to preprocess the data in a given way in order to
use tesseract and it's also very important to use the most... the best configuration in order to
use tesseract which is another thing we are not using right now you can see that the only parameter we
are putting as input into this function is the language but then we are not doing any other
configuration whatsoever, so long story short for all the tests I have been doing so far I
noticed that tesseract doesn't perform very well here, and that tesseract performs better with other types of data and in other
conditions right, so this is very important we are going to use it anyway because remember this
video is about showing you how to use each one of these technologies and it's also about doing a
comparison between them, and for the comparison we are going to do between all these different
technologies, we are going to input the image without doing any preprocessing whatsoever and
without specifying any configuration whatsoever right, this is the experiment I'm going to show
you in this video because I have worked in many projects involving text detection and I have
noticed that in some situations in order to achieve a very good performance you need to do
a lot of pre-processing and you need to do like a very very detailed configuration, you need to
see exactly what are the parameters you use and so on and finally you achieve a very good performance
but you need to do a lot of preprocessing and so on right, and doing all this preprocessing
and so on takes a lot of time, it's a very time consuming task, so that's exactly why we are going
to do this experiment like this we're just going to compare the performances without doing any
preprocessing whatsoever, we're just going to take the image and we're going to input the image
like this and also we are just going to use all the default parameters so we're just going to
compare all these functions out of the box right, we're not going to do any preprocessing whatsoever
because that's how I like to do text detection right that's how I like to use all these different
technologies, I like to just use them from a very high level without really minding about all the
preprocessing and all the configuration and so on so this is exactly why we are going to do
the experiment like this, so long story short this is the output we got by using tesseract and now
let me show you how to use easyocr, which is the next technology, I'm just going to open a new cell,
something like this and this is what we are going to do the first step will be to import easyocr so
I'm going to say from easyocr import reader, remember this is a library we have already installed over here,
we installed pytesseract, easyocr, pillow and we also installed boto3 which is the other library we're going to use
in the next step, so we are over here from easyocr import reader, we are going to create a reader
first, this is going to be something like this... reader... and we need to specify the language
which is English, and then this is what we're going to do, I'm going to call this variable
results, and we're going to call reader dot read text... and then we are going to input exactly the
same as we did over here, image open and the image path let's print results first and then we are going
to do something with the... with the output we are going to do something with this object but in
order to move one step at a time let me show you how this variable looks, okay we got an
error and this is because this is not readText with a capital T but readtext with
a t in lowercase right so I'm going to execute this cell again okay so this is what we got now
let me copy and paste this output over here so I can show you better so this is all the output
we got I'm going to zoom in... over here so let me show you this is all the information
we got, so you can see that for all the text we have in this image... which if I
show you the image again... this is the image we are using, it's the same image as before, and
you can see that we got the location where we have detected the text, then we have the text we
have detected and also the confidence value for this detection right, this is how confident
easyocr is regarding this detection and this is what we care about in this project which
is the text we have detected, we don't really care so much about the confidence value, we don't
really care so much about the location in which we have detected this text, but the text is the
only information we care about for this project so let me show you how to take this information in
order to continue, I'm going to define another variable which is text and this is going to be an
empty string for now but I'm going to do something like this, for result in results I'm going to
say something like text equal to text plus result... one... let me show you again remember this is
how the data looks like and if we access the index number zero we are going to access the location,
the bounding box, if we access the index number one we are going to get the text and if we access
the index number two then we are going to get the confidence score and in this project we only
care about the text so this is why we are going to access this index over here, and then I'm also
going to... add a space in order to join all these different words we have detected but in order
to separate all of these words by this character over here which is a... white space right, and then
I'm going to do something like text equal to text up to the element minus one right, because at the
end of this process once we have detected all the words and once we have... once we are here right,
we are also going to append a white space at the end and we don't care about the white space at the
end so we're just going to remove this white space at the end by just doing something this and now
let's print text and let's see how it looks, and you can see that this is the output, which is
definitely not perfect but it's more similar, right, now it says 'it was all a Dion', something like that, so
it's definitely not perfect but it's... similar, so this is how we are going to use easyocr.
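Before moving on, here is how the easyocr part can be sketched end to end. Since running easyocr requires the library and its detection models, the `results` list below is mocked with the same (bounding box, text, confidence) structure that reader.readtext returns; the parsing loop is the one we just wrote:

```python
# In the notebook, `results` comes from:
#   from easyocr import Reader
#   reader = Reader(['en'])
#   results = reader.readtext(image_path)
# Here we mock it with the same structure: (bounding box, text, confidence).
results = [
    ([[0, 0], [50, 0], [50, 20], [0, 20]], 'it was', 0.93),
    ([[0, 30], [60, 30], [60, 50], [0, 50]], 'all a', 0.88),
    ([[0, 60], [70, 60], [70, 80], [0, 80]], 'dream', 0.91),
]

text = ''
for result in results:
    # index 0 is the bounding box, index 1 the text, index 2 the confidence;
    # we only care about the text
    text = text + result[1] + ' '
text = text[:-1]  # drop the trailing white space

print(text)  # → it was all a dream
```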
let's continue and let me show you how to use Amazon textract, this is going to be a little more
complex because we need to sign in into our AWS account and we need to work into our AWS account
so let me show you, so the first thing you need to do once you are logged into your AWS account
is going to IAM... and we are going to create an IAM user so I'm going to click on users... create user,
and the username will be something like textract tutorial... user... something like that okay, next, then I'm
going to attach policies directly and I'm going to select this one over here which is Amazon Textract
Full Access and I'm going to click on next then I'm going to click on create user
and that's going to be pretty much all now I'm going to select the user I have created
which is this one textract tutorial user and I'm going to click here on security credentials
and I'm going to create access keys I'm going to select local code and then 'I understand the
above recommendation' and click on next... create access key and that's going to be pretty much all
for now this is the first step in this process which is creating this access role now remember
something which is very important, remember that absolutely every single thing you do in AWS
is not free, you need to pay for it, this is very important so please remember that for
everything we're going to do from now on from all the times we're going to call textract then
we will need to pay for all these executions right, most likely we will need to pay for it just
keep that in mind in order to continue with this tutorial and now let me show you how to use
textract so this is the access role we have created this is the user we have created and this
is how we are going to use this user let me show you, so I'm going to create a new cell over here
and this is what we're going to do I'm going to import boto3 which is the python library we
are going to use in order to use textract and then I'm going to define two variables one of them
is access key and the other one is secret access key okay and this is where we're going to copy
and paste these two values over here so the access key over here and then the secret
access key over here please remember to keep these keys private right remember these
are your private access keys so do not share these keys with anyone right and never do
something like I am doing right now which is just making a video with all my keys publicly
available right never do something like this and never ever share these keys with anyone so
let me show you how to continue... in my case obviously it's not a problem because I'm just
going to delete these keys once this tutorial is completed but just remember to never share
these keys with anyone so I'm going to create another variable which is textract client and
this is going to be boto3 . client I'm going to input textract and then I'm going to input
some parameters one of them is aws access key... id... and this is going to be equal to access
key... then the other parameter is secret access key... this is AWS secret access key... and this is going to be
secret access key... okay and then the region... in my case this is us-east-1 right but it's going
to depend on the region where you are working, right, in my case this is us-east-1
but in your case this may be different now let's continue and now I'm going to keep
on using exactly same image as before so I'm going to do something like this with open... I'm
going to use the same variable which is image path rb... as im... and this is going to be
something like this response equal to textract client... detect document text...
and we are going to input the document like this... bytes im.read okay and
this is exactly how we are going to use textract, let's make sure everything works as
expected so I'm just going to print response and let's see what happens, okay so I made a
mistake over here, this should be us-east-1, like this, let's see now, okay and this is the output
we got and we're going to do something similar as we did with easyocr I'm just going to copy
this value and I'm going to paste it over here and you can see that in this case we got
a lot of information this is definitely a lot of information we got from textract and also
you can see that the text we are looking for is over here, here it says 'it was' and then here
it's 'all' and then if I keep on looking I guess I'm going to find the remaining text over
here it says 'dream' and so on right so let me show you how to parse all this object
in order to get the information we are looking for so I'm going to define a variable which is
called text this is going to be very similar as we did before with easyocr and then I'm
going to parse this response like this: for item in response blocks, if item block type is equal to line, then we are going to do something
very similar as we did before, text is going to be equal to text plus item text, with a
capital t plus this empty space and then we are going to do exactly the same as we did
over here we are going to remove the last empty space we have added over here once we have
read absolutely all the words, all the sentences, all the lines right, and now the only
thing I'm going to do is to print text and let's see what happens, and you can see that in this
case we got 'it was all dream' so it's definitely not perfect because remember this is the picture
we are using in this example as well and we have an 'a' missing over here so it's not a perfect output
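To recap the textract part, here is a sketch of the parsing we just did, run over a mocked response, since the real `response` comes from detect_document_text, which needs valid AWS credentials and is billed per call:

```python
# In the notebook, `response` comes from:
#   import boto3
#   textract_client = boto3.client('textract',
#                                  aws_access_key_id=access_key,
#                                  aws_secret_access_key=secret_access_key,
#                                  region_name='us-east-1')
#   with open(image_path, 'rb') as im:
#       response = textract_client.detect_document_text(
#           Document={'Bytes': im.read()})
# Here we mock the relevant part of its structure.
response = {
    'Blocks': [
        {'BlockType': 'PAGE'},
        {'BlockType': 'LINE', 'Text': 'it was'},
        {'BlockType': 'WORD', 'Text': 'it'},
        {'BlockType': 'LINE', 'Text': 'all'},
        {'BlockType': 'LINE', 'Text': 'dream'},
    ]
}

text = ''
for item in response['Blocks']:
    # keep only LINE blocks; PAGE and WORD blocks repeat the same content
    if item['BlockType'] == 'LINE':
        text = text + item['Text'] + ' '
text = text[:-1]  # drop the trailing white space

print(text)  # → it was all dream
```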
it's not a perfect detection but I would say it's very good nevertheless so this is going to be
pretty much all in order to use each one of our technologies and now let me show you how to
compare their performances, now we are going to iterate over all the images we have over here
and for each one of these images we are going to compute a similarity metric between the
ground truth and the output we got from each one of these technologies and then we're just going to
compare their performances, let me show you, we're going to use a similarity metric which is based
on the Jaccard index, and the way this index works, let me give you a very quick and very high
level explanation of how it works, the Jaccard index will be the ratio between the intersection of two
sentences and the union of these two sentences right, so long story short this index is going to
give us how many words these two sentences have in common right we're going to input the ground
truth and we're going to input the sentence we got from each one of our ocr technologies, we're
going to see how many words are in common, and we're going to see how many words are in total
in both of our sentences and then we're just going to take the ratio between these two numbers
and that's going to be the jaccard index that's how we are going to compute the similarity metric in this
case right that's how we're going to compare the performances and this is why I decided to use much
simpler images like these ones over here with only a few words or with only a couple of sentences
because this is going to make this process of computing the similarity much easier and much
better and otherwise it would be much harder to make some sense of the results right so this is
the similarity metric we are going to use and in order to have a function that takes as input two
sentences and returns the Jaccard index between them, let's ask chatgpt to give us this function, so
I'm going to ask chatgpt: give me a python function that receives two sentences and returns the jaccard
similarity between them so let's ask chatgpt to give us this function and let's see what happens,
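For reference, a standard implementation of such a function, along the lines of what chatgpt returns, might look like this:

```python
def jaccard_similarity(sentence1, sentence2):
    # split each sentence into a set of words
    words1 = set(sentence1.split())
    words2 = set(sentence2.split())
    # Jaccard index: size of the intersection over size of the union
    intersection = words1.intersection(words2)
    union = words1.union(words2)
    if not union:
        return 0.0
    return len(intersection) / len(union)

print(jaccard_similarity('this is a sample sentence',
                         'sample sentence for testing'))  # → 0.2857142857142857
```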
okay so this is a function we got and chatgpt also gave us an example of how to execute this function
in order to see how it works so the only thing I'm going to do is to copy and paste this code over
here and let me show you how it works first and then let's compute the similarities and then
let's compare the performances so you can see we have a function which is jaccard similarity it
receives two sentences and then the output is a number and let's see what happens if we input
these two sentences over here, the only thing I'm going to do in order to avoid any mistake
and any confusion and anything like that, I'm just going to remove the periods at the end,
just to make sure we are computing things as we should, right, so I'm going to press enter
and you can see that we got a jaccard similarity of 0.28 now let's analyze this result you can see
sentence number one is this is a sample sentence and the sentence number two is sample sentence
for testing so let's count how many words we have in the intersection you can see that the
intersection is something like 'sample sentence' and that's pretty much all right these are the
only two words we have in common in both of our sentences so the intersection is going to be two
and then the union is going to be 1, 2, 3, 4, 5, we're not going to count these two words again, 6,
7, right, and if I compute this ratio, 2 divided by 7, we get 0.285, right,
which is exactly what we have over here so this is exactly how the jaccard similarity works and this
is exactly the similarity metric we are going to use in this tutorial, so I'm going to delete this
example over here we're only going to use this function and now let's do something I'm going
to create a function for each one of these ocr technologies we have over here so this is what I'm
going to do, the first one is going to be something like... read text tesseract, something like that
and it's going to receive the image path and that's going to be pretty much all
and then it's going to return... text, okay we're going to delete this image path over here
and then we are just going to take all this code... the import is going to be the
first thing over here and then... the reader too because we're not going to
define the reader every time we call this function and then we are just
going to create another function which is read... text easyocr and we're going to input the image path as
well... and we are just going to do something like this... okay, yeah, this should work just fine, so we're going to delete this cell and then
let's move to this other one over here we're going to copy the import first and then we're
not going to define the client and the access keys every single time so I'm just going to copy
everything over here and then we are just going to... do something like this... and
create another function which is def read... text... textract... image path okay and then we are going to return text okay that should be all, right? okay and I'm going to delete this other cell over here... and then once we
have these functions I am going to... copy this function over here I'm going to iterate in all the images we have in this directory so
for image path in os.listdir of content data... which is the directory where we
have copied all this data, you can see over here, we are going to iterate in this directory and the
only thing we're going to do is something like this, image path will be os.path.join, okay, and we're going to import... os, we can import os over here, okay, that's pretty much all
and then the only thing we need to do... is to define another variable which
is ground truth and ground truth will be... image path... removing the extension and we're also
going to replace... this value... with a white space right remember the ground truth for
each one of our images it's the filename so we're just going to take the filename we are
going to remove the extension and we're just going to replace these underscores with a white
space and that's going to be pretty much all in order to get the ground truth now let's continue,
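As a sketch, getting the ground truth from a filename amounts to something like this (assuming, as described, that underscores separate the words; if your filenames already use plain spaces, the replace is simply a no-op):

```python
import os

def get_ground_truth(image_path):
    # the filename (without extension) is the annotation; underscores,
    # if any, become white spaces, and we lowercase to match the
    # normalization applied to the OCR outputs
    filename = os.path.basename(image_path)
    text = os.path.splitext(filename)[0]
    return text.replace('_', ' ').lower()

print(get_ground_truth('/content/data/it_was_all_a_dream.jpg'))  # → it was all a dream
```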
I'm going to have some counters or some other values which are going to be something
like similarity or maybe something like score... tesseract... which is going to be zero, then score easyocr
which is going to be zero... and then score textract which is also going to be zero, and then the only
thing we need to do... is to append... is to update this value... with the similarity, right? and another thing I'm going to do is I am going to
convert this to lowercase and that's going to be pretty much for the ground truth and in this case
I'm going to convert into lower case and I'm also going to replace all the... all the
new line characters I'm going to replace them by nothing right so I'm going to remove all these
new line characters and I'm also going to remove these other characters which are the exclamation point
the question mark and then we are also going to remove... the period... and that's going to be pretty
much all, we could remove even more characters, we could remove like for example this character
and this character and so on we could remove many other special characters but in order to
keep it simple let's just do it like this right which may be the most important characters which
we are going to find in these pictures over here so and then the only thing I'm going to do
is to do exactly the same with these other images okay... okay... this is going to be score easyocr and then we are going to read text easyocr
and that's going to be pretty much all and then read text textract
okay... and then we are just going to print score tesseract... equal to... this value divided by 100 because
we have 100 images in this directory so we are going to take the average
of all the scores we got in all the images, and we're going to do exactly the
same with these two other values over here... no sorry, this is textract... okay... read text textract, yeah everything's okay... everything seems to be okay...
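Putting the whole comparison together, the evaluation loop might be sketched as below. The read text functions passed in are stand-ins here, since in the notebook they wrap pytesseract, easyocr and textract; the normalization mirrors what we did above (lowercase, drop newlines and basic punctuation), and the Jaccard helper is repeated so the sketch is self-contained:

```python
import os

def jaccard_similarity(sentence1, sentence2):
    # intersection over union of the two sets of words
    words1, words2 = set(sentence1.split()), set(sentence2.split())
    union = words1 | words2
    return len(words1 & words2) / len(union) if union else 0.0

def normalize(text):
    # lowercase and strip newlines and basic punctuation, as in the video
    for character in ['\n', '!', '?', '.']:
        text = text.replace(character, '')
    return text.lower()

def evaluate(data_dir, ocr_fns):
    # ocr_fns maps a name (e.g. 'tesseract') to a function image_path -> text
    scores = {name: 0 for name in ocr_fns}
    filenames = os.listdir(data_dir)
    for filename in filenames:
        image_path = os.path.join(data_dir, filename)
        # ground truth: filename without extension, underscores as spaces
        ground_truth = normalize(os.path.splitext(filename)[0].replace('_', ' '))
        for name, read_text in ocr_fns.items():
            scores[name] += jaccard_similarity(ground_truth,
                                               normalize(read_text(image_path)))
    # average the score over the number of images
    return {name: score / len(filenames) for name, score in scores.items()}
```

In the notebook this would be called with something like evaluate('/content/data', {'tesseract': read_text_tesseract, 'easyocr': read_text_easyocr, 'textract': read_text_textract}), printing one average per technology.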
the only thing I'm going to do... I'm going to print, I'm going to execute
it for only one image for now and then we are going to iterate in all the images, so let's see
what happens I'm just going to... iterate once I'm I'm going to break the loop and let's see what
happens, I'm going to execute this cell over here so we execute all these functions so we load all
these functions into memory so we can use them later on and now let's see what happens I'm going
to execute this cell over here and we got a mistake... right, this value, it's not right... and
now everything should be okay, okay and this is what we got remember this is only an example
but I would say everything makes sense so far so let's just run exactly the same function on
all of our images and let's see what happens and these are the results you can see we got a
0.01 with tesseract, 0.21 with easyocr and 0.34 with AWS textract, so in this experiment with this
data and under these conditions AWS textract is the best OCR technology, so this is going to
be all for this tutorial, in a future tutorial, in a future video, I may make another tutorial on
how you can use tesseract as an ocr technology because you can see that in this case we didn't have
a good performance using tesseract but this is actually a very good and a very powerful ocr
technology so I may make another video where I show you exactly what's the data you can
input into tesseract and exactly what's the pre-processing you should do on that data and
so on in order to achieve some very good results using tesseract, but for now this is going to be
all, my name is Felipe, I'm a computer vision engineer, thank you so much for watching
this tutorial and see you on my next video.