Text detection with Python | Tesseract vs Easyocr vs AWS Textract | What is the best OCR?

Captions
So this is exactly the project we will be working on in today's tutorial. Today we're going to work with text detection, and I'm going to show you how to work with three super powerful OCR technologies: Tesseract, EasyOCR and AWS Textract. I'm going to show you how to use them, and I'm also going to show you how to compare their performances. Now let's get started.

Let's go back to Google Colab. This is a notebook I have created, and it's exactly where we will be working in today's tutorial. You can see that this notebook is comprised of five steps; in only five steps we will have completed this process. The first three steps are ready, and the only thing we're going to do is execute each one of those cells. Then I'm going to show you how to do all the coding in this cell over here, where we are going to do the text detection, and in this other cell, where we're going to compare the performances. In this tutorial we're going to work in Google Colab, but we're also going to work with Google Drive, so please make sure you create a directory in your Google Drive for this project.

Now let me show you the data we are going to use in this tutorial. This is the data I have prepared for this project: a lot of images, all of which contain text, each with one, two or a few words. For each one of these images, the filename is exactly the text contained within the image. For example, in this case the image says 'collect moments not things', and if you look at the filename it's exactly the same: collect moments not things.jpg. Another example: this image over here says 'color is not a crime', and the filename is exactly color is not a crime.jpg. One more: this one says 'come on in we are open pull handle', and the filename is exactly come on in we are open pull handle.jpg.

So this is the data we will be using in this tutorial. It's a dataset I created in order to work on this project, and if you want to follow along you can use exactly the same dataset as I do, or any other text detection dataset you want, as long as the images contain some text and you have annotations for them. Let me give you an example: if I go back to my browser, this is a very popular and amazing dataset for a text detection project, COCO-Text. Please take a look at this website, because it's an amazing dataset and one of the datasets you could use in this project. I have been reviewing a few of its images, and I noticed that some of them have a lot of annotations. For example, in the image we are currently looking at we have all this text over here, the license plate, more text over there, this logo: a lot of text in a single image. Here is another example: a lot of annotated text, some of it even in a different orientation. There are many pictures like this in the dataset, with a lot of annotations.
Now, remember that in this tutorial I'm going to show you how to use all of these different technologies, but I'm also going to show you how to compare their performances. To compare them we are going to use a similarity metric, which I'm going to show you later on, and in order for that metric to produce results we can make sense of, I think it's going to be much better if we use much simpler images, like the ones I have selected: images with only a few words, one sentence, or maybe a couple of sentences. You're going to see why later on, once I show you the similarity metric we are going to use today.

Remember, this is a dataset I created myself: I downloaded all these images and annotated them by making the filename of each image the text that's written on it. If you want to use exactly the same data I am going to use in this tutorial, you have two options. You can download all the images yourself; I got them from Pexels, so searching for something like 'text' you're going to find many, many images with text, and then you can annotate them the same way I did: the only thing you need to do is change each filename to the text the image contains, and that's pretty much it. The other option is to download this data from my Patreon, where it will be available to all my Patreon supporters. And obviously, remember you can also use any other dataset you want; if you want to use a much more complex dataset, go ahead and do it, that's also fine, although I think much simpler images will work better for what we are going to be doing today.

Now let's get back to the Jupyter notebook so we can get started on this tutorial. The first thing you should do, obviously, is put the data into the directory you created in your Google Drive. This is my directory containing all the images; the only thing I did was compress it into a zip file, data.zip, and upload that file into my Google Drive. That's the first step in this process: uploading the data into your Google Drive. Back in Google Colab, the only thing I'm going to do is execute these first three cells, and then we'll move to the other cells, which are the most fun part of this tutorial: this is where we're going to do the text detection and compare the performances. But let's take it one step at a time. I'm going to execute this cell over here, which mounts my Google Drive into Google Colab. This is very important because, remember, we want to access the data.zip file in my Google Drive, so we definitely need to execute this cell.
Executing it is very easy: the only thing you need to do is run the cell, click here, then click on Allow, and that's going to be pretty much all. Now we can continue with the next step in this process, which is getting the data.zip file from our Google Drive into the Google Colab environment. The only thing I'm going to do is press enter over here; this copies the file and then unzips it into the Colab environment. The one thing you have to do when you execute this notebook is make sure you change this value over here to the directory you created in your Google Drive. In my case you can see it says 'tesseract vs easyocr vs AWS textract', which is a directory I created in the root of my Google Drive, so that's why this is the path over here; please make sure you change this value to whatever location you uploaded your data to. Then just press enter, and you can see that all the data has been copied: all these images are now in our Google Colab environment, so we can continue with the next step.

The next step in this process is installing all the dependencies: all the Python packages we are going to use in order to work on this project, plus a few system dependencies related to Tesseract. The only thing we're going to do is press enter, and this cell takes care of installing everything.

Now that we have installed all these Python packages and dependencies, we can continue with the next step, which is the text detection itself. This is where I'm going to show you how to use Tesseract, EasyOCR and Amazon Textract to detect text in images. We're going to start with Tesseract, and in order to use it we are going to use pytesseract, which is a Python wrapper around Tesseract: under the hood we will be using Tesseract, but through this Python library. This is going to be very straightforward. I'm going to import pytesseract, and I'm also going to import a very specific function from Pillow: from PIL import Image. Then I'm going to choose one of the images we copied over; it doesn't really matter which one, so let's take this one, 'it was all a dream'. This is the image I'm going to use to show you each of these technologies. I'll copy and paste its path into a variable, image_path, and then I'm going to call pytesseract.image_to_string on Image.open(image_path). This next part is very important: I need to tell Tesseract that the text is English, so the language is a very important parameter. I'm going to call the output text, print it, and see what happens.

You can see that the output we got doesn't make any sense whatsoever. Let me show you the image we are using again: it says 'it was all a dream', and the output has nothing to do with that. I did many tests while preparing this tutorial, and I noticed that pytesseract, which remember uses Tesseract under the hood, doesn't perform very well with this type of data. From the research I have done, Tesseract works better with other types of data, for example documents; it's also very important to preprocess the data in a specific way, and to use the best configuration for your use case, which is another thing we are not doing right now. You can see that the only parameter we are passing into this function is the language; we are not doing any other configuration whatsoever.

So, long story short: in all the tests I have done so far, Tesseract doesn't perform very well here, and it performs better with other types of data and under other conditions. We are going to use it anyway, because this video is about showing you how to use each one of these technologies and about comparing them, and for this comparison we are going to input the images without doing any preprocessing whatsoever and without specifying any configuration whatsoever. That's the experiment I'm going to show you in this video. I have worked on many projects involving text detection, and I have noticed that in some situations, in order to achieve a very good performance, you need to do a lot of preprocessing and a very, very detailed configuration, checking exactly which parameters to use and so on. You can finally achieve a very good performance that way, but it takes a lot of time; it's a very time-consuming task. That's exactly why we are going to run this experiment without any preprocessing: we'll just take the images as they are, use all the default parameters, and compare all these functions out of the box, because that's how I like to do text detection.
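Put together, the Tesseract step described above is only a few lines. This is a sketch rather than the exact notebook cell: the image path is an assumption (any image containing text works), and it requires the tesseract binary plus the pytesseract and Pillow packages to be installed.

```python
# Minimal pytesseract call, as described above: no preprocessing, no extra
# configuration, only the language. Requires the tesseract binary and
# `pip install pytesseract pillow`.
import pytesseract
from PIL import Image

# Hypothetical path -- substitute any image from your dataset.
image_path = '/content/data/it_was_all_a_dream.jpg'

# lang='eng' tells Tesseract the text is English; everything else is default.
text = pytesseract.image_to_string(Image.open(image_path), lang='eng')
print(text)
```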
I like to use all these different tools from a very high level, without really minding all the preprocessing and configuration, and that's exactly why we are going to run the experiment like this. So, long story short, that was the output we got from Tesseract; now let me show you how to use EasyOCR, which is the next technology. I'm just going to open a new cell, and the first step is to import EasyOCR: from easyocr import Reader. Remember, this is one of the libraries we already installed earlier, together with pytesseract, Pillow and boto3, which is the library we're going to use in the next step. We create a Reader first, specifying the language, English. Then I'm going to call reader.readtext on the image path, and I'm going to call this variable results.

Let's print results first, so we can see what this object looks like before we do anything with it, moving one step at a time. Okay, we got an error, and this is because the method is not readText with a capital T, it's readtext, with the t in lowercase. I'm going to execute this cell again... and this is what we got. Let me copy and paste the output over here and zoom in, so I can show you better. You can see that for each piece of text detected in this image (it's the same image as before) we get the location where the text was detected, then the text we have detected, and then the confidence value for the detection, i.e. how confident EasyOCR is about it. What we care about in this project is the text we have detected; we don't really care so much about the confidence value or the location where this text was detected.

Let me show you how to take that information in order to continue. I'm going to define another variable, text, which starts as an empty string, and then I'm going to iterate: for result in results, text = text + result[1]. Remember how the data looks: if we access index number zero, we get the location, the bounding box; index number one gives the text; and index number two gives the confidence score. In this project we only care about the text, which is why we access index one. I'm also going to add a space after each detection, so all the detected words are separated by a white space. Then, at the end, I'm going to do text = text[:-1], taking everything up to the last element.
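The EasyOCR parsing just described can be sketched as a small helper. The real reader call is left in comments because it needs the easyocr package and a model download; `fake_results` is an invented example in EasyOCR's output format (bounding box, text, confidence), just to demonstrate the parsing.

```python
# Turn EasyOCR's readtext() output into a single string.
# Each result is (bounding_box, text, confidence); we only keep the text.

def results_to_text(results):
    """Join the detected text fragments with single spaces."""
    return ' '.join(item[1] for item in results)

# Real usage (requires `pip install easyocr`), as described in the tutorial:
#   from easyocr import Reader
#   reader = Reader(['en'])
#   results = reader.readtext(image_path)
#   text = results_to_text(results)

# A fake result list in EasyOCR's format, to show the parsing:
fake_results = [
    ([[0, 0], [50, 0], [50, 20], [0, 20]], 'it was', 0.98),
    ([[0, 30], [60, 30], [60, 50], [0, 50]], 'all a dream', 0.87),
]
print(results_to_text(fake_results))  # -> it was all a dream
```

The tutorial builds the string by appending `result[1] + ' '` in a loop and trimming the final space; `' '.join` is the equivalent one-liner.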
That last step removes the trailing white space: the loop appends a space after every word, including the last one, and we don't care about a white space at the very end, so we just remove it. Now let's print text and see how it looks. You can see that the output is definitely not perfect, but it's much more similar: it says 'it was all a dion', something like that. Definitely not perfect, but similar. So that's how we are going to use EasyOCR.

Now let's continue, and let me show you how to use Amazon Textract. This is going to be a little more complex, because we need to sign in to our AWS account and do some work there. The first thing you need to do, once you are logged into your AWS account, is go to IAM, where we are going to create an IAM user. I'm going to click on Users, then Create user, and the username will be something like 'textract tutorial user'. Next, I'm going to attach policies directly and select this one over here, AmazonTextractFullAccess, then click Next, then Create user, and that's pretty much all. Now I select the user I just created, textract tutorial user, click on Security credentials, and create access keys: I select 'Local code', check 'I understand the above recommendation', click Next, then Create access key, and that's all for now. This was the first step in the process: creating this access key.

Now, remember something very important: absolutely every single thing you do in AWS is not free, you need to pay for it. So please keep that in mind as we continue with this tutorial: for everything we do from now on, every time we call Textract, we will most likely need to pay for those executions.

Now let me show you how to use Textract with the user and access key we just created. I'm going to create a new cell and import boto3, which is the Python library we are going to use to call Textract. Then I define two variables, the access key and the secret access key, and this is where we copy and paste the two values: the access key over here, and the secret access key over here. Please remember to keep these keys private: these are your private access keys, so do not share them with anyone, and never do something like I'm doing right now, making a video with my keys publicly available. In my case it's obviously not a problem, because I'm just going to delete these keys once this tutorial is completed, but never, ever share your keys with anyone.

Next I create another variable, the textract client, which is boto3.client('textract', ...) with a few parameters: aws_access_key_id set to the access key, aws_secret_access_key set to the secret access key, and the region, which in my case is us-east-1, but it will depend on the region you are working in. Now I'm going to keep using exactly the same image as before: with open(image_path, 'rb') as im, I set response equal to the client's detect_document_text call, passing the document as Document={'Bytes': im.read()}. That's exactly how we are going to use Textract. Let's print response to make sure everything works as expected... okay, I made a mistake over here: the region should be 'us-east-1', like this. Let's see now. This is the output we got, and we're going to do something similar to what we did with EasyOCR: I'm just going to copy this value and paste it over here, and you can see that in this case we got a lot of information from Textract. The text we are looking for is in there: here it says 'it was', then here 'all', and if I keep looking I find 'dream' and so on.

Let me show you how to parse this object to get the information we are looking for. I'm going to define a variable called text, very much like we did with EasyOCR, and then I'm going to parse the response like this: for item in response['Blocks'], if item['BlockType'] == 'LINE', then text is text plus item['Text'] (with a capital T) plus an empty space. Then we do exactly the same as before: once we have read absolutely all the words, all the sentences, all the lines, we remove the last empty space we appended. Now the only thing I'm going to do is print text, and you can see that in this case we got 'it was all dream'. It's definitely not perfect, because remember, this is the picture we are using in this example as well, and there is an 'a' missing over here; it's not a perfect detection, but I would say it's very good nevertheless.

So that's pretty much all you need in order to use each of our three technologies; now let me show you how to compare their performances. We are going to iterate over all the images we have, and for each image we are going to compute a similarity metric between the ground truth and the output we got from each technology, and then we're just going to compare their performances. Let me show you. We're going to use a similarity metric based on the Jaccard index, and let me give you a very quick, very high-level explanation of the way this index works. The Jaccard index is the ratio between the intersection of two sentences and the union of those two sentences. Long story short, this index tells us how many words two sentences have in common: we're going to input the ground truth and the sentence we got from each of our OCR technologies, we're going to see how many words they have in common, and how many words there are in total in both sentences, and then we're just going to take the ratio between those two numbers.
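The Textract parsing walked through above can be sketched as a small helper. The real boto3 call is left in comments because it needs valid AWS credentials (and costs money, as noted above); `fake_response` is a trimmed-down invented response in Textract's format, just to demonstrate the parsing.

```python
# Extract the detected text from a Textract detect_document_text response.
# Textract returns "Blocks" of several types (PAGE, LINE, WORD); LINE blocks
# carry whole lines of text, which is what we join together here.

def blocks_to_text(response):
    """Join the text of all LINE blocks with single spaces."""
    lines = [item['Text'] for item in response['Blocks']
             if item['BlockType'] == 'LINE']
    return ' '.join(lines)

# Real usage (requires boto3 and valid AWS credentials), as in the tutorial:
#   import boto3
#   client = boto3.client('textract',
#                         aws_access_key_id=ACCESS_KEY,
#                         aws_secret_access_key=SECRET_ACCESS_KEY,
#                         region_name='us-east-1')
#   with open(image_path, 'rb') as im:
#       response = client.detect_document_text(Document={'Bytes': im.read()})

# A trimmed-down fake response, to show the parsing:
fake_response = {'Blocks': [
    {'BlockType': 'PAGE'},
    {'BlockType': 'LINE', 'Text': 'it was'},
    {'BlockType': 'LINE', 'Text': 'all dream'},
    {'BlockType': 'WORD', 'Text': 'it'},
]}
print(blocks_to_text(fake_response))  # -> it was all dream
```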
That ratio is the Jaccard index, and that's how we are going to compute the similarity metric in this case; that's how we're going to compare the performances. This is also why I decided to use much simpler images, with only a few words or a couple of sentences: it makes computing the similarity much easier and the results much easier to interpret, and otherwise it would be much harder to make sense of them.

In order to get a function that takes two sentences as input and returns the Jaccard index between them, let's ask ChatGPT: 'give me a python function that receives two sentences and returns the jaccard similarity between them'. Let's see what happens. Okay, this is the function we got, and ChatGPT also gave us an example of how to execute it so we can see how it works. The only thing I'm going to do is copy and paste this code over here; let me show you how it works first, and then we'll compute the similarities and compare the performances. You can see we have a function, jaccard_similarity, which receives two sentences, and the output is a number.

Let's see what happens if we input these two example sentences. To avoid any mistakes or any confusion, I'm just going to remove the periods at the end, to make sure we are computing things as we should. I press enter, and you can see that we got a Jaccard similarity of 0.28. Let's analyze this result: sentence number one is 'this is a sample sentence', and sentence number two is 'sample sentence for testing'. Let's count the words in the intersection: it's 'sample sentence', and that's pretty much all; those are the only two words both sentences have in common, so the intersection is 2. The union is 1, 2, 3, 4, 5 from the first sentence, and, not counting the two shared words again, 6, 7, so the union is 7. If I compute the ratio, 2 divided by 7 is 0.285, which is exactly what we have over here. That's exactly how the Jaccard similarity works, and it's exactly the similarity metric we are going to use in this tutorial. I'm going to delete this example; we're only going to use the function.

Now let's do something: I'm going to create a function for each of the OCR technologies we have over here. The first one is going to be something like read_text_tesseract; it receives the image path, and it returns text. We delete the standalone image_path variable and move all the Tesseract code into the function, with the import as the first thing at the top. Then the same for EasyOCR: we move the import to the top and define the reader once, outside the function, because we're not going to create the reader every time we call the function, and we create another function, read_text_easyocr, which also takes the image path and does the same thing as before.
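The exact code ChatGPT returned isn't shown here, so this is a reconstruction of the idea: the Jaccard similarity as the size of the intersection of the two word sets divided by the size of their union.

```python
# Jaccard similarity between two sentences, as described above:
# |intersection of word sets| / |union of word sets|.

def jaccard_similarity(sentence1, sentence2):
    words1 = set(sentence1.split())
    words2 = set(sentence2.split())
    if not words1 and not words2:
        return 1.0  # two empty sentences: treat as identical
    return len(words1 & words2) / len(words1 | words2)

# The worked example from the tutorial: 2 words in common, 7 in the union.
print(jaccard_similarity('this is a sample sentence',
                         'sample sentence for testing'))  # -> 0.2857...
```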
Okay, this should work just fine, so we delete that cell and move to the Textract one. We copy the import first, and since we're not going to define the client and the access keys every single time, I'm going to move all of that out as well and create another function, def read_text_textract, which takes the image path and returns text. That should be all, and I delete the original cell.

Once we have these functions, I'm going to iterate over all the images we have in the data directory: for image_path in os.listdir('/content/data'), which is the directory we copied all the data into, as you can see over here. Inside the loop, image_path will be os.path.join of the directory and the filename, and we import os at the top. Then the only thing we need to do is define another variable, ground_truth. Remember that the ground truth for each of our images is its filename, so we just take the filename, remove the extension, and replace the underscores with white spaces, and that's pretty much all we need in order to get the ground truth.

Now let's continue. I'm going to set up some counters, some score values: score_tesseract, which starts at zero, then score_easyocr, also zero, and score_textract, also zero. Inside the loop, the only thing we need to do is update each value with the similarity. Another thing I'm going to do is convert the ground truth to lowercase, and for the detected text I'm going to convert it to lowercase too and also remove some characters: all the new line characters, the exclamation point, the question mark, and the period, replacing each of them with nothing. We could remove even more characters, other special characters too, but in order to keep it simple let's just handle these, which are probably the most important characters we are going to find in these pictures. Then I do exactly the same for the other two technologies: score_easyocr is updated with read_text_easyocr, and score_textract with read_text_textract. Finally, we print each score divided by 100, because we have 100 images in this directory, so we are taking the average of the scores we got over all the images.

Everything seems to be okay, so first I'm going to run it for only one image: I execute the cell that defines all these functions, so they are loaded into memory and we can use them later on, and then I run the loop, breaking after the first iteration. We got a mistake: this value wasn't right. After fixing it, everything works, and this is what we got; remember this is only one example, but I would say everything makes sense so far. So let's just run exactly the same thing on all of our images and see what happens.

These are the results: we got 0.01 with Tesseract, 0.21 with EasyOCR, and 0.34 with AWS Textract. So in this experiment, with this data and under these conditions, AWS Textract is the best OCR technology. That's going to be all for this tutorial. In a future video I may make another tutorial on how to use Tesseract as an OCR technology, because although we didn't get a good performance with it in this case, it is actually a very good and very powerful OCR technology; I may make another video where I show you exactly what data you can input into Tesseract and exactly what preprocessing you should do on that data in order to achieve some very good results with it. But for now, this is going to be all. My name is Felipe, I'm a computer vision engineer; thank you so much for watching this tutorial, and see you on my next video.
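The whole evaluation described above can be condensed into a sketch like the following. The OCR functions are passed in as a dict so the scoring logic stands on its own; in the tutorial they are read_text_tesseract, read_text_easyocr and read_text_textract. Function and variable names here are assumptions, the filename convention (annotation == filename, underscores for spaces) follows the tutorial's dataset, and the average is taken over however many files the directory contains (the tutorial divides by 100 because its dataset has exactly 100 images).

```python
# Hedged sketch of the tutorial's evaluation loop. ocr_fns maps a name to a
# function image_path -> detected text; similarity is e.g. the Jaccard
# similarity between two sentences.
import os

def normalize(text):
    """Lowercase and strip the characters the tutorial removes."""
    for ch in ['\n', '!', '?', '.']:
        text = text.replace(ch, '')
    return text.lower()

def ground_truth_from_filename(filename):
    """'it_was_all_a_dream.jpg' -> 'it was all a dream'."""
    return os.path.splitext(filename)[0].replace('_', ' ')

def evaluate(data_dir, ocr_fns, similarity):
    """Return the average similarity score per OCR function."""
    filenames = os.listdir(data_dir)
    scores = {name: 0.0 for name in ocr_fns}
    for filename in filenames:
        truth = normalize(ground_truth_from_filename(filename))
        path = os.path.join(data_dir, filename)
        for name, fn in ocr_fns.items():
            scores[name] += similarity(truth, normalize(fn(path)))
    return {name: s / len(filenames) for name, s in scores.items()}

# Usage (with the tutorial's functions already defined):
#   evaluate('/content/data',
#            {'tesseract': read_text_tesseract,
#             'easyocr': read_text_easyocr,
#             'textract': read_text_textract},
#            jaccard_similarity)
```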
Info
Channel: Computer vision engineer
Views: 1,583
Id: CcC3h0waQ6I
Length: 42min 15sec (2535 seconds)
Published: Fri Jan 19 2024