So this is exactly the project we will be working on in today's tutorial. Today we're going to work with text detection and
I'm going to show you how to work with three super powerful OCR Technologies which are tesseract,
EasyOCR and AWS Textract. I'm going to show you how to work with them and I'm also going to show you how
to compare their performances, and now let's get started. Now let's get back to Google colab this
is a notebook I have created in my Google colab and this is exactly where we will be working on
today's tutorial, you can see that this notebook comprises 1, 2, 3, 4, 5 steps, in only five
steps we will have completed this process and you can also see that the first three steps are ready
and the only thing we're going to do is to execute each one of these cells then I'm going to show you
how to do all the coding in this cell over here and also this other cell over here, which is where
we are going to do the text detection and where we're going to compare the performances, so
this is the notebook in which we will be working today and in this tutorial we're going to work
in Google colab but we're also going to work in Google Drive so please make sure you create a
directory in your Google Drive in order to work on this project, now let me show you the data we are
going to use in this tutorial, this is the data I have prepared for this project and you can see
that these are a lot of images, all of which contain text, right, all of them have one or two or
a few words of text and this is exactly the data we will be using today and you can see that for
each one of these images the filename is exactly the text that's contained within
this image right for example in this case it says collect moments not things and if you look at
the file name it's exactly the same right collect moments not things.jpg and if I show you another
example for example this image over here it says color is not a crime and if you see the filename
is exactly color is not a crime.jpg and if I show you another example for example this one... come
on in we are open pull handle and in this case the filename is exactly come on in we are open pull
handle.jpg so this is exactly the data we will be using in this tutorial this is a dataset I
created in order to work on this project and if you want to follow along this tutorial you can
just go ahead and use exactly the same dataset as I am going to use or you can just use any other
text detection dataset you want, as long as you use images with some text on them and as long as you
have the annotations for the images you are using you can just go ahead and use any other dataset
you want, let me give you an example, if I go back to my browser, this is a very popular and amazing
dataset for working on a text detection project, which is COCO-Text, please take a look at
this website because this is an amazing dataset and
this is for example one of the datasets you could use in this project, I have been reviewing the
images in this dataset, I have been reviewing a few of these images and I noticed that some of these
images have a lot of annotations for example this image we are currently looking at you can see
that we have all this text over here, we also have the license plate, we have more text over
here, we have all this text over here, we have this logo over here, we have a lot of text in this
image and if I show you a couple of other examples this is another example you can see that we have
a lot of text, we have the annotations for all this text, and some of this text is even in a different
orientation so this is an amazing dataset and there are many pictures which are like this right
with a lot of annotations now remember that in this tutorial I'm going to show you how to use all
of these different technologies but I'm also going to show you how to compare their performances
and in order to compare their performances we are going to use a similarity metric which I'm going
to show you later on and in order to use this similarity metric and in order to make sense of
the results I think it's going to be much better if we use much simpler images like these ones
over here which I have selected right it's going to be much better if we use an image with only a
few words with only one sentence with only maybe a couple of sentences if we use much simpler data
like all these images over here it's going to be much better considering we are going to compare
the performance of all these different algorithms so it's going to be much better and you're going
to see why later on you're going to see why once I show you the similarity metric we are going
to use today and remember this is a dataset I created myself I just downloaded all these images
and I just annotated all these images right the filename of each one of these images is the text
that's written on them so I created this dataset myself and if you want to use exactly the same
data I am going to use in this tutorial you have two different options, you can either download
all this data yourself I got all these images from pexels so you can just download all these
images yourself searching something like 'text' you're going to see many of the pictures I'm going
to use in this tutorial you're going to find many many many images with text and then you can just
go ahead and annotate these images the same way I did with all these images over here right the only
thing you need to do is to change its filename to the text these images contain right and that's
pretty much all you need to do in order to create this dataset so that's one of the options in order
to use exactly the same data I am going to use today and the other option is just to download
this data which is going to be available in my Patreon so it's going to be available to all my
Patreon supporters so these are the two options you have in order to use exactly the same data I
am going to use today, you can either create this dataset yourself or you can just download this
dataset which is going to be available for my Patreon supporters. Now let's continue and obviously remember
you can also use any other dataset you want, if you want to use a much more complex dataset then go
ahead and do it I think it's going to be much better to use much simpler images for what we are
going to be doing today but if you want to use a much more complex dataset that's also fine right
that's also okay you can just do whatever you want, now let's continue let's get back to this
Jupyter notebook so we can get started on this tutorial, the first thing you should do obviously
is to put the data here, in the directory you have created in your Google Drive you can see
that this is my directory containing all the images and the only thing I did is to create a
zip file right, I compressed this directory and I created this other file which is data zip and
then I just uploaded this file into my Google Drive so that's obviously the first step in this
process to upload the data into your Google Drive now let's get back to my Google colab and the
only thing I'm going to do is to execute these three cells over here and then we are going to
move to these other cells which are going to be the most fun part of this tutorial right this
is where we're going to do the text detection and this is where we're going to compare their
performances but let's take it one step at a time I'm going to execute this cell over here which
is going to mount my google drive into my google colab this is very important because remember we want
to access this file which is in my Google Drive so we definitely need to execute this cell over here
and this is going to be very easy the only thing you need to do is to execute this cell and then
just click here, then click on allow, and that's going to be pretty much all so now we can continue
with the next step in this process which will be getting this file this data.zip file from our Google
drive into this Google colab environment so let me show you the only thing I'm going to do is to
press enter over here and this is going to copy the file and then this is going to unzip the file
into this google colab environment and that's going to be pretty much all, the only thing you
have to do when you execute this notebook is to make sure you change this value over here to the
directory you have created in your Google Drive so please make sure to change this value over here
in my case you can see it says tesseract vs easyocr vs AWS textract and this is a directory I have
created in my google Drive in the root directory of my google drive so this is why this is the directory,
this is the path I have created over here but please make sure you just change this value to whatever
location you have uploaded your data into right that's the only thing you need to do...
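In Python terms, what this copy-and-unzip cell does is roughly the following sketch; note that the actual notebook cell likely uses shell commands (!cp, !unzip) instead, and the Drive path shown in the comment is only an illustration, not the notebook's real value:

```python
import os
import shutil
import zipfile

def fetch_and_unzip(src_zip, work_dir):
    # copy the archive from the mounted Google Drive into the working environment...
    local_zip = shutil.copy(src_zip, os.path.join(work_dir, 'data.zip'))
    # ...and extract it into a 'data' directory next to it
    data_dir = os.path.join(work_dir, 'data')
    with zipfile.ZipFile(local_zip) as zf:
        zf.extractall(data_dir)
    return sorted(os.listdir(data_dir))

# In Colab this would be called with something like (example path, use your own):
# fetch_and_unzip('/content/drive/MyDrive/my_project_dir/data.zip', '/content')
```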
and then the only thing you need to do is to press enter and you can see that... this
is all the data we have copied into our google colab environment right, these are all the images we
have over here, now all these images are in our google colab so we can just continue with the next step
in this process and the next step in this process will be installing all the dependencies, all the
python packages we are going to use in order to work on this project and we also need to install
a few dependencies which are related to tesseract right, the only thing we're going to do is to press
enter and this is going to take care of installing all these dependencies, okay and now that we have
installed all these python packages and all these dependencies over here now we can continue with
the next step in this process which will be to do the text detection, this is where I'm going to
show you how to use tesseract, easyocr and Amazon textract in order to detect text on images so let me
show you we're going to start with tesseract, with this technology over here and in order to use
tesseract we are going to use pytesseract which is a python wrapper around tesseract right, under the hood we
are going to be using tesseract, but we are going to do it through this python library we have over
here and this is going to be very straightforward I'm going to import pytesseract and I'm also going to
import pillow... or actually I'm going to import a very specific module from pillow which is
from PIL import Image and this is how we're going to do it, I'm going to choose one of the images
we have copied over here I'm going to choose maybe... it doesn't really matter but let's choose this one
over here 'it was all a dream', this is the image I'm going to use in order to show you how to use each
one of these technologies, so I'm just going to copy and paste this image path over here... and I'm
going to do something like this, image path... image path... okay, and then I'm going to call
pytesseract dot image to string... I'm going to call Image.open, image path, and then this
is very important I need to tell tesseract that the text is English right so this is a
very important parameter, and I'm going to call the output something like
text... okay and now I'm going to print text... and let's see what happens... and you can see
that this is what we got, right, which doesn't make any sense whatsoever, let me show you the
image again let me show you the image we are using now which is this one over here, it was all
a dream, and if we look at the output it doesn't make any sense whatsoever, so I have been doing
many tests while I was preparing this tutorial and I noticed that pytesseract, which remember uses
tesseract under the hood, doesn't perform very well with this type of data, and I have been doing
some research and I noticed that tesseract works better with other types of data, for example
with documents and also it's very important to preprocess the data in a given way in order to
use tesseract and it's also very important to use the most... the best configuration in order to
use tesseract which is another thing we are not using right now you can see that the only parameter we
are putting as input into this function is the language but then we are not doing any other
configuration whatsoever, so long story short for all the tests I have been doing so far I
noticed that tesseract doesn't perform very well here, and that tesseract performs better with other types of data and in other
conditions right, so this is very important we are going to use it anyway because remember this
video is about showing you how to use each one of these technologies and it's also about doing a
comparison between them, and for the comparison we are going to do between all these different
technologies, we are going to input the image without doing any preprocessing whatsoever and
without specifying any configuration whatsoever right, this is the experiment I'm going to show
you in this video because I have worked in many projects involving text detection and I have
noticed that in some situations in order to achieve a very good performance you need to do
a lot of pre-processing and you need to do like a very very detailed configuration, you need to
see exactly what are the parameters you use and so on and finally you achieve a very good performance
but you need to do a lot of preprocessing and so on right, and doing all this preprocessing
and so on takes a lot of time, it's a very time consuming task, so that's exactly why we are going
to do this experiment like this we're just going to compare the performances without doing any
preprocessing whatsoever, we're just going to take the image and we're going to input the image
like this and also we are just going to use all the default parameters so we're just going to
compare all these functions out of the box right, we're not going to do any preprocessing whatsoever
because that's how I like to do text detection right that's how I like to use all these different
technologies, I like to just use them from a very high level without really minding about all the
preprocessing and all the configuration and so on so this is exactly why we are going to do
the experiment like this, so long story short this is the output we got by using tesseract and now
let me show you how to use easyocr, which is the next technology, I'm just going to open a new cell,
something like this and this is what we are going to do the first step will be to import easyocr so
I'm going to say from easyocr import reader, remember this is a library we have already installed over here,
we installed pytesseract, easyocr, pillow and we also installed boto3 which is the other library we're going to use
in the next step, so we are over here from easyocr import reader, we are going to create a reader
first, this is going to be something like this... reader... and we need to specify the language
which is English, and then this is what we're going to do, I'm going to call this variable
results, and we're going to call reader dot read text... and then we are going to input exactly the
same as we did over here, image open and the image path let's print results first and then we are going
to do something with the... with the output we are going to do something with this object but in
order to move one step at a time let me show you how this variable looks, okay we got an
error and this is because this is not readText with a capital T but readtext with
a t in lowercase right so I'm going to execute this cell again okay so this is what we got now
let me copy and paste this output over here so I can show you better so this is all the output
we got I'm going to zoom in... over here so let me show you this is all the information
we got, so you can see that for all the text we have in this image... which if I
show you the image again... this is the image we are using, it's the same image as before, and
you can see that we got the location where we have detected the text, then we have the text we
have detected and also the confidence value for this detection right, this is how confident
easyocr is regarding this detection and this is what we care about in this project which
is the text we have detected, we don't really care so much about the confidence value, we don't
really care so much about the location in which we have detected this text, but the text is the
only information we care about for this project so let me show you how to take this information in
order to continue, I'm going to define another variable which is text and this is going to be an
empty string for now but I'm going to do something like this, for result in results I'm going to
say something like text equal to text plus result... one... let me show you again remember this is
how the data looks like and if we access the index number zero we are going to access the location,
the bounding box, if we access the index number one we are going to get the text and if we access
the index number two then we are going to get the confidence score and in this project we only
care about the text so this is why we are going to access this index over here, and then I'm also
going to... add a space in order to join all these different words we have detected but in order
to separate all of these words by this character over here which is a... white space right, and then
I'm going to do something like text equal to text up to the element minus one right, because at the
end of this process once we have detected all the words and once we have... once we are here right,
we are also going to append a white space at the end and we don't care about the white space at the
end so we're just going to remove this white space at the end by just doing something this and now
let's print text and let's see how it looks, and you can see that this is the output, which is
definitely not perfect but it's more similar, right, now it says 'it was all a Dion', something like that, so
it's definitely not perfect but it's... similar, so this is how we are going to use easyocr.
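Before moving on, here is how the easyocr part can be sketched end to end. Since running easyocr requires the library and its detection models, the `results` list below is mocked with the same (bounding box, text, confidence) structure that reader.readtext returns; the parsing loop is the one we just wrote:

```python
# In the notebook, `results` comes from:
#   from easyocr import Reader
#   reader = Reader(['en'])
#   results = reader.readtext(image_path)
# Here we mock it with the same structure: (bounding box, text, confidence).
results = [
    ([[0, 0], [50, 0], [50, 20], [0, 20]], 'it was', 0.93),
    ([[0, 30], [60, 30], [60, 50], [0, 50]], 'all a', 0.88),
    ([[0, 60], [70, 60], [70, 80], [0, 80]], 'dream', 0.91),
]

text = ''
for result in results:
    # index 0 is the bounding box, index 1 the text, index 2 the confidence;
    # we only care about the text
    text = text + result[1] + ' '
text = text[:-1]  # drop the trailing white space

print(text)  # → it was all a dream
```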
let's continue and let me show you how to use Amazon textract, this is going to be a little more
complex because we need to sign in into our AWS account and we need to work into our AWS account
so let me show you, so the first thing you need to do once you are logged into your AWS account
is going to IAM... and we are going to create an IAM user so I'm going to click on users... create user,
and the username will be something like textract tutorial... user... something like that okay, next, then I'm
going to attach policies directly and I'm going to select this one over here which is Amazon Textract
Full Access and I'm going to click on next then I'm going to click on create user
and that's going to be pretty much all now I'm going to select the user I have created
which is this one textract tutorial user and I'm going to click here on security credentials
and I'm going to create access keys I'm going to select local code and then 'I understand the
above recommendation' and click on next... create access key and that's going to be pretty much all
for now this is the first step in this process which is creating this access role now remember
something which is very important, remember that absolutely every single thing you do in AWS
is not free, you need to pay for it, this is very important so please remember that for
everything we're going to do from now on from all the times we're going to call textract then
we will need to pay for all these executions right, most likely we will need to pay for it just
keep that in mind in order to continue with this tutorial and now let me show you how to use
textract so this is the access role we have created this is the user we have created and this
is how we are going to use this user let me show you, so I'm going to create a new cell over here
and this is what we're going to do I'm going to import boto3 which is the python library we
are going to use in order to use textract and then I'm going to define two variables one of them
is access key and the other one is secret access key okay and this is where we're going to copy
and paste these two values over here so the access key over here and then the secret
access key over here please remember to keep these keys private right remember these
are your private access keys so do not share these keys with anyone right and never do
something like I am doing right now which is just making a video with all my keys publicly
available right never do something like this and never ever share these keys with anyone so
let me show you how to continue... in my case obviously it's not a problem because I'm just
going to delete these keys once this tutorial is completed but just remember to never share
these keys with anyone so I'm going to create another variable which is textract client and
this is going to be boto3 . client I'm going to input textract and then I'm going to input
some parameters one of them is aws access key... id... and this is going to be equal to access
key... then the other parameter is secret access key... this is AWS secret access key... and this is going to be
secret access key... okay and then the region... in my case this is us-east-1 right but it's going
to depend on the region where you are working, right, in my case this is us-east-1
but in your case this may be different now let's continue and now I'm going to keep
on using exactly same image as before so I'm going to do something like this with open... I'm
going to use the same variable which is image path rb... as im... and this is going to be
something like this response equal to textract client... detect document text...
and we are going to input the document like this... bytes im.read okay and
this is exactly how we are going to use textract, let's make sure everything works as
expected so I'm just going to print response and let's see what happens, okay so I made a
mistake over here, this should be us-east-1, like this, let's see now, okay and this is the output
we got and we're going to do something similar as we did with easyocr I'm just going to copy
this value and I'm going to paste it over here and you can see that in this case we got
a lot of information this is definitely a lot of information we got from textract and also
you can see that the text we are looking for is over here, here it says 'it was' and then here
it's 'all' and then if I keep on looking I guess I'm going to find the remaining text over
here it says 'dream' and so on right so let me show you how to parse all this object
in order to get the information we are looking for so I'm going to define a variable which is
called text this is going to be very similar as we did before with easyocr and then I'm
going to parse this response like this: for item in response blocks, if item block type is equal to line, then we are going to do something
very similar as we did before, text is going to be equal to text plus item text, with a
capital t plus this empty space and then we are going to do exactly the same as we did
over here we are going to remove the last empty space we have added over here once we have
read absolutely all the words, all the sentences, all the lines right, and now the only
thing I'm going to do is to print text and let's see what happens, and you can see that in this
case we got 'it was all dream' so it's definitely not perfect because remember this is the picture
we are using in this example as well and we have an 'a' missing over here so it's not a perfect output
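To recap the textract part, here is a sketch of the parsing we just did, run over a mocked response, since the real `response` comes from detect_document_text, which needs valid AWS credentials and is billed per call:

```python
# In the notebook, `response` comes from:
#   import boto3
#   textract_client = boto3.client('textract',
#                                  aws_access_key_id=access_key,
#                                  aws_secret_access_key=secret_access_key,
#                                  region_name='us-east-1')
#   with open(image_path, 'rb') as im:
#       response = textract_client.detect_document_text(
#           Document={'Bytes': im.read()})
# Here we mock the relevant part of its structure.
response = {
    'Blocks': [
        {'BlockType': 'PAGE'},
        {'BlockType': 'LINE', 'Text': 'it was'},
        {'BlockType': 'WORD', 'Text': 'it'},
        {'BlockType': 'LINE', 'Text': 'all'},
        {'BlockType': 'LINE', 'Text': 'dream'},
    ]
}

text = ''
for item in response['Blocks']:
    # keep only LINE blocks; PAGE and WORD blocks repeat the same content
    if item['BlockType'] == 'LINE':
        text = text + item['Text'] + ' '
text = text[:-1]  # drop the trailing white space

print(text)  # → it was all dream
```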
it's not a perfect detection but I would say it's very good nevertheless so this is going to be
pretty much all in order to use each one of our technologies and now let me show you how to
compare their performances, now we are going to iterate over all the images we have over here
and for each one of these images we are going to compute a similarity metric between the
ground truth and the output we got from each one of these technologies and then we're just going to
compare their performances, let me show you, we're going to use a similarity metric which is based
on the Jaccard index, and the way this index works, let me give you a very quick and very high
level explanation of how it works, the Jaccard index will be the ratio between the intersection of two
sentences and the union of these two sentences right, so long story short this index is going to
give us how many words these two sentences have in common right we're going to input the ground
truth and we're going to input the sentence we got from each one of our ocr technologies, we're
going to see how many words are in common, and we're going to see how many words are in total
in both of our sentences and then we're just going to take the ratio between these two numbers
and that's going to be the jaccard index that's how we are going to compute the similarity metric in this
case right that's how we're going to compare the performances and this is why I decided to use much
simpler images like these ones over here with only a few words or with only a couple of sentences
because this is going to make this process of computing the similarity much easier and much
better and otherwise it would be much harder to make some sense of the results right so this is
the similarity metric we are going to use and in order to have a function that takes as input two
sentences and returns the Jaccard index between them, let's ask chatgpt to give us this function, so
I'm going to ask chatgpt: give me a python function that receives two sentences and returns the jaccard
similarity between them so let's ask chatgpt to give us this function and let's see what happens,
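For reference, a standard implementation of such a function, along the lines of what chatgpt returns, might look like this:

```python
def jaccard_similarity(sentence1, sentence2):
    # split each sentence into a set of words
    words1 = set(sentence1.split())
    words2 = set(sentence2.split())
    # Jaccard index: size of the intersection over size of the union
    intersection = words1.intersection(words2)
    union = words1.union(words2)
    if not union:
        return 0.0
    return len(intersection) / len(union)

print(jaccard_similarity('this is a sample sentence',
                         'sample sentence for testing'))  # → 0.2857142857142857
```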
okay so this is a function we got and chatgpt also gave us an example of how to execute this function
in order to see how it works so the only thing I'm going to do is to copy and paste this code over
here and let me show you how it works first and then let's compute the similarities and then
let's compare the performances so you can see we have a function which is jaccard similarity it
receives two sentences and then the output is a number and let's see what happens if we input
these two sentences over here, the only thing I'm going to do in order to avoid any mistake
and any confusion and anything like that, I'm just going to remove the periods at the end,
just to make sure we are computing things as we should, right, so I'm going to press enter
and you can see that we got a jaccard similarity of 0.28 now let's analyze this result you can see
sentence number one is this is a sample sentence and the sentence number two is sample sentence
for testing so let's count how many words we have in the intersection you can see that the
intersection is something like 'sample sentence' and that's pretty much all right these are the
only two words we have in common in both of our sentences so the intersection is going to be two
and then the union is going to be 1, 2, 3, 4, 5, we're not going to count these two words again, 6,
7, right, and if I compute this ratio, 2 divided by 7, we get 0.285, right,
which is exactly what we have over here so this is exactly how the jaccard similarity works and this
is exactly the similarity metric we are going to use in this tutorial, so I'm going to delete this
example over here we're only going to use this function and now let's do something I'm going
to create a function for each one of these ocr technologies we have over here so this is what I'm
going to do, the first one is going to be something like... read text tesseract, something like that
and it's going to receive the image path and that's going to be pretty much all
and then it's going to return... text, okay we're going to delete this image path over here
and then we are just going to take all this code... the import is going to be the
first thing over here and then... the reader too because we're not going to
define the reader every time we call this function and then we are just
going to create another function which is read... text easyocr and we're going to input the image path as
well... and we are just going to do something like this... okay, yeah, this should work just fine, so we're going to delete this cell and then
let's move to this other one over here we're going to copy the import first and then we're
not going to define the client and the access keys every single time so I'm just going to copy
everything over here and then we are just going to... do something like this... and
create another function which is def read... text... textract... image path okay and then we are going to return text okay that should be all, right? okay and I'm going to delete this other cell over here... and then once we
have these functions I am going to... copy this function over here I'm going to iterate in all the images we have in this directory so
for image path in os.listdir of content data... which is the directory where we
have copied all this data, you can see over here, we are going to iterate in this directory and the
only thing we're going to do is something like this, image path will be os.path.join, okay, and we're going to import... os, we can import os over here, okay, that's pretty much all
and then the only thing we need to do... is to define another variable which
is ground truth and ground truth will be... image path... removing the extension and we're also
going to replace... this value... with a white space right remember the ground truth for
each one of our images it's the filename so we're just going to take the filename we are
going to remove the extension and we're just going to replace these underscores with a white
space and that's going to be pretty much all in order to get the ground truth now let's continue,
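As a sketch, getting the ground truth from a filename amounts to something like this (assuming, as described, that underscores separate the words; if your filenames already use plain spaces, the replace is simply a no-op):

```python
import os

def get_ground_truth(image_path):
    # the filename (without extension) is the annotation; underscores,
    # if any, become white spaces, and we lowercase to match the
    # normalization applied to the OCR outputs
    filename = os.path.basename(image_path)
    text = os.path.splitext(filename)[0]
    return text.replace('_', ' ').lower()

print(get_ground_truth('/content/data/it_was_all_a_dream.jpg'))  # → it was all a dream
```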
I'm going to have some counters or some other values which are going to be something
like similarity or maybe something like score... tesseract... which is going to be zero, then score easyocr
which is going to be zero... and then score textract which is also going to be zero, and then the only
thing we need to do... is to append... is to update this value... with the similarity, right? and another thing I'm going to do is I am going to
convert this to lowercase and that's going to be pretty much for the ground truth and in this case
I'm going to convert into lower case and I'm also going to replace all the... all the
new line characters I'm going to replace them by nothing right so I'm going to remove all these
new line characters and I'm also going to remove these other characters which are the exclamation point
the question mark and then we are also going to remove... the period... and that's going to be pretty
much all, we could remove even more characters, we could remove like for example this character
and this character and so on we could remove many other special characters but in order to
keep it simple let's just do it like this right which may be the most important characters which
we are going to find in these pictures over here so and then the only thing I'm going to do
is to do exactly the same with these other images okay... okay... this is going to be score easyocr and then we are going to read text easyocr
and that's going to be pretty much all and then read text textract
okay... and then we are just going to print score tesseract... equal to... this value divided by 100 because
we have 100 images in this directory so we are going to take the average
of all the scores we got in all the images, and we're going to do exactly the
same with these two other values over here... no sorry, this is textract... okay... read text textract, yeah everything's okay... everything seems to be okay...
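Putting the whole comparison together, the evaluation loop might be sketched as below. The read text functions passed in are stand-ins here, since in the notebook they wrap pytesseract, easyocr and textract; the normalization mirrors what we did above (lowercase, drop newlines and basic punctuation), and the Jaccard helper is repeated so the sketch is self-contained:

```python
import os

def jaccard_similarity(sentence1, sentence2):
    # intersection over union of the two sets of words
    words1, words2 = set(sentence1.split()), set(sentence2.split())
    union = words1 | words2
    return len(words1 & words2) / len(union) if union else 0.0

def normalize(text):
    # lowercase and strip newlines and basic punctuation, as in the video
    for character in ['\n', '!', '?', '.']:
        text = text.replace(character, '')
    return text.lower()

def evaluate(data_dir, ocr_fns):
    # ocr_fns maps a name (e.g. 'tesseract') to a function image_path -> text
    scores = {name: 0 for name in ocr_fns}
    filenames = os.listdir(data_dir)
    for filename in filenames:
        image_path = os.path.join(data_dir, filename)
        # ground truth: filename without extension, underscores as spaces
        ground_truth = normalize(os.path.splitext(filename)[0].replace('_', ' '))
        for name, read_text in ocr_fns.items():
            scores[name] += jaccard_similarity(ground_truth,
                                               normalize(read_text(image_path)))
    # average the score over the number of images
    return {name: score / len(filenames) for name, score in scores.items()}
```

In the notebook this would be called with something like evaluate('/content/data', {'tesseract': read_text_tesseract, 'easyocr': read_text_easyocr, 'textract': read_text_textract}), printing one average per technology.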
the only thing I'm going to do... I'm going to print, I'm going to execute
it for only one image for now and then we are going to iterate in all the images, so let's see
what happens I'm just going to... iterate once I'm I'm going to break the loop and let's see what
happens, I'm going to execute this cell over here so we execute all these functions so we load all
these functions into memory so we can use them later on and now let's see what happens I'm going
to execute this cell over here and we got a mistake... right, this value, it's not right... and
now everything should be okay, okay and this is what we got remember this is only an example
but I would say everything makes sense so far so let's just run exactly the same function on
all of our images and let's see what happens and these are the results you can see we got a
0.01 with tesseract, 0.21 with easyocr and 0.34 with AWS textract, so in this experiment with this
data and under these conditions AWS textract is the best OCR technology, so this is going to
be all for this tutorial, in a future tutorial, in a future video, I may make another tutorial on
how you can use tesseract as an ocr technology because you can see that in this case we didn't have
a good performance using tesseract but this is actually a very good and a very powerful ocr
technology so I may make another video where I show you exactly what's the data you can
input into tesseract and exactly what's the pre-processing you should do on that data and
so on in order to achieve some very good results using tesseract, but for now this is going to be
all, my name is Felipe, I'm a computer vision engineer, thank you so much for watching
this tutorial and see you on my next video.