Step-by-Step Handwriting Recognition Tutorial Using TensorFlow

Video Statistics and Information

Captions
Hello everyone, welcome back to another tutorial. Last time I showed you how easily we can crack simple captchas and how we can solve them with Python. Right now I've decided to move on to harder tasks. Because it's pretty hard to find a captcha dataset that I could use, I decided: why not do handwritten word recognition? Handwritten words are very similar to captchas: they might be hard to read, and they have different lengths and text. So I decided to use the IAM dataset. It's an open-source dataset that has a lot of handwritten words, sentences, and pages of text. It was released about 20 years ago, but who cares if I want to learn to recognize handwritten text? You can also swap the handwritten text for captchas, and there you go, you have a solver for your captchas. Later I'll move on to a harder task, handwritten sentence recognition. This is one part of an OCR system, so it's a great place to start your own OCR implementation.

Handwritten text recognition is still very popular nowadays, because we often need to, for example, read a whole handwritten document. The easiest way is to scan the document and give it to an algorithm that recognizes each word or each sentence and transcribes it into readable text. You can also use it for grading your students' exams, reading essays, or, I don't know, digitizing old papers from handwritten to digital text, and so on. There are a lot of opportunities where you could use this kind of handwriting recognition. One more great point is that this dataset is used for benchmarking OCR on handwritten text: you can find a lot of OCR systems online that report benchmarks on this test data, showing how well they were able to recognize this kind of handwritten text.

So I decided, okay, let's move on with this implementation. First, you need to install the requirements. I will use Python 3 and TensorFlow 2.10, and it's nice to
know: don't go to 2.11 if you are on Windows, because they changed a lot of stuff there. Next, I use my mltu package, version 0.1.5, for this tutorial, so I recommend installing that version. It might change in the future, because I will keep working on this repository: improving it, adding a lot of stuff, fixing bugs, and documenting. I'll try to improve this library so all of us can use it widely.

Okay, so let's move on with the task. First I want to show you what the dataset looks like. As you can see, there are a lot of words. Some are pretty hard for us to read, and of course there is a lot of easily readable text too. I don't like the dots here; they are confusing in this dataset. But okay, never mind for now. So I have this dataset with all these words, and I downloaded it from a specific link, which you can find in my text version of this tutorial; I'll add the link below this video.

If I go here, there is a words annotation file that gives us the paths to all these words and all the documentation. This is the ID of the folder, then another folder ID, and together that is the full name of my target image. This field tells us, for example, that everything is okay with it, that it's annotated correctly. If we go down, there is an error, as you can see, so we will skip these errors. At the end there is a grammatical tag and then the actual value of our text; I think that's the last item, this one, yeah. So I will need to create a dataset from this.

Let's move to the code. I believe that's the most interesting part: how I implemented everything in code. Here I have the handwriting recognition tutorial: a train script, an inference model, and configurations. I'll go through all of these briefly, step by step. Here is all my mltu training stuff: the losses, callbacks, metrics, augmentors,
transformers, preprocessors, and so on, and the data provider. But first we need to download the words dataset. It's a pretty huge dataset; I believe it's four gigabytes. How much does it take? Oh, it's not too large, it's 800 megabytes, but it still takes some time to download. I wrote a short script that uses a request library to do so. You simply give it the link to the ZIP file where it sits on the internet, and my code will handle it for you. You'll see the tqdm bar going from left to right while it's downloading, so wait for it to finish. Then, although you won't see it, extracting the images also takes a lot of time, because there are a lot of files in this archive, so don't hesitate to wait. I believe it's way easier to run this script than to download the data manually and place it into the datasets folder, and so on. This script will prepare everything for you.

Next, we need to work with the annotation file that I already showed you. The idea is that we skip a line if it has this sign at the beginning, as you can see. Then we skip the word if it's an errored word. Then we split out the IDs of the folders, we build the full path to our file name, and last we take the last element from our list, which is the actual label. So that's the last word: "Mr.", "Gaitskell", and so on. That's it; we collect this dataset, and each entry is appended to this list. This will be a list of lists, where the first element is the full path (well, for now it's a path relative to my dataset) and the second is the actual label. Then I collect the vocabulary, which is all the different characters that appear in my labels, and I collect the maximum length of a single word, which is also going to be used.

Then I make the configurations. If I go to my configurations, you can see that here I have my model path, where I will save my trained model; I have
a vocabulary that I'll update when I iterate through the dataset; I have the width, height, and maximum length that I'll update; and the batch size, learning rate, training epochs, training workers, and so on. That's basic stuff.

Back in the train script, I update my configs as I mentioned, and then I use my data provider for this dataset. That's my custom class made for TensorFlow models. It's very useful: it inherits from Sequence, but it has a lot of extra features. For example, you can feed it any data and preprocess it any way you want. Right now I have images, so I give it an image reader, and it will read all the images from here. I could give a resizer to the data preprocessors, but I'm giving it to the transformers, because we are actually transforming the image size. We could also set keep aspect ratio to true; then, when we resize our images, the left and right sides, for example, are padded if needed. For now I decided to train it as it is. If you're interested in testing how it works with padding and without padding, it's up to you, but you'll need to change the inference model code accordingly.

Then we use a label indexer. We need to transform our strings into a numerical type, so I transform all my labels into indexes according to the vocabulary. Later, after the model is trained, it will be really easy for me to transform the integers back into string words. Then I need to use label padding. Because our models accept only inputs of the same size, all the labels should be the same size. This means that if, for example, we have the word "a" and another word is "home", we need to add three additional padding symbols to "a". The padding value will be an additional value just past the vocabulary range. That's it; the other things our code will handle for us, so we don't need to worry.

Next, I split the dataset into ninety percent for training and ten percent
for validation, and that's it. Of course, I'm using augmentors, and I apply them only to the training dataset. We want to achieve better results, and there's hardly a better way to get them than applying augmentation; it's similar to getting a larger dataset.

Then I define my trainable model. It's very similar to the previous tutorial, where I showed how to train a captcha model, but I believe I have different strides here: my words are longer, so I need more detail. That's it about that. Then I simply compile my model with the Adam optimizer. I use CTC loss, of course; it's one of the easiest ways to recognize words, sentences, sound, and so on. And I'm using a combined character and word error rate metric. In the next tutorial I'll use another metric and separate the character and word error rates, because combining them is not good practice, but for now it's totally fine. Then I print the summary.

Then I'm using six different callbacks. There is early stopping, so we don't train forever. There is a model checkpoint that saves only the best model according to the validation character error rate. This rate shows how many mistakes our predicted word made compared with the true label, and of course we want as few mistakes as possible, so we track the character error rate. We also track the word error rate, which tells us how many whole words we got wrong, but right now I'll focus only on the character error rate. There are training logs, for debugging and so on. There is TensorBoard; I'll show you some TensorBoard graphs so you don't have to wait for me to train the model while I'm recording, since it might take a lot of time. Then, of course, we decrease the learning rate when the model has not improved for 10 epochs; that could be tuned better, but it's totally fine for now. And finally, I convert my model to ONNX format after it finishes training. And yeah, I
believe that's it. Then I call fit, and then I save my training and validation datasets alongside the model. That's in case I later want to change the model architecture, focus on improving the character error rate or accuracy, or add more augmentations; it's a good way to validate our models on the same data.

If you're interested, I can run the training, and we'll see the model summary and see that this code works just fine. It will start training, but that's not the point here. Okay, let's wait for it to finish loading the data. As you can see, we have a huge dataset: 150 thousand different handwritten word images. That's a lot by comparison.

You can see that this is the architecture of my model. I grabbed some ideas from the ResNet implementation to build these residual blocks, or convolutional blocks, I don't know what you call them. They have skip connections, which help these convolutional networks work better, avoid overfitting the model, and so on. If I go back here, you can see the input is 32 by 128, the size of the resized image. At the end I have a reshape, where I multiply this 4 by 16 and reshape to 64 by 64, and then I feed this to my LSTM network. After the LSTM network I simply get these values, and this information goes straight to the CTC loss and is propagated back. So, well, it's 75 time steps and 64 length of the whole string that will be decoded with CTC. If you're interested in what CTC is and haven't watched my previous tutorials, I recommend checking them out. In my first tutorial I explained how CTC works in more detail, and you can get familiar with it there, because I don't want to explain it again.

That's it. As you can see, it keeps training, and who cares how it finishes. So as not to wait for it to finish, I already trained the
model before, so let's go to TensorBoard and I'll show you the results. I need to cancel this run first, since it's using my GPU. Okay, I go back, and here I have my TensorBoard. As you can see, I trained it a few days ago; actually, yesterday. It goes down like this. Here orange is training and blue is validation, and we can definitely see that the character error rate was steadily decreasing and the model was improving. What's worrying is these spikes on the validation data. It's really hard to explain why they are here, but it might be that my training and validation data are not perfect, because of these commas, dots, and other signs that simply stand alone without a word. It is a headache for our model to learn what they actually are, so I believe that's the reason. If I removed them, we could get a better result, but that's not the point of this tutorial. Right now I want to show you how easy it actually is to train it.

Here is also the word error rate, which tells us how many words were predicted incorrectly by our model. As you can see, it's about 20 percent; well, so actually about 80 percent of the words were right, and the errors are mostly with these signs, dots, and so on. So we'll need to work a little more on this dataset, with preprocessing and so on, if we want to get a good result, but again, that's not the point of this tutorial.

Let's move on; I believe you want to see how it actually works. If I go to my model folder, I have here an inference model script that I already prepared for you, and here is my trained model, which you can download from the link in my text version of this tutorial if you don't want to train it yourself, along with the validation CSV files and so on. Here I'm simply reading the image and using the model to predict. The model here is image-to-word: a model loaded in ONNX format. Why do I use ONNX? Because, for example, if I would
be interested in porting this somewhere else, to a different device, I don't need to install TensorFlow or other huge libraries. I simply need to install ONNX Runtime and I'm good to go. Of course, I will resize the image for you and show you actual predictions. So let's run this script and see how it actually predicts on our images and what results we get. Let it load.

What the hell is that? Okay, sorry, this should be imshow. Yeah, that's correct. Let's rerun this; I wrote that by mistake, sorry for that. Let's look at how it looks. Loading... oh no, it's here.

So this is my first image, and you can see that the prediction was "upon" and the label was "upon"; that's correct. It's even hard to say what's written here. This one is "appears", "appears". Well, it's not that easy for me to tell what's actually there, and you can see that my model predicts it correctly. That's a comma; who the hell knows that that's a comma? Here the label is "because" and the prediction is "because"; that's fine. "Premier", "a"... well, there are a lot of them. Oh, here we have an error, "August"... ah, it predicted an "r" here. So that's cool, and you can see that it works pretty amazingly.

That's the easiest way to train any word detector you want by using my code. Of course, you can find it on GitHub; the link is below this video. You can find all the details on how you can train it, with the code and so on, in my text version of this tutorial. And of course, don't hesitate to like and share this video and leave a comment if you have any questions or suggestions about what I should do next. I really hope this video will be useful for you, whether you are learning Python, developing your own projects, or doing something for, say, a task at your job. It's up to you where you're going to use it. I really hope you're going to like it, because, well, I appreciate when that's
helpful, because I spent some time doing this: writing, documenting, and making it clear, understandable, and open source. Of course, nobody pays me for that. I really hope that I'll see you in the next tutorial, because that's the end of this one. In the next part I'll try to recognize full sentences instead of words, and I'll make another tutorial on that. Then we'll go to the hard stuff with sound recognition, and we'll see what we can do with that. So thank you for watching, and see you next time. Goodbye!
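
The data preparation described in the captions (skip comment lines and errored words, build the image path from the folder IDs, take the last field as the label, collect the vocabulary and maximum word length, then index and pad the labels) can be sketched in plain Python. This is a minimal sketch, not the code from the video: the function names are hypothetical, the sample annotation lines are made up for illustration, and it assumes that comment lines start with `#`, that the second field is an `ok`/`err` flag, and that images are stored as `<root>/<id part 1>/<id part 1>-<id part 2>/<word id>.png`.

```python
from typing import List, Tuple


def image_path(root: str, word_id: str) -> str:
    """Build the image path from the folder IDs inside the word ID (assumed layout)."""
    parts = word_id.split("-")
    folder1 = parts[0]
    folder2 = f"{parts[0]}-{parts[1]}"
    return f"{root}/{folder1}/{folder2}/{word_id}.png"


def parse_annotations(lines: List[str], root: str) -> List[List[str]]:
    """Return a list of [path, label] pairs, skipping comments and errored words."""
    dataset = []
    for line in lines:
        line = line.strip()
        # Assumption: header/comment lines start with '#'.
        if not line or line.startswith("#"):
            continue
        fields = line.split(" ")
        # Assumption: the second field is the status flag, 'ok' or 'err'.
        if fields[1] == "err":
            continue
        # The last field is the actual label (the written word).
        dataset.append([image_path(root, fields[0]), fields[-1]])
    return dataset


def build_vocab(labels: List[str]) -> Tuple[str, int]:
    """Collect every distinct character and the longest label length."""
    vocab = "".join(sorted(set("".join(labels))))
    max_len = max(len(label) for label in labels)
    return vocab, max_len


def encode_label(label: str, vocab: str, max_len: int) -> List[int]:
    """Map characters to vocabulary indexes and pad to max_len.

    The padding value is len(vocab), one past the last valid index.
    """
    indexes = [vocab.index(ch) for ch in label]
    return indexes + [len(vocab)] * (max_len - len(indexes))


def decode_label(indexes: List[int], vocab: str) -> str:
    """Turn indexes back into a string, dropping padding values."""
    return "".join(vocab[i] for i in indexes if i < len(vocab))


# Made-up lines in the assumed annotation layout (not real dataset entries).
sample_lines = [
    "# hypothetical header comment",
    "x01-000-00-00 ok 154 408 768 27 51 AT a",
    "x01-000-00-01 err 154 507 766 213 48 NN broken",
    "x01-000-00-02 ok 154 796 764 70 50 NN home",
]

dataset = parse_annotations(sample_lines, root="words")
vocab, max_len = build_vocab([label for _, label in dataset])
encoded = [encode_label(label, vocab, max_len) for _, label in dataset]
```

With these sample lines, the word "a" gets three padding symbols so that it matches the length of "home", which is the case mentioned in the captions. Using `len(vocab)` as the padding value keeps it distinct from every real character index, so decoding can simply drop it.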
Info
Channel: Python Lessons
Views: 40,348
Keywords: handwritten digit recognition, deep learning, machine learning, TensorFlow, CTC Loss, Captcha to Text, Text Extraction, Custom OCR, Image Recognition, Training Model, Machine Learning, Python, Step-by-Step Tutorial, Image Processing, Data Science, Artificial Intelligence, Computer Vision, Data Analysis, Neural Networks, Model Training, Computer Science, python lessons, python tutorial, handwriting recognition, OCR, IAM Dataset, CNN, LSTM, Character Error Rate
Id: WhRC31SlXzA
Length: 22min 25sec (1345 seconds)
Published: Mon Jan 23 2023