How to Train a Custom Invoice Text Extraction OCR Model with YOLOv8 + Paddle OCR

Captions
Hi everyone, I'm Rama, co-founder and CEO of Theos AI, and in today's video we're going to take a look at how you can train your own custom OCR model for extracting the text of invoices. Alright, so the first step is to sign up to Theos, obviously, so go to the Theos AI website and click on the sign up button in the top right corner, then fill in the form and you will have a free account on Theos to follow this tutorial.
Alright guys, we are in the library section of Theos now, but before we get started I wanted to address a very common misconception people have when they try to train their own custom OCR model. What you really want to do, instead of training a custom OCR model, is train a custom object detection model to detect and localize the text within the image and assign a custom class to each piece of text: for example, in an invoice, the total price, the name of the buyer, the products and the date of the purchase. You want to have a class for each type of text and then use a default pre-trained OCR model, like Paddle or a big transformer model, that already works very well.
The only case where you really need to train your own custom OCR model, like Paddle for example (I have a video about that, I'm going to leave a link to it in the description down below so you can check it out), is when the default OCR model is not working very well. This may be the case, for example, when using it on real-life images or in strange scenarios such as license plates. But for it to work well you'll need several thousand images to train it. The Paddle model, for example, is not easy to train, so mostly this is for companies that have a big budget to hire a big labeling team to help them out. If you're not a big company with a lot of money, I suggest you just build a custom object detection model and then use that model with a pre-trained OCR model like Paddle or another one.
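The two-stage idea described above (detect and classify the text regions first, then run a pre-trained OCR model on each region) can be sketched roughly as follows. This is only an illustration, not how Theos implements it: `Detection`, `crop` and `extract_fields` are hypothetical names, the image is a plain list of pixel rows, and the detector and OCR model are passed in as ordinary functions so that any library (a YOLOv8 model, PaddleOCR, or something else) could be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Image = Sequence[Sequence[int]]   # rows of pixels; a stand-in for a real image array
Box = Tuple[int, int, int, int]   # (x1, y1, x2, y2) in pixel coordinates


@dataclass
class Detection:
    """One detected text region with its custom class (hypothetical structure)."""
    label: str         # e.g. "total price", "products", "billing date"
    box: Box
    confidence: float


def crop(image: Image, box: Box) -> List[Sequence[int]]:
    """Cut the rectangle described by box out of the image."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def extract_fields(
    image: Image,
    detect: Callable[[Image], List[Detection]],
    read_text: Callable[[List[Sequence[int]]], str],
    min_confidence: float = 0.5,
) -> Dict[str, List[str]]:
    """Run the detector, then OCR each confident region, grouped by class."""
    fields: Dict[str, List[str]] = {}
    for detection in detect(image):
        if detection.confidence < min_confidence:
            continue  # skip regions the detector is unsure about
        text = read_text(crop(image, detection.box)).strip()
        fields.setdefault(detection.label, []).append(text)
    return fields
```

The point of this split is that only the `detect` part is trained on your own invoices; `read_text` can stay a default pre-trained OCR model.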
Alright guys, so now let's create a new dataset. I'm going to call it 'Invoices'. Okay, now we have to upload the images to train the object detection model to locate the text. Before doing that I wanted to address another very common mistake people make. It's more of a misunderstanding, I believe. They upload just 20 images to the dataset, label all of them in five minutes or less, and then train the model, and obviously with that few images the model won't train well at all. It will be very bad at detecting the text or any object, and they basically give up and stop labeling more images. What they should do is upload at least 100 or 200 images and label all of them, then train an initial model; with that amount of images I believe it will work somewhat well.
But what you want to do if the model doesn't train well, if you have zero percent accuracy for example or a low fitness, is upload more images and label again and again until you get that accuracy to go up. All machine learning models, and object detection is no different, require a lot of data to train well. So you need to upload a lot of images and label them. That's the game at the moment, so you have to do it. Don't get discouraged if you see that the model is not training well because you labeled 20 images. That won't work anytime soon. So please don't give up: upload more images, label again and train again, and I promise that the accuracy will go up a lot.
Alright guys, so here we can see some examples of the images that I'm going to upload. You can see that these are invoices, some real, some digital. We have prices, sales tax, the products, the billing address, the shipping address, the date and all of that. So let's go ahead and upload this. Okay, so I'm going to upload 733 images now and label all of them. This is another thing that I want to make very clear.
If you're going to upload 700 images, you have to label all 700 images before your initial training. Because if you label some images in the dataset and leave other images without labels, the model will still try to predict bounding boxes on the images that you didn't label. On those images, the model will think that it's doing something wrong when it correctly predicts the bounding boxes around the text, because you're telling it that it shouldn't detect anything there when in fact it should detect the text in that image. So you're going to hurt the performance of the model if you don't label all of the images before training. If you want to label fewer images than this, I suggest you just upload 200 images for example, label all of them, train an initial model and then, if you want, upload more images after that. But I'm going to upload the 700 images and label all of them anyway.
Alright guys, so now let's click Start Labeling in the top right corner, and this will get us inside the dataset labeling tool. The first step is to create a new class. We want to label the products here, so let's create a class and name it products. Let's change the color here; I'm going to use this color for the products. So we created the first object detection class. We're going to select the bounding box tool and go ahead and make a bounding box. Now let's create a new class for the total price and label the total price here. Let's do the same for the subtotal price; I'm going to call this price without tax. Let's now create the billing date, the delivery address, the VAT number, the client's name and finally the invoice number. Alright, so to submit the image you have to click the submit button in the bottom left corner.
You can also go here and check out the shortcuts that we have, which will accelerate your labeling process a lot. I'm going to use a shortcut to submit the image, which is the E key on the keyboard. You press the key like this and your image will be submitted to Theos. Then you go on to the next invoice, label it, submit, and repeat this process until you have labeled all of the images in the dataset.
Alright guys, I finished labeling all of the images in the dataset, as you can see. Here you have all of the classes that I created with the number of labels for each class. Another thing that I want to tell you is that you should strive to have a balanced dataset, which means that all of the classes in the dataset have roughly the same number of labels. Here you can see I don't have a very well balanced dataset, because the delivery addresses are just 87 and then the products are 728. If you have a dataset that is not very well balanced, I suggest you apply this method of removing some objects from some images in order to equalize the number of labels between all of the classes.
Alright, so now we're in the train section of Theos. Finally, we can train an initial model. So let's click on New Training. Let's call this Invoice Detector. Let's select our dataset. Let's select the algorithm. We're going to use YOLOv8 Extra Large, because we don't really care about the speed of this model, but we do care a lot about the accuracy in this case. So we're going to select the largest model from YOLOv8. Let's click Next now. This is the machine that will do the neural network training. Since I'm on the paid business plan of Theos, it comes with two cloud GPUs that you can use to train models in parallel. So yeah, I'm going to choose now.
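The two dataset checks mentioned earlier (every image labeled before training, and roughly balanced classes) can be approximated with a small script before starting a training run. This is a sketch under an assumption: it expects the dataset on disk in the common YOLO text format, where each image has a `.txt` file with the same name containing one `class_id cx cy w h` line per box. Whether and how Theos exports data this way is not stated in the video, and `audit_dataset` and `imbalance_ratio` are hypothetical helper names.

```python
from collections import Counter
from pathlib import Path
from typing import Dict, List, Tuple

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def audit_dataset(image_dir, label_dir) -> Tuple[Counter, List[str]]:
    """Count boxes per class id and list images that have no labels at all."""
    counts: Counter = Counter()
    unlabeled: List[str] = []
    for image_path in sorted(Path(image_dir).iterdir()):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        label_path = Path(label_dir) / (image_path.stem + ".txt")
        lines: List[str] = []
        if label_path.exists():
            # Keep only non-empty lines; each one describes a single box.
            lines = [ln for ln in label_path.read_text().splitlines() if ln.strip()]
        if not lines:
            unlabeled.append(image_path.name)  # missing or empty label file
            continue
        for line in lines:
            counts[int(line.split()[0])] += 1  # first field is the class id
    return counts, unlabeled


def imbalance_ratio(counts: Dict[int, int]) -> float:
    """Largest class count divided by the smallest (1.0 means perfectly balanced)."""
    if not counts:
        return 0.0
    return max(counts.values()) / min(counts.values())
```

With the counts mentioned in the video (728 product labels against 87 delivery addresses), `imbalance_ratio` would come out at roughly 8.4, which is the kind of gap the balancing advice is aimed at.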
If you are on the free plan of Theos, you can go here and use Google Colab, or connect your own machine if you have an NVIDIA GPU and are running Linux and Python 3.10 at the moment. So yeah, I'm going to select the cloud GPU from Theos and click Create.
This is the training session. As you can see here, we have epochs and batch size. Epochs, if you don't know, are the number of times that the model will go through all of the images in the dataset. So 300 epochs means that the model will predict the same images in the dataset 300 times, and with each iteration it will get better and better at predicting those images. The batch size is the number of images that the model will see in parallel. So if you increase this number, the model will train faster, but it will also require more GPU memory. Okay, so let's go ahead and click Start Training. Now we just have to wait. The training metrics should start coming into your browser in real time. So go grab a coffee and wait until the model finishes training.
Alright guys, so my model finished training. Really, I stopped the training because it wasn't getting much better, and I want to try it out. If we need to keep training, we just have to create a new experiment and set as Initial Weights the weights from the previous experiment, Experiment 1. We can use the weights from Experiment 1 as the starting point for the training of a new experiment. It's basically like resuming the training.
Alright guys, I deployed the model here in the Deploy section of Theos. Let's go to the Playground and try it out. Okay, so you can see that it correctly detected the classes: the products, the total price without tax, the VAT, the total price, the client's name, the billing address. But it didn't perform the OCR. For that, we have to go here and select, for example, the Medium OCR model. Let's check all these boxes.
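As a rough illustration of how epochs and batch size relate, the number of batches the model processes can be worked out with a little arithmetic. The function name `training_steps` and the batch size of 16 are assumptions for the example; the video does not say which batch size was used, and in practice some images are usually held out for validation, so the real number would be somewhat lower.

```python
import math
from typing import Tuple


def training_steps(num_images: int, batch_size: int, epochs: int) -> Tuple[int, int]:
    """Return (batches per epoch, total batches over the whole training)."""
    # The last batch of an epoch may be smaller, hence the ceiling.
    steps_per_epoch = math.ceil(num_images / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs


# 733 images as in the video, an assumed batch size of 16, and 300 epochs.
per_epoch, total = training_steps(733, 16, 300)
print(per_epoch, total)
```

With these numbers that is 46 batches per epoch and 13,800 batches in total. Doubling the batch size roughly halves the batches per epoch, at the cost of more GPU memory per batch.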
Alright, let's try it again. Perfect. So we can see here that it correctly read the prices: 2300, 170, 2470. And let's see here. 35JEST HEIGHTS, Hilton. But yeah, here you have the JSON output that you will receive when you use the API, with the positions of the bounding boxes, the classes and the text inside each bounding box. If you want to send images to the API, you have to call this URL from your software by making an HTTP POST request. Here you have the docs. I will leave a link to them in the description down below so you can check them out. Here we explain all the parameters. So you can change, for example, the OCR model, or the classes you want the OCR model to read. The small model has a language option. And here you have a few examples of commands that you can use to call the API from the terminal, with Python or with React. So, for example, you can use this to make Python scripts that take thousands and thousands of invoices and process them in parallel batches of 100. So you can process them super fast using the same API URL, basically.
All right. So let's try now a few more examples.
All right guys, with that we finish the tutorial. Please like and subscribe if you want to see more of this content. I also wanted to let you know that if you're an employee at a company or a business owner and need to solve this for real, and you need a fast way to process all of the documents that you have (it doesn't necessarily need to be invoices, it can be any document basically), please contact us at our business email, contact at Theos AI, and someone from our team will be happy to help you. Also, I wanted to tell you that you should join the Theos AI University WhatsApp group and the Discord server. I will leave links to them in the description down below. So yeah, see you guys in the next video. Bye bye.
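The batch processing idea mentioned in the video (sending many invoices to the same API URL in parallel batches of 100) might look something like the sketch below. Everything specific here is an assumption: `API_URL` is a placeholder, the request body format (raw image bytes) is a guess, and `chunked`, `post_image` and `process_invoices` are hypothetical names. The real URL, parameters and request format are in the Theos docs shown in the video.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, Iterable, Iterator, List

API_URL = "https://example.invalid/detect"  # placeholder, not the real endpoint


def chunked(items: List, size: int) -> Iterator[List]:
    """Yield consecutive slices of at most size items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def post_image(path, url: str = API_URL, timeout: float = 60.0):
    """POST one image and return the decoded JSON reply (body format is assumed)."""
    request = urllib.request.Request(
        url,
        data=Path(path).read_bytes(),
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def process_invoices(
    paths: Iterable,
    send: Callable = post_image,
    batch_size: int = 100,
) -> Dict[str, object]:
    """Send images in batches, with every request of a batch running in parallel."""
    results: Dict[str, object] = {}
    for batch in chunked(list(paths), batch_size):
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            # pool.map keeps the results in the same order as the inputs.
            for path, result in zip(batch, pool.map(send, batch)):
                results[str(path)] = result
    return results
```

Passing the sender in as `send` keeps the batching logic separate from the HTTP details, so the request code can be swapped for whatever the official docs specify.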
Info
Channel: Theos AI
Views: 1,605
Id: VskF2gClZ_A
Length: 15min 14sec (914 seconds)
Published: Wed Nov 08 2023