OCR-SAM | Optical Character Recognition (OCR) with Segment Anything Model (SAM)

Video Statistics and Information

Captions
Hello everyone. Today I'll show you a step-by-step implementation of OCR-SAM in Python, and before going to the implementation I'll give you a basic idea of OCR-SAM. As you all know, OCR is optical character recognition and SAM is the Segment Anything Model. OCR is used for detection and recognition of text in an image, and SAM is used to segment anything inside an image. SAM can be applied on top of an OCR model: OCR-SAM is the combination of the off-the-shelf OCR toolbox MMOCR and SAM, which puts masks on the detected text. Several applications can be built with OCR-SAM, such as segmenting text from an image, text removal and text inpainting. Basically, MMOCR is used for detection and recognition of the text, and after the text is detected by the MMOCR model, the SAM model puts masks on the bounding boxes of the detected text. Here you can see the original image, and inside the image you can see some text. First we use the MMOCR module for detection and recognition of the text: for detection we use the DBNet model, and for recognition we use the ABINet model. MMOCR returns the bounding boxes of the detected text, the coordinates of these bounding boxes are passed to the SAM model, and SAM puts masks on the detected boxes.

So how can we implement this step by step in Python? First open a Python terminal and clone OCR-SAM: copy the clone command and paste it into the terminal, and the OCR-SAM directory will be created. Then you need to install torch, torchvision and supervision. Install torch 1.12.1; here you can see "requirement already satisfied" because I have already installed it. Then install torchvision 0.13.1 (requirement already satisfied), then supervision 0.11.1, and then the OpenMMLab packages: pip install openmim, then mim install mmengine, then mim install the next package, and after that copy the next line and paste it into the terminal as well. It keeps showing "requirement already satisfied" because I have already installed all of these packages. I'll give you all of these commands in the description box of my video; I hope it will help you. Then install mmcls. Next I need to install the Segment Anything module: copy the command and paste it here (I have already installed the Segment Anything Model), and also install the required packages for SAM from requirements.txt.

Now we need to create some directories inside the OCR-SAM project. Right-click and create a new directory, weights. Inside the weights directory we create another three directories, because this process uses three model weights: one for text detection, one for text recognition and one for the Segment Anything Model. For detection we use DBNet, for recognition we use ABINet, and for segmentation we use the SAM model; that's why I created three directories.
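As a rough sketch of the environment check and the weights layout described above (the sub-folder names dbnet, abinet and sam are my own choice; the video only says to create three directories inside weights):

```python
from pathlib import Path

import torch
import torchvision
import supervision as sv

# Echo the pinned versions mentioned in the video (1.12.1 / 0.13.1 / 0.11.1).
print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("supervision:", sv.__version__)

# One sub-folder per model weight: DBNet (detection), ABINet (recognition), SAM (segmentation).
weights_root = Path("weights")
for sub in ("dbnet", "abinet", "sam"):
    (weights_root / sub).mkdir(parents=True, exist_ok=True)
```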
Now you just need to download all of these model weights. To download the ABINet weight you can copy the command and paste it into the Python terminal, or you can download it manually from the link: copy the link into the Chrome browser, paste it, press Enter, and the model weight will be downloaded. I have already downloaded this model, so I just cancel the download. That one is for ABINet; the next one is for DBNet. For DBNet use its link in the same way: copy it, paste it into the browser and press Enter, and the DBNet model weight, which is used for detection of the text, will be downloaded. I have already downloaded it, which is why you can see it here. The last one is the Segment Anything Model weight: copy its link, paste it into the Chrome browser and press Enter. To find the SAM weight manually, search for the SAM model on GitHub; on the GitHub page of the SAM model, scroll down to the model checkpoints. As you may know, there are three different SAM models: ViT-H, ViT-L and ViT-B. ViT-H is the huge model, ViT-L is the large model and ViT-B is the base model. The base model is the fastest and is suitable for running on a PC without a GPU, so just click on it and the SAM model weight will be downloaded. I have already downloaded this SAM weight, which is why I cancelled it here.

I have manually downloaded all three model weights into their directories: ABINet for recognition of the text, DBNet for detection of the text, and the SAM model for segmenting the detected text. First I copy the ABINet model weight and paste it into its directory, so here you can see this model weight. Then I copy the DBNet model weight and paste it, so you can see the DBNet weight. Finally I copy the SAM model weight and paste it, and now we have all of the model weights inside these three directories.

Now we just need to create a Python file. First we import torch, cv2, numpy and supervision, then from the segment_anything module we import sam_model_registry and SamPredictor, and from MMOCR we import the inferencer (MMOCRInferencer from mmocr.apis) and the polygon-to-box utility (poly2bbox from mmocr.utils). In the first step we set the device: as I am running this on my laptop and there is no GPU, we first check torch.cuda.is_available(), fall back to CPU otherwise, and print the device. Now we check whether it is working or not: yes, you can see "cpu" here, because I am implementing this OCR-SAM model on my laptop where there is no GPU, and there is no error. We have successfully imported all of these packages and no issue was found.

Next we assign the parameters of the SAM model. As you know, there are three types of SAM model: ViT-H is the huge model, ViT-L is the large model and ViT-B is the base model. Here we will use ViT-B as it is the fastest, so we set the SAM model type to vit_b, load the model through sam_model_registry with that model type, and then we need to assign the checkpoint.
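Putting the imports, the device check and the SAM setup together, a minimal sketch could look like this; the checkpoint path is a placeholder for wherever you saved the ViT-B download, and MMOCRInferencer / poly2bbox are the MMOCR 1.x names that appear to be used here:

```python
import cv2
import numpy as np
import torch
import supervision as sv
from segment_anything import sam_model_registry, SamPredictor
from mmocr.apis import MMOCRInferencer  # MMOCR 1.x inference API
from mmocr.utils import poly2bbox       # converts a text polygon into an xyxy box

# No GPU on this laptop, so fall back to CPU automatically.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)

# ViT-B is the smallest and fastest SAM backbone, which is why it is used here on CPU.
sam_model_type = "vit_b"
sam_checkpoint = "weights/sam/sam_vit_b_01ec64.pth"  # placeholder: adjust to your download

sam = sam_model_registry[sam_model_type](checkpoint=sam_checkpoint)
sam.to(device)
sam_predictor = SamPredictor(sam)
```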
To get the checkpoint path, go to the weights directory, then the SAM folder, right-click the weight file, copy its path and paste it here as the checkpoint. Then we create the SamPredictor: we assign a variable, sam_predictor, and pass the SAM model to it. Now we can run and check it: there is no issue, we have assigned all of the parameters and loaded the SAM checkpoint successfully.

Then you need to assign all of the parameters of the MMOCR model. We basically need four of them: the detection config path, the detection weight path, the recognition config path and the recognition weight path. So how do you get these paths? First go to the mmocr_dev directory, then to the configs directory, where you can see text detection and text recognition. Inside the text detection folder you can see two Python files; the second one is the right detection config path, so copy it and paste it here. Then copy the text recognition config path: this one is the right config path for text recognition, so copy and paste it here as well. Now we assign the detection weight path. We have a weights directory where we keep all three model weights in their sub-directories; for detection we use the DBNet model, so go to the DBNet directory, where you can find the weight file of the DBNet model, copy its path and paste it here. ABINet is used for text recognition, and in its directory you can see its model weight; copy that path and paste it here too.

Then we call the MMOCR model: the inferencer variable is assigned from MMOCRInferencer. There is a small issue here, we just need to remove this comma. We pass five parameters: the recognition config, the recognition weights, the detection config, the detection weights, and the device. Now we can run it and check whether it is working or not.

Now we just need to read an image. I have four images here, and you can read any of them using cv2; we will read image number three. Then we convert it to RGB format, resize it, and send this image to the MMOCR model to check the result. So we call the MMOCR inferencer on this RGB image and check the result keys: we get two keys, predictions and visualizations. Inside the predictions we get the detected bounding boxes and the recognized text, so we take the predictions and print the result. Here you can see the recognized texts, the recognition scores, the detected polygons and the detection scores. Inside result[0] we get a dictionary with the recognized texts, recognition scores and detected polygons, so we keep these results in separate variables, starting with the detected polygons.
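A sketch of the MMOCR side, assuming the imports and device from the previous snippet. The config and weight filenames below are placeholders, since the video only points at files inside the repository's config tree and the weights folders:

```python
# Placeholder paths: pick the DBNet / ABINet configs you find under mmocr_dev/configs
# and the weight files you downloaded into the weights folders.
det_config = "mmocr_dev/configs/textdet/dbnet/<dbnet_config>.py"
det_weight = "weights/dbnet/<dbnet_weights>.pth"
rec_config = "mmocr_dev/configs/textrecog/abinet/<abinet_config>.py"
rec_weight = "weights/abinet/<abinet_weights>.pth"

mmocr_inferencer = MMOCRInferencer(
    det=det_config, det_weights=det_weight,
    rec=rec_config, rec_weights=rec_weight,
    device=device,
)

# Read a test image, shrink it so SAM runs faster on CPU, and convert BGR -> RGB.
image_bgr = cv2.imread("imgs/3.jpg")                     # placeholder path
image_bgr = cv2.resize(image_bgr, None, fx=0.5, fy=0.5)  # the resize factor is an assumption
image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)

result = mmocr_inferencer(image_rgb)["predictions"]
det_polygons = result[0]["det_polygons"]
rec_texts = result[0]["rec_texts"]
rec_scores = result[0]["rec_scores"]
print(rec_texts, rec_scores)
```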
Then we keep the recognized texts in another variable, and we also keep the recognition scores. Now we can check them individually: the detected polygons, the recognized texts and the detection scores. Yes, we get all of the results, and looking at the image, the text has been recognized correctly. Now we need to convert these detected polygons into bounding boxes. How can we do this? We build a boxes tensor by converting each detected polygon into a box, looping over the detected polygons, and we also assign it to the device. Then we transform these boxes with the SAM predictor: sam_predictor.transform.apply_boxes_torch, passing the detected boxes and the image shape; in this way we transform the boxes. Now we print the transformed boxes: yes, we get a tensor with the transformed boxes, so we can comment out all of these prints. We have got two bounding boxes, and now we need to send the coordinates of these bounding boxes to the SAM model so that SAM generates masks over the text.

So how do we generate masks using the coordinates of the bounding boxes? We call sam_predictor.set_image with the image, and then the masks come from sam_predictor.predict, to which we give some parameters: point_coords is None, and for the box we send the bounding box, which you may need to convert to a NumPy array. Then we can check the result. If it is okay we get some output here: this is the transformed box, and the SAM model generates the mask and the score; we just wait for the result of the SAM predictor. Yes, we have got the masks, which means SAM is working.

Before plotting, instead of passing the bounding boxes manually, we make a loop over all of the bounding boxes from the transformed boxes: for each box in the transformed boxes we convert it to a NumPy array, keep the prediction inside the loop, and pass that box. In this way we can generate masks for all of the bounding boxes.
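A sketch of the polygon-to-box conversion and the SAM query, assuming the variables from the previous snippets. The video loops over the boxes and calls predict once per box; the batched predict_torch call below does the same job in one call:

```python
# Convert every detected polygon into an axis-aligned xyxy box and stack them into a tensor.
boxes = torch.tensor(
    np.array([poly2bbox(poly) for poly in det_polygons]),
    dtype=torch.float, device=device,
)

sam_predictor.set_image(image_rgb)

# Map the boxes from the original image frame into SAM's resized input frame.
transformed_boxes = sam_predictor.transform.apply_boxes_torch(
    boxes, image_rgb.shape[:2]
)

# One mask per box; multimask_output=False keeps only the best mask
# (the quality fix mentioned later in the video).
masks, scores, _ = sam_predictor.predict_torch(
    point_coords=None,
    point_labels=None,
    boxes=transformed_boxes,
    multimask_output=False,
)
masks = masks.squeeze(1).cpu().numpy()  # (num_boxes, H, W) boolean masks
```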
To display the original image and the masked image, we use supervision to annotate the bounding boxes and put the masks on them, basically masks on the recognized or detected text, since the SAM model puts masks on the detected text only. So here we use supervision: the box annotator is sv.BoxAnnotator, for which we use the red color, and then we assign another variable, a mask annotator, for which we use the blue color (we can assign these variables above the loop). Then we create the detections with sv.Detections, where the mask is the masks predicted by SAM; this part of the code is only for plotting the masks on the image using supervision, and we need to add one more line to build the detections. Now we annotate the image: the annotated image is the mask annotator's annotate call with a copy of the image and the detections, and then we can plot it. For the plot we assign the grid size, which is one by two, set the titles, and pass the images, the original image and the annotated image (we can assign this variable at the top).

Yes, we have got a mask on one of the texts, but on the second text we did not get any mask, so maybe there is some issue we need to check. We can also draw the bounding boxes over the image: we add another line for drawing the bounding boxes with the box annotator's annotate call, which draws the boxes, and we assign another variable and copy the original image into it, so here we have the annotated image and the original image. I think this will solve the problem. There is still an issue that it can put a mask on only one of the bounding boxes, so you can print the box to find the issue; maybe we need to add another line here, and hopefully that will solve it. Let's check: we have already got one bounding box, and here you can see the other bounding box and SAM is working on it, it just takes a little bit more time. There is still some issue, and you can solve it by resizing the original image, so we resize the original image by copying one more line; hopefully this solves the shape issue.

I also made a small typo here: this should be DBNet, so we need to change the directory name to dbnet (DBNet is used for detecting the text from the image) and also change the path to dbnet; it was just a typo. Now we check the output for image two: yes, we get the masking output, but the masking quality is not so good. We can improve the quality by adding a single line to the code: in the SAM predictor we just need to add one parameter, multimask_output should be False, and then the masking quality will definitely be improved on the detected text. You can see that the mask quality is now very good. But here I want to show you one more thing: the label of the bounding boxes is None. We can add the recognized text here. How do you do it? The skip_label parameter should be False, and the labels are the recognized texts, so we take the list of recognized texts and add them one by one with a counter; we assign the counter variable above the loop, and then we try it on image three.
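And the supervision part, again as a sketch assuming the variables above and the supervision 0.11.x API; labels and skip_label on the box annotator are the arguments toggled in the video:

```python
box_annotator = sv.BoxAnnotator(color=sv.Color.red())
mask_annotator = sv.MaskAnnotator(color=sv.Color.blue())

# Build detections from the xyxy boxes and the SAM masks; class_id is a dummy value
# added only so the annotators have something to index.
detections = sv.Detections(
    xyxy=boxes.cpu().numpy(),
    mask=masks.astype(bool),
    class_id=np.zeros(len(det_polygons), dtype=int),
)

annotated_image = image_bgr.copy()
annotated_image = mask_annotator.annotate(scene=annotated_image, detections=detections)
annotated_image = box_annotator.annotate(
    scene=annotated_image,
    detections=detections,
    labels=list(rec_texts),  # show the recognised word on each box
    skip_label=False,
)

# Original image and annotated result side by side.
sv.plot_images_grid(
    images=[image_bgr, annotated_image],
    grid_size=(1, 2),
    titles=["original image", "OCR-SAM output"],
)
```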
Maybe there is some issue; you need to check the detected texts first. Basically the labels should be a list here, so we just need to add this and make it a list; hopefully that will solve the issue of displaying the labels on the bounding boxes. Let's check it: yes, finally we get the bounding boxes of the detected text with masks on them and the respective labels, which is really fantastic. Basically MMOCR detects and recognizes the text, the Segment Anything Model puts masks on the detected text, and we can display it with the labels on the bounding boxes, which is really fantastic. In the next video I will show you MMOCR in detail, and after that I'll show you the Segment Anything Model in detail, with step-by-step implementations of SAM and MMOCR in my next two videos. Finally, let's try the last image, number six: the mask quality here is very good, because the text on this image is larger, and we can check the labels on the bounding boxes, here you are getting "hello", so it was really fantastic. If you are interested in watching the next video on my channel, please subscribe. Thank you very much, thank you all.
Info
Channel: SILICON VISION
Views: 997
Keywords: OCR, SAM, optical character recognition, segment anything model, easy ocr, segment anything model github, mmocr, sam ocr colab notebook, ocr-sam github, ocr-sam, ocr implementation, mmocr gitbub, mmocr implementation, ocr image to text, ocr python
Id: uJ6l91qUox8
Length: 47min 24sec (2844 seconds)
Published: Mon Jul 17 2023