MMOCR-Optical Character Recognition | Modular Architecture of MMOCR

Video Statistics and Information

Captions
Hello everyone. Today I'll show you one of the most popular text detection and recognition toolboxes, MMOCR, and how we can implement it in Python step by step; I will show you all the steps. Before going to the implementation, I'll give you some basic idea about the MMOCR model and how it works.

Basically, MMOCR is an open-source toolbox, based on PyTorch and MMDetection, for text detection, text recognition, and key information extraction from images. So what is text detection? The text detection model or algorithm finds where the text is inside the image; it returns important information such as the bounding boxes of the text. The text recognition model then finds what that text actually says, recognizing the characters inside each detected region.

One of the most important things that makes MMOCR different from other OCR toolboxes is its modular design: in MMOCR we can use different detection and recognition models for text detection and text recognition. Here you can see the Python commands for the different detection and recognition models. The list of text detection models includes DBNet, DBNet++, FCENet, Mask R-CNN, and many others, and below it you can see the text recognition models such as ABINet and CRNN. In this example the CRNN model is used for text recognition and the DBNet model for text detection. Inside the MMOCR inferencer you can use any of the detection and recognition models listed here, and this modular design is what sets MMOCR apart from other OCR toolboxes.

Now I will show you the basic idea of the DBNet text detection model: how it decides where the bounding boxes of the text are inside the image and how it works. DBNet, which is used for text detection, returns the bounding boxes of the text inside the image. This Differentiable Binarization (DB) network is a segmentation network that performs the binarization process: the segmentation network can set the threshold for binarization, which simplifies post-processing and enhances text detection performance. So here is the image: the DB network generates an approximate binary map, and the bounding boxes are then generated from that map.

With DBNet we can add ASF (Adaptive Scale Fusion) to improve its performance. It improves scale robustness by adaptively fusing features of different scales. By incorporating both DB and ASF into the segmentation network (this is the segmentation network, and we add the Adaptive Scale Fusion module here) the model can consistently detect text and achieve state-of-the-art results in terms of detection accuracy and speed on five standard benchmarks. Basically, in the DBNet network we just add an additional Adaptive Scale Fusion module to generate a more accurate approximate binary map, from which the model generates the bounding boxes of the text inside the image. So this is the basic idea of how a DBNet model detects text.

Now we can go to the step-by-step implementation. During the implementation of the MMOCR model I will use the DBNet model for text detection and the ABINet model for text recognition.
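To make the modular design concrete, here is a minimal sketch (not shown in the video) of how MMOCR 1.x lets you pick the detection and recognition models by name through its MMOCRInferencer API; the image path is a placeholder.

```python
from mmocr.apis.inferencers import MMOCRInferencer

# Any supported detector/recognizer from the model zoo can be swapped in
# by name -- this is the modular design in practice.
ocr = MMOCRInferencer(det='DBNet', rec='CRNN')

# 'demo.jpg' is a placeholder image path.
result = ocr('demo.jpg')
print(result['predictions'][0]['rec_texts'])
```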
Now I will show you how you can implement it in Python. For the implementation, first you need to open the Python terminal in your project and install some important related packages for the MMOCR model. These are the steps: first, clone MMOCR from GitHub; second, install torch and torchvision; third, install mmengine, mmcv, mmdet, mmcls, and mmocr; fourth, create two directories for detection and recognition, because we use two different models, DBNet for text detection and ABINet for text recognition, so inside the weights directory we create one subdirectory for each model weight; and finally, download the two model weights from the GitHub links.

In the terminal, first we clone MMOCR from git: just copy this line and paste it into the terminal. It takes a little bit of time to download; it is already done and the mmocr directory has been created. To install torch and torchvision we simply run the pip commands (torchvision 0.13.1 here); you can see "requirement already satisfied" because I have already installed them. Then we need to install the mm packages one by one; I will put all the commands in the video description box, so you can copy them from there. I have already installed all of these packages, which is why you see "requirement already satisfied"; just copy and paste them one by one into the terminal.

Finally, we need to create the two directories. Inside the mmocr project, first create a weights directory, and inside it create two more directories, one named dbnet and another named abinet. Now we need to download the model weights for ABINet and DBNet. You can copy the two download lines and paste them here, or alternatively copy each link, paste it into Google Chrome, and the model weight will be downloaded; then move the weight file from the Downloads directory to the respective project directory. I have already downloaded both weights, which is why I cancel the download here. The ABINet weight, which is used for text recognition, you copy and paste into the abinet directory, and the DBNet weight you copy and paste into the dbnet directory.
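As a minimal sketch of the directory setup and weight download described above (assuming you run it from the cloned mmocr folder), the weight URLs below are placeholders; the real checkpoint links come from the MMOCR model zoo or the video description.

```python
import os
import urllib.request

# Create weights/dbnet and weights/abinet inside the cloned mmocr project.
for name in ("dbnet", "abinet"):
    os.makedirs(os.path.join("weights", name), exist_ok=True)

# Placeholder URLs -- substitute the actual checkpoint links from the
# MMOCR model zoo (or the video description box).
weight_urls = {
    "dbnet/det_weights.pth": "https://example.com/dbnetpp_checkpoint.pth",
    "abinet/rec_weights.pth": "https://example.com/abinet_checkpoint.pth",
}

for rel_path, url in weight_urls.items():
    dest = os.path.join("weights", rel_path)
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)
```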
Now inside the abinet directory you can see the model weight, and inside the dbnet directory you can see the respective DBNet model weight, so the implementation environment for MMOCR is ready. Next we create a Python file and import some important Python packages: we import supervision, which is used to display the image, the poly2bbox utility, and the MMOCR inferencer from mmocr.apis. First we set the device, device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu'). If you print the device and run it, you can see CPU here, because I don't have a GPU.

Then we need to assign four variables for the model configuration paths and the model weight paths. Two models are used, one for text detection and another for text recognition; detection returns the bounding boxes and recognition returns the actual text inside the image. So we assign four variables: the detection config path, the detection weight path (the model weight you have downloaded), the recognition config path, and the recognition weight path.

Now I will show you how to get these four paths. First, the config paths for detection and recognition: go into the mmocr project and you can see the configs directory, which contains the text detection and text recognition config folders, because MMOCR uses two models, one for text detection and another for text recognition. Inside the text detection directory you can see lots of available models, DBNet, DBNet++, Mask R-CNN, and others. Earlier I downloaded the DBNet++ model weight, so I go inside the DBNet++ directory; since this is the config whose weight I downloaded, I copy its config file path and paste it into the detection config path. For text recognition I go to the text recognition directory, where you can see lots of available models such as ABINet and CRNN. I downloaded the ABINet model weight, so I go into the ABINet directory, copy the corresponding config file path, and paste it into the recognition config path.

Then we assign the other two paths, the detection weight path and the recognition weight path. I have already downloaded the model weights and kept them inside the weights directory, where you can see the abinet and dbnet subdirectories. DBNet is used for text detection and its weight is already downloaded there, so I just copy that weight path and paste it into the detection weight path; the ABINet model weight, which is used for text recognition, I copy and paste into the recognition weight path.

Now we need to call the MMOCR inferencer. We assign a variable, mmocr_inferencer, which takes five parameters: det is the detection config path, det_weights is the detection weight path, rec is the recognition config path, rec_weights is the recognition weight path, and the last one is the device.
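Putting that together, here is a minimal sketch of the setup just described; the four path strings are placeholders for the config and checkpoint files you copied, and the import path follows MMOCR 1.x.

```python
import torch
from mmocr.apis.inferencers import MMOCRInferencer

# CPU is used automatically when no CUDA device is available.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print(device)

# Placeholder paths -- point these at the config files under mmocr/configs
# and the checkpoints you placed under weights/dbnet and weights/abinet.
det_config_path = 'configs/textdet/dbnetpp/<your_dbnetpp_config>.py'
det_weights_path = 'weights/dbnet/<your_dbnetpp_checkpoint>.pth'
rec_config_path = 'configs/textrecog/abinet/<your_abinet_config>.py'
rec_weights_path = 'weights/abinet/<your_abinet_checkpoint>.pth'

mmocr_inferencer = MMOCRInferencer(
    det=det_config_path,
    det_weights=det_weights_path,
    rec=rec_config_path,
    rec_weights=rec_weights_path,
    device=str(device),   # a plain string such as 'cpu' also works
)
```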
Now you can check whether the MMOCR inferencer is working correctly or not: the process finished with exit code 0, so it is working fine and there is no issue. Next we need to read the image and send it to the MMOCR inferencer to get the result. Using OpenCV I read the image (image 3), convert it to RGB, and resize both images. Now we just send this RGB image to the MMOCR inferencer to get the result, so we can write result = mmocr_inferencer(image_rgb).

First check result.keys(): you can see two keys inside the result, 'predictions' and 'visualization', so we just need to print the predictions. First look at image 3: there are two pieces of text in it, so let's check the predictions. The MMOCR inferencer returns the recognized texts inside the image, the detected polygons, the recognition scores, and the detection scores. Here you can see the recognized texts, and you can check that it recognized the text from the image perfectly. It also returns the recognition scores, which are fantastic, 92 percent and 99 percent, along with the detected polygons and the detection scores.

For further processing we basically need the detected polygons and the recognized texts, because we need to convert the detected polygons into bounding boxes and then put a rectangle over each text. So first we convert the detected polygons into bounding boxes, then we draw rectangles over the text using those bounding boxes; it's very simple. Now we keep the prediction results, the recognized texts and the detected polygons, in separate variables: just copy the key names from the printed result and assign them. If we print the polygons and the recognized texts again there may be an issue at first; actually all of these values live under predictions[0], so once we index into the zeroth element the issue is solved and we successfully get the recognized texts and the detected polygons.

Now we just need to convert the detected polygons into bounding boxes, which takes only a few lines: detected_boxes is a torch tensor built from np.array of poly2bbox (the function imported earlier) applied to each poly in the detected polygons, placed on the device. If you check the detected boxes, we have them, but in the form of a tensor, so we convert them to NumPy arrays, and we get two separate bounding boxes for the two detected texts inside the image.

Now we need to draw the bounding boxes over the detected text inside the image. We can simply use the cv2.rectangle command: for each bounding box in the detected boxes, unpack x1, y1, x2, y2 and draw the rectangle on image_bgr using these coordinates, converting them to integers first. Then we can display the image; we use supervision for displaying it, sv.plot_images_grid, where the images are the original image and image_bgr, so we keep a copy of the original image, assign the grid size, and set the titles, one 'Original Image' and the other 'MMOCR Image'.
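Here is a minimal sketch of the steps just described, assuming the mmocr_inferencer from the previous snippet; the image filename is a placeholder, and the polygon-to-box conversion is simplified slightly by going straight to a NumPy array instead of through a torch tensor as in the video.

```python
import cv2
import numpy as np
import supervision as sv
from mmocr.utils import poly2bbox

# Placeholder image path.
image_bgr = cv2.imread('image3.jpg')
image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)
original_image = image_bgr.copy()   # keep a copy for side-by-side display

# Run detection + recognition; results for the first image are in predictions[0].
result = mmocr_inferencer(image_rgb)
predictions = result['predictions'][0]
recognized_texts = predictions['rec_texts']
detected_polygons = predictions['det_polygons']

# Convert each detected polygon to an axis-aligned [x1, y1, x2, y2] box.
detected_boxes = np.array([poly2bbox(poly) for poly in detected_polygons])

# Draw a rectangle over every detected text region.
for box in detected_boxes.astype(int):
    x1, y1, x2, y2 = box
    cv2.rectangle(image_bgr, (x1, y1), (x2, y2), (0, 255, 0), 2)

# Show the original image and the annotated image side by side.
sv.plot_images_grid(
    images=[original_image, image_bgr],
    grid_size=(1, 2),
    titles=['Original Image', 'MMOCR Image'],
)
```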
Now you can check it: the model successfully recognizes the text and draws rectangles over the detected bounding boxes. Basically, this is how we can detect and recognize the text inside an image. You can also display the recognized text over the bounding boxes, because we have already recognized it successfully; for this we just need to add one more line, cv2.putText. It's very simple: on image_bgr we put the recognized text, and we just need a counter over the loop, so we set a counter variable to 0 and increment it inside the loop; in this way we display the label from the recognized texts over each bounding box.

Basically, this is all about the MMOCR implementation in Python. It gives us the flexibility to choose different text recognition and text detection models; the modular architecture of MMOCR lets us choose from the listed detection and recognition algorithms. Inside the mmocr directory you can see the configs directory, and inside its text detection and text recognition subdirectories lots of models are available: DBNet, DBNet++, Mask R-CNN for text detection, and ABINet, CRNN, NRTR and more for text recognition. I have used DBNet++ for text detection and ABINet for text recognition here, but if you wish to use a different model, for example Mask R-CNN for text detection or CRNN for text recognition, you can easily use it.

But how can you use these models? If you wish to use Mask R-CNN for text detection, you have to select its detection config path here and download the Mask R-CNN model weight from GitHub. To download it, just search for "mmocr github", click the link, and scroll down; there you can see the Model Zoo with lots of available text detection models and, for text recognition, lots of models as well, including ABINet, and here is DBNet++. DBNet++ is the latest one, with Adaptive Scale Fusion applied on the DB network, which gives a more accurate binary map for text detection inside the image; that's why I used it. If you want to use Mask R-CNN for text detection, just click its link, and if you wish to use CRNN for text recognition, right-click and open it in a new tab. For Mask R-CNN detection you can download any one of the listed model weights, and you have to assign the corresponding detection config path according to the name of the downloaded model weight. Similarly, if you wish to use CRNN for text recognition, you just need to download its model weight from there and carefully assign the model config path: right-click the config, copy the path, and paste it into the recognition config variable. So basically this modular architecture gives us the flexibility to choose various text detection and recognition algorithms inside the MMOCR inferencer. A sketch of such a swap follows below.
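As a minimal illustration of swapping models (assuming you have downloaded a Mask R-CNN detection checkpoint and a CRNN recognition checkpoint from the Model Zoo and copied the matching config paths; all paths below are placeholders), only the four path variables change and the rest of the pipeline stays the same:

```python
from mmocr.apis.inferencers import MMOCRInferencer

# Placeholder paths -- each config must match the checkpoint you downloaded
# from the MMOCR Model Zoo.
det_config_path = 'configs/textdet/maskrcnn/<your_maskrcnn_config>.py'
det_weights_path = 'weights/maskrcnn/<your_maskrcnn_checkpoint>.pth'
rec_config_path = 'configs/textrecog/crnn/<your_crnn_config>.py'
rec_weights_path = 'weights/crnn/<your_crnn_checkpoint>.pth'

mmocr_inferencer = MMOCRInferencer(
    det=det_config_path,
    det_weights=det_weights_path,
    rec=rec_config_path,
    rec_weights=rec_weights_path,
    device='cpu',
)
```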
Basically, this is all about the MMOCR model, which is used for optical character recognition from images, and it is really fantastic because of its modular architecture. If you wish to watch my next videos, please subscribe to my channel. Thank you very much.
Info
Channel: SILICON VISION
Views: 1,531
Keywords: OCR, SAM, optical character recognition, segment anything model, easy ocr, segment anything model github, mmocr, sam ocr colab notebook, ocr-sam github, ocr-sam, ocr implementation, mmocr gitbub, mmocr implementation, ocr image to text, ocr python
Id: Snyu-o8ZdDk
Length: 30min 12sec (1812 seconds)
Published: Wed Jul 19 2023