Extract Tables from PDF and convert to Excel sheet with Paddle OCR text detection and recognition.

Captions

Hi there, and welcome to this new and exciting session in which we'll see how to automatically extract this kind of table into an Excel sheet. If you've ever wondered how it's possible to programmatically get this table and transfer it into an Excel sheet like this, then you are in the right place, as we are going to make use of the Python programming language together with the PaddleOCR library, which contains text detection and recognition models that we are going to use to extract this information and then structure it so that we obtain the table we have right here. Don't forget to subscribe and hit that notification button so you never miss amazing content like this.

We are going to start by looking at the different steps we'll take to produce our PDF table-to-text extractor. The very first thing we will do is detect the table from the PDF, which is in the form of an image like this. We detect the tables in this page, and because we have several pages, we will go page by page and detect each and every position where we have the different tables. You can see here this table that's been extracted, and this other table which has been extracted.

From here we move on to carry out text detection. Once we've detected this table, we detect the positions, or locations, of each and every text in this table image right here. So you can see we detect that there's "Component", we detect that there is text at this position, text at this position, and so on and so forth.

Then from here we move on to text recognition: we want to recognize the text which has been detected. So we get here and recognize each and every text we have. You can see from here that we have selected this line: "Beverages, alcoholic and non-alcoholic", the FCT, the updated FCT, and then this
difference. So at this point we've already detected and recognized the text. The next thing we want to do is bring out this text in a way that we can represent it in an Excel table. To do this, we are going to make use of our table recognition, or rather table reconstruction, model.

For table reconstruction, we're going to draw some lines: you see we have these horizontal lines for each row, and then we have these vertical lines for the columns. At the intersections of these lines we are going to get the text which is found there, and then reconstruct this CSV file you have here. What this means is that after carrying out the detection and the recognition, we know the exact positions of each and every text, and then we compare each intersection position with that of each and every text we have here, to be able to get the exact text which is found at this location and reconstruct our Excel or CSV file. So you see this row extracted, then the next row reconstructed, and so on and so forth. So we see how we are able to get from this initial image of the PDF page to this Excel sheet right here.

Now let's go ahead and see how to bring this project to life. The PDF we shall be using as an example is this one, from the Journal of Food Composition and Analysis. Here we have this five-page PDF, and we're going to see how to extract the text in these tables. The first step will be to convert each and every page into an image, so this will be an image, this will be an image, and so on. To convert this PDF into images, we're going to make use of the pdf2image library right here. So here we have the installations: pip install pdf2image, and then we're going to install
poppler-utils, which is needed to make pdf2image work. We simply run this, and that's it for the installation.

The next step will be to convert this PDF into the different images. From pdf2image, which we've already installed, we import convert_from_path. We have this PDF here, which is on a given path, so let's run this and then take the images from there: we have images = convert_from_path, and we simply specify the PDF path, so there we have test.pdf. We run that, and that should work fine. "convert_from_path is not defined"; we have convert_from_path here, okay, so now that should be fine, and we have our images.

We're going to store those images in a given directory, so let's make that directory: we just have mkdir, and let's say pages. We make the directory, and you can see here we have this pages directory. Then we're going to store those images in our directory. So we have for i in range(len(images)), basically getting all the different images; for every image we'll save it and specify the path. Here we have pages, and then the page; we give each one a different name, making use of this i right here, which we attach to the page name, and we specify that it's a JPG file. Our encoding scheme is JPEG, so we have JPEG. We run that and everything should be fine. That's great. We open this, and you see page zero. Let's open up page two... let's open up page zero. So that's it, you see we have this image; now page two, that's it, and the others in a similar manner. So you see how we've gone from this PDF to the images.

That said, we now move on to the next step, which will involve extracting the tables from those different images. In this
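A minimal sketch of the page-saving step just described, assuming pdf2image and poppler are installed. The helper names and the page_<i>.jpg naming pattern are assumptions for illustration; the video only says that the index i is attached to the page name.

```python
from pathlib import Path


def page_filename(index: int, directory: str = "pages") -> Path:
    """Build the output path for one page, e.g. pages/page_0.jpg (assumed naming)."""
    return Path(directory) / f"page_{index}.jpg"


def pdf_to_page_images(pdf_path: str, directory: str = "pages") -> list:
    """Render every page of the PDF to a JPEG file and return the saved paths."""
    # Imported lazily so page_filename() works even without pdf2image installed.
    from pdf2image import convert_from_path  # needs poppler on the system

    Path(directory).mkdir(exist_ok=True)
    saved = []
    # convert_from_path returns one PIL image per PDF page.
    for i, image in enumerate(convert_from_path(pdf_path)):
        out = page_filename(i, directory)
        image.save(out, "JPEG")
        saved.append(out)
    return saved
```

Calling pdf_to_page_images("test.pdf") would then write one JPEG per page into the pages directory.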
next step we'll make use of a deep learning model which has already been trained on a very large dataset, the PubLayNet dataset. The data points in this dataset look like what we have here: we have an input image like this one, and as output we have locations for the different elements in this image. Text is in blue, titles are in purple, then lists (we have no list here), tables, which you can see right here, and figures, which are not here; there's no figure here. Basically, this deep learning model is going to learn from hundreds of thousands of these kinds of images and the corresponding outputs, such that when given a new image it is able to detect that there is a table at a given position in the image.

Right here on this IBM platform you can see this PubLayNet description, and you can get a more precise idea of the number of images used. You see here we have 358 thousand images, broken up into training, validation, and testing, and you see the different sizes; this is 102 gigabytes. Now let's check out this paper. This is the official paper, and you can see the descriptions here. Let's scroll this way: here you see how each of these PDF page images has its layout presented, and this layout can now be used to train a deep learning model.

If you're new to the topic of deep learning, feel free to check out our platform, where we have several courses on deep learning and applying it to solve real-world problems. You can also check out our YouTube channel by clicking on the different links we have here in this free playlist page.

The trained layout detection model we'll be using is from the PaddleOCR project, which you can find on GitHub. Here we have several tools for OCR, and one of them is layout detection. We
should check somewhere here... we have layout; you see, here you have information about the layout detection. As we move on in this course, we'll also see how to do text detection and text recognition while making use of pre-trained models available in this PaddleOCR project.

That said, let's go ahead now to see how we can extract these tables from the image which comes from the PDF. We'll go ahead and install the layout parser from the PaddleOCR tool. So we have that. Before we move on, we're going to get back to this GitHub repo, where you'll see we have this quick start with this image defined: you read the image, then you build the model, and from here you detect, or extract, the layout from this image.

You'll notice that here we have this configuration path where we specify the model being used. Here it's PP-YOLOv2 R50, where R50 stands for ResNet-50, for those of you who already have some background in deep learning. Then we have this threshold. The reason we have this threshold is that the model detects a table like this one with a given probability, so with a threshold of 0.5 we're saying that if a table like this one is detected with a probability greater than 0.5, we'll consider that detection valid, whereas if it's less than 0.5, we will not consider that detection valid.

That said, we get back here. We have this label map to specify the different labels we can have: text, title, list, table, or figure. Here is whether to use the CPU; by default this is false. And then here we have the Math Kernel Library by Intel, which is used for the acceleration of deep neural networks: DNN stands for deep neural network and MKL stands for Math Kernel Library. By default this is true. Talking about MKL-DNN, which as we said is the Math Kernel Library for Deep Neural Networks, it is used for accelerating deep neural
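The role of the 0.5 threshold can be sketched in a few lines. This is only an illustration of the filtering rule described above: the dictionary layout and the sample scores are made up and are not the layout model's actual output format.

```python
def keep_confident(detections, threshold=0.5):
    """Keep only detections whose score is strictly greater than the threshold."""
    return [d for d in detections if d["score"] > threshold]


# Hypothetical detections; the scores are invented for the example.
sample = [
    {"type": "Table", "score": 0.97},
    {"type": "Table", "score": 0.31},  # below 0.5, so treated as not valid
    {"type": "Text", "score": 0.88},
]

kept = keep_confident(sample)
```

Raising the threshold trades missed tables for fewer false detections; lowering it does the opposite.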
network frameworks. And talking about frameworks, the specific one we're using here is PaddlePaddle. You may have been used to working with other, more popular frameworks like TensorFlow or PyTorch, but you can check out PaddlePaddle, and most especially this PaddleOCR repo, which has many pre-trained models readily available to be used.

That said, once you define and load your model, the next step is to carry out the predictions, and from there we can go on to extract just the tables. So let's get this and paste it out here. That's it. Now, for our image, we're going to take page two. Basically, you could do a for loop to go through each and every page; you can always do that. So here we have this page two; let's copy this path, so copy path, and there we go, we have this path specified here, and then this pre-processing right here. Once this is done, we go ahead to pass this into our model. As we've said already, we have a deep learning model here which has certain parameters, and during training, after this model has been trained with about 358,000 different inputs and outputs, it learns to take an input and predict reasonable outputs. That's basically what we're doing here with this model.detect. This model has already been trained, so from here we'll just run this.

We're getting "No module named paddleocr", so we'll go ahead and install it. Now, another installation: we'll start by installing paddlepaddle, the GPU version, then we'll install paddleocr and protobuf, and then we'll clone this Paddle repo. Let's run that. Now that we have this installed, let's get back here and run this. We run that, and we now have our layout generated. You should notice that this model was downloaded here, and it's about 221 megabytes. Then we have our layout, so we run that and this is what we get. Now you'll see that here we
have this layout right here, and we have information like this: you see the text block, the rectangle, and then we're given the exact position of this text. As we move along, we see type text, then the next text block, then the next text, and so on. Let's just search for table. So here we see we have type table, with a score of 0.97, and then we have its position; this here is its position. We should have another table block, so if we search again for table, you see we have this other one right here.

Now, to verify that we actually get a table here (you see we have this type, and then its coordinates), let's open up the image, that is, page two as an image; you see it's a JPEG. We can get the image height and the image width. The first thing you should notice is that, in this image, this position right here is position (0, 0): the top-left corner is position (0, 0). You can see that here: as we move toward it we go towards zero. Anyway, just note that here that's (0, 0). To obtain the width of this image, we start from this position, and as we go, notice how the first coordinate keeps changing while the second one stays more or less fixed. As we move, that coordinate increases, up to 1602. Then, when we're at this position and instead move downward, the second coordinate is the one increasing, and as we move right down here, we get up to 2198. So we now have the image width and the image height.

Now, to get the coordinates of the bounding box of a table, we just need to get to this point: we have about a hundred
(you can check this out here). So we have about 102... 110, 206, and then here we are at about 1556, 710. Let's go back to the code: you see here 102, 218, 1547, 715, so it's approximately the same as what we had. To check this out, 1547, 715: you can get this location; for the 1547 we go to the left, and for the 715 we go slightly up, around this. So we see that with these coordinates given to us here, we are able to locate this table in the image, and this location is based on this top-left corner position and this bottom-right corner position. We make use of those two positions, this position and this position, to locate this table in our image. So we now know how to locate it, and we've actually seen how it's done.

And here we are: we have our layout, and now we have two tables, as you can see. Now we will just make use of one table, the other table. So let's search again for table; you can check out the other table. This one we haven't seen; this is the second table. So we have 850, 844. Get back here: you can check this out, still 850, 844. About this... go down, 844, 52... slightly to the left. Okay, so we are about here; you see, that is the position. Then if you want to get this bottom-right position, we're at about 1550, 1770. There we go: 1550, 1766, approximately 1770. So this is our second table right here, and clearly we have our x min; we could call this our x min. Let's just copy this from here; we could copy this from here, or we could extract it automatically. We call this x1, our x min, so let's just have x1, then y1, then x2, and then y2. All this is obtained automatically thanks to this PaddlePaddle model. Now that we have x1, y1 and x2, y2, the next thing we'll do is we are going
to go through this full layout and select only the tables. So for l in layout, if l.type is equal to table, then let's print out l. You can see here, because we have two tables, we have these two text blocks printed out. Actually, here we have l.block, so we print out the block this time around. We run that, and you see we now have our rectangle, which is x1, y1, x2, y2. So let's devise a means of automatically getting these values, as we've seen right here. What we'll do is only consider this first table, or to be specific, this table here. So we just do a break: from here we have break, and we print only this one. No, we should do a break once it gets the first table, not after looping the first time. So let's run that, and that's fine.

Now we're interested in getting these coordinates here. To get them, let's look at l.block to better understand this. Here we're told it's of type rectangle, and we're given the arguments x1... okay, so yes, they're attributes, actually. So what we can do here is simply do x1, and we get the value for x1. You see, that's it. So now we have x1, which is equal to l.block.x1, and we have x2, x3, and x4. We run that: "has no attribute x3". Oh, this should be x1, x2, y1... all right, let's better say x1 is x1, y1 is y1, and then x2 and y2. Let's copy this and take this one off. So here we have x2, and y2 is y2. There we go. Let's run that, and then here we're going to print out x1, y1, x2, and y2. So we get that automatically.

Now that we have this, we're going to read this page here, so let's copy the path. I'm going to use OpenCV to read this page, so we have im = cv2.imread. There we go. Once we read that page, we're going to crop out the required, or selected, region; we're going to crop out just this region. What we'll do here to crop this out is, we just have image... or let's just
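The loop-and-break logic for picking the first table can be sketched as follows. The two small classes are hypothetical stand-ins for the layout objects printed in the video, the attribute spelling x_1, y_1, x_2, y_2 is an assumption, and the sample coordinates are rounded from the values read out above (scores other than 0.97 are invented).

```python
from dataclasses import dataclass


@dataclass
class Rectangle:
    """Hypothetical stand-in for the rectangle stored in each layout block."""
    x_1: float
    y_1: float
    x_2: float
    y_2: float


@dataclass
class LayoutBlock:
    """Hypothetical stand-in for one detected layout element."""
    type: str
    score: float
    block: Rectangle


def first_table_coordinates(layout):
    """Return (x1, y1, x2, y2) of the first block of type Table, or None."""
    for l in layout:
        if l.type == "Table":
            b = l.block
            # Returning here plays the role of the break in the video.
            return b.x_1, b.y_1, b.x_2, b.y_2
    return None


layout = [
    LayoutBlock("Text", 0.99, Rectangle(100.0, 50.0, 800.0, 90.0)),
    LayoutBlock("Table", 0.97, Rectangle(102.0, 218.0, 1547.0, 715.0)),
    LayoutBlock("Table", 0.95, Rectangle(850.0, 844.0, 1550.0, 1766.0)),
]

x1, y1, x2, y2 = first_table_coordinates(layout)
```

The label string must match whatever the model's label map uses; "Table" with a capital T is assumed here.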
save this, so cv2.imwrite, and we'll save this as the extracted image, so let's just say extracted image dot jpg, and this extracted image is our image, but we're going to select certain portions, so we go from y1 to y2 and then x1 to x2. Basically, because we start with the height dimension before going to the width dimension, what we're saying here is that we're going to take our pixels from a given y to another y and from a given x to another x. When we run this we should have... okay, here it doesn't take floats, so we need to convert these to ints. That said, what we can do right from here is simply have these ints. There we go, and that's fine. So we'll run this; you see we now have ints. Run that again, and that's fine. Let's go ahead and check out our cropped image, the extracted image, and that's it: you see we've gone from our PDF to this extracted image. So now we have the table, and we're done with the first part, which is the table extraction.

From here we go on to the text detection and recognition phases. In text detection and recognition, we're still going to make use of PaddleOCR, and it should be noted that PaddleOCR comes with several text detection and text recognition algorithms. Here we have EAST, DB, SAST, PSENet, and FCENet, and for text recognition we have CRNN, Rosetta, and all those other deep learning models, which you can either use directly or train with your own dataset. Given that the kind of data we're dealing with here, these kinds of inputs (let's just get this from the Colab), these kinds of images, are very common, and the text isn't very different from what you'll find in most books, we are just going to make use of this model. So now we'll go ahead and import PaddleOCR and draw_ocr. From here we're going to start with defining, or creating, this
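The crop itself is just a slice: rows (the height dimension) first, then columns. Here is a small sketch using a nested list instead of a real image so it runs without OpenCV; with the NumPy array returned by cv2.imread, the equivalent selection is image[y1:y2, x1:x2]. The function name is an assumption for illustration.

```python
def crop(image, x1, y1, x2, y2):
    """Select rows y1..y2 and columns x1..x2 from a row-major image."""
    # Slice indices must be integers, which is why the floats are converted.
    x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
    return [row[x1:x2] for row in image[y1:y2]]


# A toy 4-row by 5-column "image" where each pixel stores its (row, column).
image = [[(r, c) for c in range(5)] for r in range(4)]

# Float coordinates, as the layout model returns them.
patch = crop(image, x1=1.0, y1=2.0, x2=4.0, y2=4.0)
```

The patch here has 2 rows and 3 columns, starting at row 2, column 1 of the original.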
PaddleOCR object right here. So we have ocr, which is PaddleOCR, which is this one here which we've imported. We have that PaddleOCR, and then we're going to specify the language. Now that you have this, we run that, we have the ocr, and you will notice that there are these downloads. We have these very light but very high-performing models provided to us by PaddleOCR: you see we have the detection inference model, the recognition inference model, and also the classification, or rather the angle, inference model. So here you can see we have this detection inference and recognition inference. We're not going to make use of this other one here; we're interested in just detection and recognition.

So once we have these two and we now have our OCR engine ready, we can define the image. Let's get this image from here; let's copy this. Let's define this image, image_cv, and then we'll do cv2.imread. There we go; let's just put this image path right here. So we have our image path, we paste that out, and here's the image path. So we read our image, and that's it. Once we have that, we can get the image height: we take this image right here and take the first dimension of its shape, and then we get the width. There we go, we have image_cv and we get the width. Once we have this information, we can go ahead to obtain the results. Let's run this first and see what we get. Because we've already downloaded the different models, you see they're not downloaded again. Now we can print out the image height and the image width. Okay, so we see that we have our image height and our image width.

From here we move on to get the results. Let's just put this out here: we have result, or output; output is equal to ocr, this ocr we have here, and then we're going to call the
ocr method, which takes in the image path. So we have that; we'll run this now, and then we can close this up. So we have our output; let's print out the output. There we go: we have this output with the different locations of the bounding boxes and the corresponding text. Just as we said in the slides, it's going to detect, or locate, each and every text, and then later on (let's scroll down; here's our deep learning model in the middle) it's going to give us exactly what words or sentences we have in each located region. So that's it.

Let's get back to the code. What we'll do is take output and get its shape to inspect it... that's a list. Okay, let's take the zeroth value here and get its shape... still a list. Anyway, let's just get the zeroth value. You see, here we have the box, then the text, and then the probability. Let's open this up right here; you should see "Component". So we get the position, then the text, and then the probability of that box being there. We can simply do this: we say for i in range(len(output)), or we could just say for out in output, and print out. So that's it; you see we now have the different boxes with the text and the probabilities.

From here we can say we want to define those boxes, which is going to be this zeroth element right here. So for every line in our output, we are going to get that zeroth element. So this will be our boxes. We can run that and print out boxes so you get to see what we have. There we go: you see we have the different boxes. Now we only have the boxes here. Let's print this out: for box in boxes, we could print
out box, so it's a bit clearer. See, we get only the boxes. So with this we can get only the boxes. The next step: remember from our output we also have to get the text, so we want to get the text here; you could also get the probabilities. So let's go ahead and get this text. To get the text, we will have texts equal to: we get the line, and then instead of the zeroth index, we move on to the next index, which is the first index, index one. So we pick index one, and then within this one we want to get the zeroth index. So here we have [1][0], and that's it. From here it's still the same procedure: for line in output. That's it.

The next thing we want to do is the probabilities. You should have guessed that, right: you get to this one, and then you now get this one. So this is line, so you have line[1][1], for line in output. Now let's get the probabilities. There we go: we have the different probabilities. Let's also get the different texts. There we go, we have the different texts. So now we can get the boxes, the texts, and the probabilities separately. That's it; let's shut that up, and we move on to the next step.

From here we're going to draw this. We'll define our image_boxes, which is going to be our image_cv, and we make a copy. We run that, and the next thing we'll do is go through each and every box: for box in boxes, we are going to do cv2.rectangle, where we pass in our image_boxes and then the coordinates of the given box. So here we'll have box, and the way we get these coordinates is by looking at the way this box was constructed. A single box like this one (let's highlight a single box), as you can see, has x y, x y, x y, and x y. Now, to
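The three list comprehensions described above can be sketched on a tiny hand-written result. The sample values are made up, and the shape (one [box, (text, probability)] entry per detected line) is the one shown in the video; some PaddleOCR releases wrap the result in one more list per image, so treat the exact nesting as an assumption.

```python
# Hypothetical OCR output: each entry is [four corner points, (text, probability)].
output = [
    [[[15.0, 10.0], [111.0, 10.0], [111.0, 31.0], [15.0, 31.0]], ("Component", 0.99)],
    [[[200.0, 10.0], [260.0, 10.0], [260.0, 31.0], [200.0, 31.0]], ("Energy", 0.97)],
]

boxes = [line[0] for line in output]             # index 0: the bounding box
texts = [line[1][0] for line in output]          # index 1, then 0: the text
probabilities = [line[1][1] for line in output]  # index 1, then 1: the probability
```

Keeping the three lists aligned by position is what later allows zip(boxes, texts) when drawing.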
be able to locate this on the image, let's download this image and open it up so we can locate it. There we go, we have this open now. To locate this, let's look at "Component". Here, for example, we have to scroll down slightly. There we go. You should monitor this here; let's reduce this again, and there we go. So monitor this so you can get those coordinates: you see here we have about 15, 10, so it's this one; then when we move here we are at 11... 11, 13, around this; we move this way, 127, about this; and then we move this way, we are at 19, 31, around this. This means that the way a box is located is by going in the clockwise direction: starting from the top left, we move to the top right, then to the bottom right, then finally to the bottom left.

Now, to construct this bounding box, or to draw it with OpenCV, what we need is the top left and the bottom right, so what we'll be using here will be this one and this one. That said, we'll just do box[0]: let's pick this, and then we pick the zeroth value, so box[0][0]. Let's have int, to convert that to integers: int of box[0][0]. Let's just copy this, and then here we have box[0][1]. So we have that, and from here we can do the same thing we've done here. There we go. Oops, let's close that, paste this out, and then here we move on to this one. So these are index zero, index one, index two, and index three; we're interested in index zero and index two, as we've explained. So here we will do [2][0] and then [2][1]. Now that's fine. Let's rearrange this to make sure that everything works well here. So we have this, and then we follow on with this one. Looks fine. The next thing we'll do now is specify the color, so
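Since the four points run clockwise from the top left, index 0 and index 2 give the two corners a rectangle needs. A minimal sketch of that indexing, with a made-up box; the helper name is an assumption:

```python
def corners(box):
    """Return (top_left, bottom_right) as integer (x, y) tuples.

    box holds four [x, y] points in clockwise order:
    top left, top right, bottom right, bottom left.
    """
    top_left = (int(box[0][0]), int(box[0][1]))
    bottom_right = (int(box[2][0]), int(box[2][1]))
    return top_left, bottom_right


# Made-up coordinates for illustration.
box = [[15.2, 10.7], [111.0, 10.7], [111.0, 31.9], [15.2, 31.9]]
top_left, bottom_right = corners(box)
```

These two tuples are what would be passed as the second and third arguments to cv2.rectangle, followed by the color and the thickness.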
here we specify the color (you could change this), and then we also specify the thickness of the boxes. So we close that. Once we have this, we can run it; this should be fine. No, we should have here box... for box; there we go, that should be fine. "An integer is required (got type tuple)". So let's check this out. Oh, actually, this is image_boxes, because this is the image on which we want to draw these different boxes. So we have that, we run that again, and that's fine. Now let's do cv2.imwrite: we write our image, we specify that we want to have detections, and then we have image_boxes. That should be fine. So we run that, and there we have detections. You get to see the detections here; it shows us how well our model is performing. You see it detects all this.

So now we've had these detections, and everything looks fine. The next thing we want to do is add the text. After adding this, let's go ahead and have this cv2.putText method, which takes in the image_boxes and then takes in the text. We have our texts, and for each box we have a specific text. So let's go back here... we could enumerate, or we could do for box, text in zip(boxes, texts), and that should be fine. So here we have each box and each text. That'll be it: we get the box, we get the text, and then we specify the location for the text, the font, the thickness, and the color. So we have that; let's run this and see what we get. That looks fine. Let's check this out: here are our detections, "Component", "Energy", "Water", "Protein", and so on and so forth. It shows us our different detections and the text which has been recognized.

Now, given that we're done with the detection and recognition, we can move on to the text
reconstruction. In this reconstruction (let's look at this one), we're trying to make use of the detections and the recognized text to be able to generate a table like this one, which we can open up in an Excel sheet. The way this is going to work is that we are going to create these vertical lines (here we have four vertical lines) and these horizontal lines, based on the different detections. The way we create those horizontal lines is: for each and every bounding box (let's take this one and put it to the side), we're going to extend it right to the edges. So we take this one, for example, and we extend it right to the edge; this next one, we extend it; this other one here, we extend it; and this other one we also extend. There we go. So you see, after extending this... let's pick another line, say, for example, this line. In fact, just note that we're going to extend each and every box here. We could pick this one; we have only three here, so we go faster. We extend this to the end, we extend this to the end, we take this one and extend it. There we go. So now we've extended all these different boxes, and we will repeat the same process for each and every box we have here.

What you will notice is that boxes which are in the same line will occupy a similar region. You see here, if you bring this here and bring this here, it doesn't really look like we have three separate boxes, whereas we actually have three boxes in here, one, two, and three, for these three different regions. And so the algorithm we'll make use of now will
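Extending a box to the edges can be sketched as follows. This assumes the four-point clockwise box format from the OCR output and an [x1, y1, x2, y2] output format; the function names, and treating the vertical case as the mirror image of the horizontal one, are assumptions for illustration.

```python
def horizontal_band(box, image_width):
    """Stretch a box across the full image width, keeping its top and bottom."""
    y_top = int(box[0][1])      # y of the top-left point
    y_bottom = int(box[2][1])   # y of the bottom-right point
    return [0, y_top, image_width, y_bottom]


def vertical_band(box, image_height):
    """Stretch a box across the full image height, keeping its left and right."""
    x_left = int(box[0][0])
    x_right = int(box[2][0])
    return [x_left, 0, x_right, image_height]


# Made-up box and image size for illustration.
box = [[15.0, 10.0], [111.0, 10.0], [111.0, 31.0], [15.0, 31.0]]
row_band = horizontal_band(box, image_width=700)
column_band = vertical_band(box, image_height=500)
```

Boxes on the same text line produce nearly identical row bands, which is why a de-duplication step is needed next.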
be the non-max suppression algorithm, which removes this one and this other one so that we are left only with this one. Here too we remove this one and this one, and we are left only with this. That's how we go from all these bounding boxes to just one per line. For this next line, when you extend the box to the end, you likewise take off the other boxes. What we are left with will be something like this, and we repeat the process from top to bottom, or in general for each and every box. Now, the way the non-max suppression algorithm works is this: when you have boxes which lie around the same region, very close to one another like these two boxes right here, we can make use of their probabilities to remove one. Let's add a third one. Each of these boxes has its own probability, so what we do is delete every box except the one with the highest probability of all of them. Another question you may be asking yourself is: based on what criterion do we decide that boxes are close enough to start deleting some and keeping the one with the highest
probability? The simple answer is: by making use of the IoU score. Let's take two examples. We have two boxes like this, each with its own probability, though for the IoU score we don't need the probabilities, so we can take those off. Here we have two boxes that overlap a lot, and here two boxes that overlap only slightly. These last two boxes may come from two different lines, which is why the intersection of the first pair is quite significant compared to the intersection of the second pair. But it doesn't suffice to just compare the intersections, because with two very large boxes even a slight overlap may give a greater intersection than this one. So what we're going to do is make use of the IoU score. The IoU is equal to the intersection divided by the union, hence the name intersection over union. Taking the union into consideration helps us know how close the two boxes are: if two boxes nearly coincide, like these, the ratio of intersection to union is greater than for boxes that barely overlap. And that is how the non-max suppression algorithm makes use of the IoU to know
which boxes can be deleted. So we now understand how the non-max suppression algorithm works. Let's go ahead and create these boxes, then delete those we are not going to use, so that we are left with something like this. But before we move on, let's look at how the same process is repeated to get the vertical lines. As with the horizontals, we go through each and every box, but instead of extending it horizontally we extend it vertically. We extend this one, we extend this other one, and then we run non-max suppression, which groups together the boxes that are close to each other and deletes all those which don't have the maximum probability score. Doing this right up to the final box reconstructs this vertical box, this one and this one, and what is left at the end is something like this. With these vertical boxes and all these horizontal boxes we get a grid, and once we have it we can reconstruct our table. So let's write the code that produces this grid of cells. In the code we define the horizontal boxes list, horiz_boxes, and the vertical boxes list, vert_boxes. Then, for every box in our list of boxes, we compute the extended boxes and add them to these lists. The first thing we define is our x horizontal and our x vertical. What should they be? Let's get back to the example. Suppose we have this box and we are interested in getting the x
horizontal. Let's copy this box, and change the color of the copy, which will be for the vertical. When we extend this box horizontally to both ends, the x-coordinate which was here moves back to the left edge, so what's important to note is that our x horizontal will always be zero. You can check this in the picture: if we push a box right to the edge, its x-coordinate is zero. Now for the vertical: if we extend this box vertically right to the end, you find that it maintains the same x-coordinate as the initial box, because the distance from the left edge to this x-coordinate remains the same when we extend vertically. It is only when we extend horizontally that the x moves from here to here. This means that in the code our x horizontal is zero and our x vertical is the same value of x we've been working with so far. Remember that for a box, box[0] holds the x and the y of the top-left corner and box[2] holds the x and the y of the bottom-right position, so for x vertical we take box[0][0]. Then we move on to y, where we have the y horizontal and the y vertical. Let's get back to the drawing. So what we're saying is we have this
horizontal box here. Now the question is: what has changed in the vertical position? Before, we had this box here, and after extending it you notice that the vertical position hasn't changed, so the y position doesn't change. That means our y horizontal remains the same, and we still have int(box[0][1]). Then for our y vertical: given that we had a box which was initially here and that we have now extended it from end to end, the box runs through the whole image, and so our y vertical would be the height of the image. Remember we defined the height previously; let's get back to get the exact name. It is image height, not just height, so here we put image height. So we understand how to get the x values and the y values, and from here we can now get the widths and the heights. Let's start with the horizontal again. We've extended this box to both ends, so the width of this box is simply the image width. So the width for the horizontals is the image width. Now for the verticals: initially we have this box, now we have this vertical box, and we're trying to get its width. The width here will simply be
the difference between the x min, that is the x at this position, and the x at this other position. So if we have the x at this position, we can get the x at the other one and subtract to obtain the width vertical. Back in the code, to get this we simply have int(box[2][0] - box[0][0]): index two because we want the x of the far corner, and we subtract the x of the first corner. That's how we obtain the width for the verticals. Once we have this, we move on to the heights: the height for the horizontals and the height for the verticals. For the verticals, the height is simply the image height, so that's easy, we just put image height. For the horizontals, the height is the difference between the first y-coordinate and the last y-coordinate of the box. To obtain it we write something similar to what we had already: box[2][1] minus box[0][1]. So now we have the height for the horizontals and the height for the verticals. With this set, the next thing we want to do is construct our horizontal boxes, with horiz_boxes.append. We append the x horizontal, the y horizontal, and then the x horizontal plus the width. Since you already have the coordinates of the top-left point, to obtain the coordinates
here at the bottom-right corner it suffices to take the x plus the width and the y plus the height: you shift to the right and to the bottom, and you end at this point. So we have x horizontal plus the width (not plus the height, sorry), and then y horizontal plus the height. We should be careful with the names: this is height horizontal, and here we have width horizontal, so let's rename the variables accordingly, for both widths and both heights. So for the horizontals we have x, y, x plus width and y plus height. The next one is for the verticals, with vert_boxes.append: we append the x vertical, the y vertical, then the x vertical plus the width vertical, and the y vertical plus the height vertical. So now we have the horizontal and the vertical boxes, and we can simply run this. If we print horiz_boxes[0], you see that the first box has been transformed into this horizontal line: 0, 13, 698, 31. Now it would be interesting to actually visualize this. So we define an image, which we'll call im, as a copy of our initial image. We copy the initial image, let's run this, and then what we have
here is that we draw the rectangles. To the rectangle function we pass the image, then x_h and y_h, then x_h plus width_h and y_h plus height_h, which is basically what we appended above, so let's just copy it. So we have x, y, x plus w, y plus h, then we specify the color, let's say (0, 255, 0), and the thickness, 1. We repeat the same process for the vertical boxes: here we have x_v, y_v, then x_v plus width_v and y_v plus height_v, and we change the color so you can clearly see the difference. Now let's run this, write the result with cv2.imwrite to horiz_vert.jpg, and check out what we get for our image. You see we have the horizontal lines, but we don't get the verticals. Let's get back and make sure everything works. If we comment out the horizontals and visualize again, we have no box, and increasing the thickness still shows no box. This means there's a problem with the coordinates. The image height looks fine, the width_v looks fine, but the y_v shouldn't be the image height; it should be zero. The reason is that for a vertical box like this one, which extends right up to the ends of the image, the first y-coordinate is at the top edge, so y should be zero. That should be the problem. Let's run this again and
check this out. Okay, so now the vertical boxes are drawn. Let's reduce the thickness again, uncomment the horizontals, run this again and check horiz_vert: that's fine, we now have the horizontal and the vertical lines. What we'll do next is apply non-max suppression to take out the extra lines, so as to leave only a single vertical line here, only a single one here and only a single one here. For this we're going to use TensorFlow's non-max suppression method. All you need to do is specify all the horizontal boxes, then give the probabilities, because we use the probabilities to remove the boxes which don't have the max probability, and then the IoU threshold. If you recall from when we explained non-max suppression, we're going to suppose that two boxes belong to the same line if their IoU is greater than 0.1, so that is our IoU threshold, and we leave the score threshold at negative infinity. Let's run it and print out our horiz_out. You see that we have a length of 37.
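The steps up to this point (extending every detected box across the image, then keeping one box per line with non-max suppression) can be sketched as follows. This is a minimal sketch under stated assumptions, not the code from the video: the session uses TensorFlow's non-max suppression, whereas here a small pure-NumPy stand-in is written out so the IoU logic is visible. The function names, the sample boxes, the scores and the 200 by 100 image size are all made up for illustration.

```python
import numpy as np

def extend_boxes(boxes, image_width, image_height):
    """Turn 4-corner text boxes into full-width and full-height boxes.

    Each input box is [[x, y] top-left, top-right, bottom-right, bottom-left].
    Outputs use the flat [x_min, y_min, x_max, y_max] format.
    """
    horiz_boxes, vert_boxes = [], []
    for box in boxes:
        x_min, y_min = int(box[0][0]), int(box[0][1])
        x_max, y_max = int(box[2][0]), int(box[2][1])
        # Horizontal: x starts at 0 and spans the image width, y is unchanged.
        horiz_boxes.append([0, y_min, image_width, y_max])
        # Vertical: y starts at 0 and spans the image height, x is unchanged.
        vert_boxes.append([x_min, 0, x_max, image_height])
    return horiz_boxes, vert_boxes

def iou(a, b):
    """Intersection over union of two [x_min, y_min, x_max, y_max] boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.1):
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping it too much."""
    order = [int(i) for i in np.argsort(scores)[::-1]]  # highest score first
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return sorted(keep)  # sorted indices, as with np.sort in the session

# Hypothetical detections: two boxes on the first row, one on the second.
boxes = [
    [[10, 10], [60, 10], [60, 30], [10, 30]],
    [[100, 12], [160, 12], [160, 32], [100, 32]],
    [[10, 50], [60, 50], [60, 70], [10, 70]],
]
scores = [0.9, 0.8, 0.95]

horiz_boxes, vert_boxes = extend_boxes(boxes, image_width=200, image_height=100)
horiz_lines = non_max_suppression(horiz_boxes, scores)
vert_lines = non_max_suppression(vert_boxes, scores)
print(horiz_lines, vert_lines)
```

With these sample boxes, the two detections on the first row collapse into one horizontal line, and the two detections sharing a column collapse into one vertical line, which is the behaviour described above.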
Now, when we open the image, this length of 37 should coincide with the number of lines we have here. Counting down the table, one, two, three and so on, sixteen at calcium, then 19, 22, 25, 28, 31, 34, 37. So we have 37 lines, and each entry here represents one line. What we can do now is sort this: we have horiz_lines, which is np.sort of horiz_out converted to an array. We print out horiz_lines, and now it has been sorted: box 0, box 3, box 5, box 10, box 13, up to box 108. So boxes[0] is our first box, boxes[3] is our next box, and so on and so forth. We can also visualize this, like we visualized horiz_vert. For each val in horiz_lines we draw a rectangle on im_nms, a copy of our original image, and then we write it out with cv2.imwrite to im_nms.jpg. In place of the box we use val, which indexes the boxes with the highest probability score in each line. That looks fine, so we run this, and we get 'invalid index to scalar variable'. Let's check out the issue
here. If we print val, you see that the values are actually just the indices, so we are supposed to index into the boxes: we pick out a given box, like boxes[0], and then get its coordinates. So instead of val we need boxes[val] everywhere. With that changed, we visualize our non-max-suppressed image, and what we get, if you notice, is just one box for each line: this one is for this line, this one for that line, and so on. But instead of these boxes we should take horiz_boxes. As we were saying, boxes[val] gives the original boxes with the highest probabilities, whereas horiz_boxes[val] gives the boxes which occupy the full width of the image. Let's run this again and see what we get. We're getting a 'not subscriptable' error. Let's print horiz_boxes[val] to check: we do get the horizontal box, but remember that horiz_boxes has a different format from boxes. Let's put the two side by side, horiz_boxes and boxes (boxes, not image boxes). You see that the structure is
going to be different: here in horiz_boxes we have x min, y min, x max, y max in one flat list, that is the top left and bottom right, whereas in boxes the top-left and bottom-right corners are separate points. Because of this difference in formatting we have to be careful when using it. So here we have horiz_boxes[val] and we simply pick out index 0, then 1, then 2, and then 3. Let's run this again, and hopefully now we should have no problems. We get a syntax error, so let's check; this one looks fine now, so we take this off, run again and visualize. Checking the non-max-suppressed image, we now have only a single box per line. So we've done the non-max suppression and we've gotten each and every line: we have 37 different lines. The same process can be repeated for the vertical lines: the same non-max suppression to get the vertical lines, and then sorting, because the sorting is very important. So we have vert_out, where we specify the vertical boxes and the probabilities. We run that and print out vert_out, and we get just three entries, from box 52, box 45 and box 1. That is for obvious reasons, because we have just three columns here: one, two, three. You could repeat the same process to visualize it, and although we were going to skip that, let's do it, because it will help you understand how this works globally. So, taking it from here, we're just going to go
through the same steps as we had for the horizontals, so we can paste that code here. Then we do vert_lines equals np.sort of vert_out, just as we did with the horizontals, and print out vert_lines. We've replaced all the horizontals in the pasted code by verticals, and we also fixed a slight error where we had a 2 instead of a 3. We run all this again, and now we can visualize our image with the horizontal and the vertical boxes: we have the horizontals and the three verticals. Now, we want to take each and every element we have here and put it in some sort of array which can then be saved in CSV format and read in an Excel sheet. So we create this array, first supposing that its elements are simply empty strings. To create it we go through the vertical lines and then the horizontal lines. Let's get the shape; it's a list, so we convert it to an array first. We get 37 by 3, that is 37 lines by three columns. What we'll do now is go line by line, and for each horizontal line, like this one, we look at each and every vertical line. Obviously we have four vertical lines in the drawing: this one, this other one, let's add this one here, there we
go, we have these four lines. Now let's pick this horizontal line. As we've said, for each and every horizontal line we go through each and every vertical line, and for a given vertical line we get the intersection of the horizontal with the vertical. Let's draw it so you can see: this region is the intersection. Remember, we go through every line, so for this line we get this intersection, this one, and this one too, although here we'll have no text. Once we have the intersection, the next thing we do is compare it with each and every box; remember we have the positions of all the boxes. We want to see which box coincides with this intersection. It's clear that the original bounding box of the text here is going to have a high IoU (intersection over union) with the intersection region, compared to another box picked at random: its IoU will be greater than the threshold, which you wouldn't get with another box. So we simply take the box which has the max IoU with this region, the box which coincides the most with it. Once we get that, we have extracted the text, because for every box we know its text. And since we know its text, in our output array we're going to replace the empty string with
that text, let's say for example 'component'. So, repeating this again, we go through each and every horizontal line, we compare it with the vertical lines, and we get the intersection, like here. Only this text box is going to have the highest IoU with this intersection, and so for this position we have the text 'component' extracted. Now, we have our output array, and when we print it out it is all empty strings, with 37 lines and each line having three columns. So we go through each and every horizontal line, and for every horizontal line we go through each and every vertical line, and from there we obtain the resultant, which is the intersection of our horizontal and our vertical line. For the horizontal we index horiz_boxes. Recall that horiz_boxes holds all our different horizontal boxes, whereas horiz_lines corresponds to the boxes which have the maximum probabilities, which we got from the non-max suppression. Every detection produces a box, which is why we have many elements in horiz_boxes, but only those with the highest probabilities are retained, and those form the horiz_lines. We want the intersection of each such line with each and every vertical line. So we have horiz_boxes indexed by horiz_lines at the given index. That's why it's important to sort: sorting lets us start from the very first line and go right to the last. Okay, so we have this
already: we have the horizontal line. The next thing to do is to get the vertical line. Here we have vert_boxes indexed by vert_lines, and we pick out j, so we compare the horizontal line with each and every one of these vertical lines. That said, let's define our intersection method. It takes in a box, box_1, and another box, box_2, and it simply returns their intersection. To see what this intersection is, let's get back to the drawing. We have this horizontal line and this vertical line, and to get their intersection we need this top-left point and also this bottom-right point. The x-coordinate of the top-left point corresponds to the x-coordinate of the vertical line, and its y-coordinate corresponds to the y-coordinate of the horizontal line. For the bottom-right point, the x-coordinate again comes from the vertical line and the y-coordinate from the horizontal line, but taken at their far edges, so you have to be careful with that. In the code, box_1 is the horizontal and box_2 is the vertical. So the first entry is box_2's x-coordinate and the second is box_1's y-coordinate: for x min and y min we have the vertical's x and the horizontal's y. You can confirm that on the drawing: for x we go from zero to the vertical's x, and for y we go from zero downward until we meet the horizontal's y. So that's why
we have that there. The next entry will be box_2, the vertical's, index 2, which corresponds to its x max. So to get the x-coordinate of the bottom-right point we go from zero to the x max of the vertical. To get the y-coordinate we go from zero downward up to this point, and that corresponds to the y max of the horizontal line. So back in the code we take box_1, the horizontal, and get its y max, which is box_1[3]. With that set, we have our intersection; we get the resultants, and let's print them out. The first one is 14, 434, 51, 457. Let's check that quickly on the image: x of 14 and y of 434 doesn't match the first cell; 434 is far down, around here. So there's a problem: what we get lies inside this other box instead. Looking at the next resultant, we have 295, 13, 334, 31, and that doesn't correspond to the first cell either; it actually corresponds to this one here, at 295, 13.
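The intersection method and the cell-filling idea described above can be sketched like this. It is a minimal sketch under stated assumptions: the intersection function follows the description (x values from the vertical box, y values from the horizontal box), while the fill_table helper, the 0.1 threshold used for matching, and the sample boxes and texts are hypothetical and only serve to make the example runnable.

```python
def intersection(box_1, box_2):
    """Cell where a horizontal line (box_1) crosses a vertical line (box_2).

    Both boxes are [x_min, y_min, x_max, y_max]; x comes from the vertical
    box and y from the horizontal box.
    """
    return [box_2[0], box_1[1], box_2[2], box_1[3]]

def iou(a, b):
    """Intersection over union of two [x_min, y_min, x_max, y_max] boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def fill_table(row_boxes, col_boxes, text_boxes, texts, iou_threshold=0.1):
    """Start from empty strings, then write each cell's best-matching text."""
    table = [["" for _ in col_boxes] for _ in row_boxes]
    for i, row in enumerate(row_boxes):
        for j, col in enumerate(col_boxes):
            cell = intersection(row, col)
            overlaps = [iou(cell, tb) for tb in text_boxes]
            best = max(range(len(overlaps)), key=overlaps.__getitem__)
            if overlaps[best] > iou_threshold:  # otherwise the cell stays empty
                table[i][j] = texts[best]
    return table

# Hypothetical data: two rows, two columns (already ordered), three text boxes.
row_boxes = [[0, 10, 200, 30], [0, 50, 200, 70]]
col_boxes = [[10, 0, 60, 100], [100, 0, 160, 100]]
text_boxes = [[10, 10, 60, 30], [100, 12, 160, 32], [10, 50, 60, 70]]
texts = ["component", "value", "water"]

table = fill_table(row_boxes, col_boxes, text_boxes, texts)
print(table)
```

The bottom-right cell has no text box overlapping it, so it keeps its empty string, which matches the case mentioned earlier where an intersection contains no text.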
Then look at this: here we have approximately 342, 28. So this first resultant actually corresponds to this second box instead, whereas for each line we want to go through each and every vertical box in order. Now let's look at the next one: 446, 13 to 484, 31, around here. Let's check this: 484, 31 doesn't quite get there. Okay, it's around this, because the vertical box was like this, not up to this end, so it should be around here. Here we read, let's see, 492, 40, so you see it corresponds with this one. So we have this intersection, but our problem now is that it is unordered. For the horizontals it is ordered, because the way these text detection and recognition algorithms work is they go from top to bottom, which means the horizontal lines are already sorted: if you get the index for this box, it's already sorted and you don't need to rearrange anything. But for the verticals, when you have, let's say, three vertical boxes and you create those lines with the non-max suppression algorithm, it may happen that the first vertical box you get is this one in the middle. So we have to make sure that the first vertical box becomes this one and not this one, so that when we create our table the output makes sense. Let's get back here and reorder the vertical boxes. To understand again what we've just said, these are the vertical lines: after sorting we have 1, 45, 52. It looks normal, but when we take this image and put it on the side here, you see that this box, 295, 10, should be around this: 295, 10 is about here,
it's around here, so you see this is the box. After sorting, box 1 corresponds to this box right here. Now box 45 corresponds to 446, 390. We need to go downward to about 446, 390, so let's go down: it's about here, this box. So this is the box which provided one vertical line, and this other box provided another vertical line, meaning that each of these boxes had the highest probability for its line. Now it happens that this one came before this one, and then this one comes last, whereas what we want is something ordered. Here you see 14, 462: go to 14, then downward to 462, and it's about here. So this is the box which provided the vertical line here. What we have now has been sorted in the vertical direction, because this is the first, this is the second and this is the third. What we need to do instead is sort in the horizontal direction, such that this will be the first, this the second and this the third. For the horizontal lines we don't face any problem, because they are already structured so that this one always comes before this one, and so on. But for the vertical lines we need to reorder so that we consider this direction instead of this direction. More concretely, when we go through each and every vertical line with j, we go zero, one, then two: this box, then this next box, then this other box. We now have to sort so that we instead go through this box, then this one, then this one. So what we're going to do is convert zero, one, two into two, zero, one. That's what we want to have.
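The reordering described here amounts to sorting the selected vertical boxes by their x-coordinate and keeping the resulting index order. Below is a minimal sketch using only the standard library; a NumPy argsort over the same x values would return the same indices. The names vert_boxes, vert_lines, unordered_boxes and ordered_boxes are assumptions for illustration, and only the x and y minimums (295, 10), (446, 390) and (14, 462) come from the values read out in the session; the remaining coordinates are made up.

```python
# Hypothetical vertical boxes keyed by detection index,
# each stored as [x_min, y_min, x_max, y_max].
vert_boxes = {
    1: [295, 10, 334, 400],
    45: [446, 390, 484, 800],
    52: [14, 462, 51, 900],
}
vert_lines = [1, 45, 52]  # indices kept after non-max suppression

# Keep only the x_min of each selected vertical box.
unordered_boxes = [vert_boxes[i][0] for i in vert_lines]

# Argsort: positions of the boxes ordered from left to right.
ordered_boxes = sorted(range(len(unordered_boxes)),
                       key=lambda k: unordered_boxes[k])
print(ordered_boxes)  # [2, 0, 1]

# Looking up column j through ordered_boxes walks the columns left to right.
left_to_right = [vert_lines[ordered_boxes[j]] for j in range(len(vert_lines))]
print(left_to_right)  # [52, 1, 45]
```

Indexing through ordered_boxes instead of using j directly is what maps 0, 1, 2 to 2, 0, 1, so the leftmost column is processed first.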
That said, let's sort these boxes based on their position in the horizontal direction. To do that we define unordered boxes, and then for i in vertical lines we append values to it: from vertical boxes we pick the box at index i and take its x-coordinate, and you're going to see shortly why we need this x-coordinate. Let's run that and add this code. Right here, let's print out unordered boxes. Actually, let's print higher up: we print the vertical box at index i for each i in vertical lines. Let's run that. See, we have our vertical boxes here. What we want is for this index two to go to the zeroth position, as we've seen; then this one goes to the first position and this one goes to the second. To do that we make use of the unordered boxes, which, if you notice, take only the x-coordinates. If we get back to the image, you'll see that in order to sort these such that this comes first, this second and this third, all you need to do is get the x-coordinate of each box and compare them. That's what we're doing here: comparing based on the x-coordinates. Now we create the ordered boxes, which we get as the argsort of the unordered boxes, so they are sorted with respect to the x-coordinates. Let's have that, print out our ordered boxes, run that, and
you see we now have two, zero, one, so we've gotten the order we needed, and you can always test this with different images. So we went from 0, 1, 2 to 2, 0, 1. Let's take this print off, and now we're ready to modify the loop: every time, instead of j, we use ordered boxes of j. When j is zero we get two, when j is one we get zero, and when j is two we get one, so this lets us map from the unordered boxes to the ordered boxes. Okay, we run this again and we should get something better. Scroll up, there we go: we have 14, 13, 51, 31. Let's use a different tool and zoom in so it's clear. You see we have 4, 19, which is around 14, 13, and here is about 106, 30. Anyway, it depends on the box which represents this; it should be this one, and if so, this distance here reads 124, 34. Now for 51, 31: we start from 14, 13, that's fine, then we go this way and stop around 51, and then go slightly downward to about 31. Anyway, we see that the box is around this, so this is our intersection. Now that we have this intersection, the next thing to do is compare it with each and every detected box and see which box corresponds to it. The comparison, as we've said already, is made using the IoU score. The IoU takes in two boxes, box one and box two, and computes the intersection and the union: we have intersection divided by union. The intersection is computed from the box parameters. Let's look at it: it uses x2 minus x1, or rather the max between x2 minus x1 and zero, so that we don't get negative values. Now the way x2 and x1 are gotten
is by taking the max of the min x-coordinates of the two boxes for x1, and the min of the max x-coordinates of the two boxes for x2. Let's explain by getting back to the drawing. At this point right here, let's add some text: we have x min, y min, and at this other point we have x max, y max. Let's generalize and put the same labels on both boxes, so each has x min, y min and x max, y max. To obtain the intersection we need this point and this other point: not this one, not this one, but this one right here and this one. To get the first one, we need the maximum of the two minimums: we have these two minimum x's, and the maximum of them is this one. To obtain the x of the second point, we need the minimum of the two maximums, which is this one here, and it coincides with this point. Then for the y, we take the maximum of the two minimums, and you fall on this point. These are the two minimums, the two top-left points; taking the maximum y, we get this point. For the other, you take the minimum of the two maximum y's, which falls here: these are the two maximum y's, measured as distances from the top, and the minimum is this one, so we fall on this. That's how we get these coordinates, and once we have them we have the intersection. To get the union, all we need to do is add the area of this box plus the area of this box, minus the intersection, so we don't count that space twice. Okay, so that looks clear. That's what we just did.
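The IoU computation walked through above can be sketched like this. It is a minimal sketch assuming boxes in [x_min, y_min, x_max, y_max] form; the function name and the sample boxes are illustrative assumptions rather than the session's exact code.

```python
def iou(box_1, box_2):
    """Intersection over union of two [x_min, y_min, x_max, y_max] boxes."""
    # Top-left of the overlap: maximum of the two minimums.
    x_1 = max(box_1[0], box_2[0])
    y_1 = max(box_1[1], box_2[1])
    # Bottom-right of the overlap: minimum of the two maximums.
    x_2 = min(box_1[2], box_2[2])
    y_2 = min(box_1[3], box_2[3])

    # Clamp at zero so boxes that do not overlap give no negative sizes.
    inter = max(x_2 - x_1, 0) * max(y_2 - y_1, 0)
    if inter == 0:
        return 0.0

    area_1 = (box_1[2] - box_1[0]) * (box_1[3] - box_1[1])
    area_2 = (box_2[2] - box_2[0]) * (box_2[3] - box_2[1])

    # Union: both areas, minus the overlap that would be counted twice.
    return inter / float(area_1 + area_2 - inter)


# Hypothetical boxes: a 10x10 square and the same square shifted by 5.
# Overlap is 5x5 = 25, union is 100 + 100 - 25 = 175.
print(iou([0, 0, 10, 10], [5, 5, 15, 15]))
```

Identical boxes give 1.0 and boxes that do not overlap give 0.0, which is why a small threshold is enough to decide whether a detected text box sits inside a given cell.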
Here, see: for the mins, x1 and y1, we take the maximum of the two boxes' values, and for x2, y2, the maxes, we take the minimum of the two. That's how we get the intersection. To get an area, all we need to do is take x max minus x min, times y max minus y min, because that gives the width times the height. Let's get back to the code; that's what we've just done. Now, if this intersection is zero, it returns zero, simple. Here we take x max minus x min times y max minus y min for box one, we do the same for box two, and then we compute the ratio. So let's run this IoU. We're yet to run the loop, because we still have some work to do right here. What we're saying is: now that we have the IoU and the resultant, we're able to compare the resultant with each and every box. So, for b in range of the length of all the boxes (we say b because we used i already), we get the specific box. Remember, the box notation was a bit different from the usual notation we have here, so we have to build it: index 0, 0 for x min, 0, 1 for y min, 2, 0 for x max and 2, 1 for y max. That's how we get the box. Once we have it, the next thing to do is compare using the IoU: if the IoU between the resultant and the box is greater than 0.1, we now know that this is the box in question, and so we set the output array at i, j, corresponding to the horizontal line and the vertical line, to the text at b. Remember, we already have the texts; let's print them here and run that. There we go, and each text is for each and
every box. So we have this text here, and if the box matches, we make sure the output gets it at i, j. Let's open this up; there we go. If this intersection coincides with this box, then we take the text for this box, that's it. That's why it was very important for us to sort correctly, because without that sorting we would have this column coming before this one, which is not what we expect. So this looks fine; let's run it, and then look at our output array. There we go, that's what we get: component, unit, borrowed article data, energy, all these values which have been extracted, and in the right order. The last step now is to convert all this information into a CSV. We've imported pandas as pd, so that's what we use to put this information into a CSV. Let's open this up; Colab can also visualize a CSV, so this should be no problem. See, that's it. Let's compare: this view looks too big, so we shall use this one instead, reduce it, and compare. What we're saying is that we went from this to this, and we can now compare them side by side for some qualitative analysis. Component, unit, borrowed, data in percentage, then energy, water and all of that, so it gets practically all the information. Here there's an empty space, and that's also an empty space in the output, and so on. So you see how we've been able to extract this information from this table. What you can do now is create a full pipeline which just takes a PDF and outputs all the different tables in separate CSV files. So anyone who does this, that is, anyone who will
write a method which takes just the PDF and outputs all the different CSVs containing all the different tables, gets a scholarship for one of our courses at Neuralearn. If you get that working, send us a message by mail showing us your Colab. We shall test it, and if it works, we shall give you a scholarship for any one of our paid courses.
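The cell-matching and CSV export steps from this part of the session can be sketched end to end as below. This is a sketch under stated assumptions: detected boxes are given as four corner points (so indices [0][0], [0][1], [2][0], [2][1] give x min, y min, x max, y max, as described above), the row and column bands are already in reading order, and all box values and texts are made up for illustration. The session uses pandas to write the CSV; this sketch swaps in the standard-library csv module so that it runs without extra dependencies.

```python
import csv
import io


def intersection(box_1, box_2):
    # x from the vertical box (box_2), y from the horizontal box (box_1).
    return [box_2[0], box_1[1], box_2[2], box_1[3]]


def iou(box_1, box_2):
    x_1, y_1 = max(box_1[0], box_2[0]), max(box_1[1], box_2[1])
    x_2, y_2 = min(box_1[2], box_2[2]), min(box_1[3], box_2[3])
    inter = max(x_2 - x_1, 0) * max(y_2 - y_1, 0)
    if inter == 0:
        return 0.0
    area_1 = (box_1[2] - box_1[0]) * (box_1[3] - box_1[1])
    area_2 = (box_2[2] - box_2[0]) * (box_2[3] - box_2[1])
    return inter / float(area_1 + area_2 - inter)


# Hypothetical row bands (top to bottom) and column bands (left to right).
horiz_bands = [[0, 0, 200, 20], [0, 20, 200, 40]]
vert_bands = [[0, 0, 100, 40], [100, 0, 200, 40]]

# Hypothetical detections: four corner points per box, plus recognized text.
boxes = [
    [[5, 2], [60, 2], [60, 18], [5, 18]],
    [[105, 2], [150, 2], [150, 18], [105, 18]],
    [[5, 22], [50, 22], [50, 38], [5, 38]],
    [[105, 22], [140, 22], [140, 38], [105, 38]],
]
texts = ["Component", "Unit", "Energy", "kcal"]

# One empty string per cell; cells with no matching text stay empty.
out_array = [["" for _ in vert_bands] for _ in horiz_bands]

for i, h_band in enumerate(horiz_bands):
    for j, v_band in enumerate(vert_bands):
        resultant = intersection(h_band, v_band)
        for b, corners in enumerate(boxes):
            the_box = [corners[0][0], corners[0][1],
                       corners[2][0], corners[2][1]]
            if iou(resultant, the_box) > 0.1:
                out_array[i][j] = texts[b]

# Write the table as CSV text (a file object would work the same way).
buffer = io.StringIO()
csv.writer(buffer).writerows(out_array)
csv_text = buffer.getvalue()
print(csv_text)
```

Wrapping this loop in a function that is called once per detected table and per PDF page would be one way to approach the full PDF-to-CSV pipeline suggested at the end of the session.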

Info

Channel: Neuralearn

Views: 40,856

Id: HZh31OGiQRQ

Length: 112min 18sec (6738 seconds)

Published: Sat Jul 09 2022