How to extract text from a PDF using Power Automate

Captions
Hello everyone, and a very warm welcome. In today's tutorial we'll see how you can extract text from image-based documents using Power Automate. Before we get into the actual demo, let's look at our use case. We have a few documents that we need to extract text from. These documents are image-based, which means that when I try to search for a word, for example "Harvard", it isn't found. I want to extract the inventor of this particular patent request and update it into a column in SharePoint.

This is how our SharePoint document library will look. As I told you, I want to update the inventor text, so I'll go ahead and create a column and name it Inventor. We'll keep it as multiple lines of text, just in case there's a long address or a long name in there. Next, the source and the destination are the same, and that can result in an infinite loop, so I need to manage the infinite loop. To do that, I'll create a Yes/No column, name it OCR, and set it to No. So this is how our document library structure will look: it has the two columns we created, Inventor and OCR. OCR is a Yes/No column, and its default value is set to No.

That being said, let's go into Power Automate and start building our flow. I'll click on Automated flow and click Skip. First things first, we want to select a trigger. Our source is SharePoint, so I'll type in "file" and select "When a file is created or modified (properties only)". I'll select the site and the document library. Now that I have the site and the document library in the trigger, before adding a new action I will configure the trigger condition. A trigger condition will look like this. OK, so what does it do? Let me actually type this in correctly. We have a column called OCR. When the flow triggers, it checks that column, and if it finds the OCR column set to false, only then it 
will go ahead to the next step. Now that we have the trigger condition, let's get the file content. I'll say SharePoint again and select Get file content. Here I can pass it the identifier, which is the output of the trigger.

Now this is where the real magic happens, because this is where we are going to extract the text. I'll select the Muhimbi action and type in "Extract text using OCR". What do I need to pass it? I need to pass it the file name, which comes out of the trigger: File name with extension. The file content will be the File Content output of Get file content. Now, my friends, you see that I need to pass in the X and Y coordinates and also the width and the height.

So let's try to get this data from our PDF document. To do that, we can use the PSPDFKit measurement tool. I'll just quickly use the demo, type in "measurements", and here I can go and upload my document. Perfect. Let's start with the X coordinate. The distance is approximately 1.73, so if I open my calculator, 1.73 times 72 is approximately 125. Next comes the Y coordinate. Again I'll use the measurement tool and draw a line; it's approximately 1.80, so again in the calculator, 1.80 times 72 comes to approximately 130. Now let me find the width and the height. Let's start with the width: it's approximately 2 inches, and I don't need a calculator for that, 2 times 72 is 144. Then I'll measure the height. Oops, that's the wrong one; I'll delete this annotation and draw the line again. OK, this looks good. It's approximately 0.7, and 0.7 times 72 comes to approximately 50.

So we have the measurements; now we need to add them in Power Automate. X was 125, Y was 130, the width was 144 (we did not use a calculator because it was 2), and the height was 50.
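The calculator steps above can be sketched in a few lines of Python. The multiplication by 72 suggests the OCR region is expected in PDF points (1 inch = 72 points) while the measurement tool reports inches; that reading is an inference from the transcript, and the names `inches_to_points`, `measurements_in` and `region_pt` are made up for this illustration.

```python
POINTS_PER_INCH = 72  # PDF user space: 1 inch = 72 points


def inches_to_points(inches: float) -> int:
    """Convert a length in inches to whole PDF points."""
    return round(inches * POINTS_PER_INCH)


# Approximate values read off the measurement tool in the video
# (assumed here to be in inches).
measurements_in = {"x": 1.73, "y": 1.80, "width": 2.0, "height": 0.7}

# Convert every measurement to points for the OCR region fields.
region_pt = {name: inches_to_points(value) for name, value in measurements_in.items()}

print(region_pt)  # {'x': 125, 'y': 130, 'width': 144, 'height': 50}
```

The rounded results match the numbers entered in the flow: 125, 130, 144 and 50.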
And I want to extract the text only from page number one, right? It has six pages, but I just want to use page number one. That being said, next I need to update the properties, so I'll add Update file properties. I'll pass it the ID, which is the output of the trigger, and put the title in Title. Inventor will be the Out text, here I'll set OCR to Yes, and I can set the description to Description. Perfect.

So our flow is ready, and it's time to test it. Here it tells me that it can trigger an infinite loop, but we have already sorted that out by adding a trigger condition. I'm going to upload a file, sample three, and let's see how it works. The file has been uploaded and my flow is running. It's trying to extract the text, and the text has been extracted. If you look here, it seems that it has extracted the correct text: it starts with the name Harvard and ends with 1023. And if I go back into SharePoint, you see, my friends, it has been updated. That being said, your flow has completed successfully, and we have also managed the trigger condition. Thank you. [Music]
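To see why the OCR flag stops the loop, here is a minimal Python simulation of the logic described in the video. It is not Power Automate code, and the exact trigger condition expression is not shown in the captions, so `trigger_condition`, `run_flow`, the dictionary keys and the sample values are all hypothetical stand-ins.

```python
def trigger_condition(item: dict) -> bool:
    """Let the flow run only while the OCR flag is still False (the column default)."""
    return item.get("OCR") is False


def run_flow(item: dict, extracted_text: str) -> bool:
    """Simulate one trigger of the flow; return True if the actions ran."""
    if not trigger_condition(item):
        return False  # flag already set: the flow does not start
    item["Inventor"] = extracted_text  # write the OCR result to the Inventor column
    item["OCR"] = True  # mark the file so the resulting update is ignored
    return True


# A newly uploaded file starts with OCR = False.
doc = {"Name": "sample.pdf", "Inventor": "", "OCR": False}

first = run_flow(doc, "placeholder inventor text")   # upload triggers the flow
second = run_flow(doc, "placeholder inventor text")  # the property update triggers it again

print(first, second)  # True False
```

The second trigger is caused by the flow's own update to the file properties. Because that update also sets OCR to Yes, the condition fails and the flow does not run again.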
Info
Channel: Muhimbi
Views: 2,446
Id: mLY3Ithyhwo
Length: 7min 21sec (441 seconds)
Published: Fri May 26 2023