RPA - How to extract data from PDF file in Microsoft Power Automate

Captions
Hey guys, welcome back to the channel. In this video I'll be talking about how to use Microsoft Power Automate to extract information from a PDF file and copy it into an Excel file.

These are the steps we're going to do here: we're going to extract data from a PDF file, and then we're going to copy the extracted data to an Excel file. We're going to be doing this automatically. The tools we will use are Microsoft Power Automate, Microsoft Excel, OneDrive, and some sample PDF files.

I'm going to show you how to do this using an RPA tool. RPA is what's called robotic process automation: essentially a tool that allows you to do repetitive tasks automatically. For example, if you are extracting data from a PDF file and copying it to an Excel file or a Word file, and you are doing this for multiple documents, maybe 20, 30, or 50 documents, you can use an RPA tool to automatically extract the details from your PDF documents and paste them into an Excel file. You simply sit back and watch as it does all these tasks for you, then you look at the results and go on to the next task. The biggest advantage of RPA is that it lets you cut the time spent on workflow activities by an average of 40 to 50% and save hours of work. Okay, so let's go ahead and jump right in.

The first step is to look at how to load Power Automate from the Microsoft Office portal. When you log into your Office 365 you should be able to see this, and you have two ways to access Power Automate: you can click Apps here and search for Power Automate, or you can go here and click All apps, and you
should be able to search for Power Automate, which should be listed here somewhere: Power Pages, Power Automate, right here. The easiest way is simply to type in "power automate" and select it right here, and it's going to load in a separate tab. I do recommend using Microsoft Edge as the browser, as it integrates directly with Office 365 and specifically with your Microsoft Office applications.

This will load up Power Automate; it might take a few minutes. Once you're here, go to the left side and click on AI hub, and that will load up the Power Automate AI functionality. From there we're going to select from the different options available to configure for our document. Let's take a look at AI capabilities: click on that and we're going to see some options here. Click on see more AI models, then click on extract custom information from documents. This is the document workflow we're going to use, which allows Power Automate to extract information from an invoice.

Let me show you an example of what we're going to do. I'm going to show you the PDF file we're going to extract information from, so let me load it up here, just one second. Okay, so this is a sample PDF file; let me just minimize it. This is a sample invoice: it has an invoice number, the invoice date, the due date, some items with their prices, and the total amount. What I want to do is extract the invoice number and the invoice date from this document and copy them directly into an Excel file. To keep things very easy and straightforward, I want to extract, let me maximize it here,
the invoice number and the invoice date: two data points I'm going to extract from this document. Okay, so let's go ahead and begin.

We'll go to Microsoft Edge and I'll click on extract custom information. This gives me the details of the sample; if I scroll down, it's basically a sample template that shows what it's going to be doing and what information it can extract from scanned documents. I'll click create custom model, and this will lead me to the model I will use to extract the data from a PDF document into an Excel document. This will load up the Power Automate AI model for extracting data from PDF files and copying it into another file. Sometimes it takes a few minutes to load.

Here I'm going to choose structured documents, so go ahead and click on structured documents. This is where you can name your model; I'm just going to name it Invoice Processing. That is now going to be the name of my model. A structured document means it contains structured data, which means it's self-intuitive: the data in the document is clear and concise and can be easily understood. Then I go down here and click the blue button where it says Next.

Then I'm going to add the information I want to extract from my document. I click Add and choose a number field, because I want to extract a number from the PDF document. I click Next and enter the name of the number field, so I'm going to put invoice number. For the format I'm just going to use the comma option, because it's going to be a whole
number, and I'll click Done. Then I'm going to add another data field, which is going to be a date field, since we said we're going to extract the invoice number and the invoice date. I click on date field, click Next, and put invoice date. I want to make sure the format I choose matches the invoice date format. In this case the invoice date is December 6 2021, so it's month, day, and year. I want to match that, so I choose month, day, and year and click Done. Now I've selected the two data points I want to extract, and I click Next.

Now I want to add a collection. Let me explain this point: a collection is a set of sample documents, in this case PDF files, which I upload so that the Power Automate AI engine can learn from them. So let me go ahead and upload them here. This is the collection; I'll close this one, go down here, and click Add documents. Now I set the data source. You may have SharePoint or maybe Blob storage, but I have the files on my device, so I'll go into my documents and upload from there. I need to upload a minimum of five documents for it to be able to learn; you have to have a minimum of five. So I select the five documents here, it scans them, and then I click Upload. Once I click Upload, it automatically stores them in the Power Automate AI engine.

The next step is to start selecting the data points for each document. I now have one collection with five documents. I click Next, and now what I'm going to do
for every single document is choose the data points I selected in the beginning. I've chosen two data points, the invoice number and the invoice date, so I have to highlight and mark each data point, which allows Power Automate to learn from it.

So I'm going to choose the invoice number, which is right here; let me increase the visibility so you guys can see it. This is the invoice number, so I select it and mark it as the invoice number. Then I select the invoice date here and choose invoice date. When you get this blue mark on the document, it means this document is complete. If I had chosen more than two data points, I would have to make sure I select each one. For example, if I had included the balance due as a data point, I would have to highlight that as well and then choose balance due, or maybe amount due. That completes one document.

Now, a very important point: I have to do this for every single document. I've got five documents, so I go ahead to the next one. It loads the next document and I have to do the exact same thing again. Let me increase the resolution. I select the invoice number and choose invoice number, then I select the invoice date and choose invoice date, and so on. Now that one's complete. I'm just going to fast forward through the other three documents: I'll pause right here and unpause when I've finished doing it for those three.

Okay, so now I've done it for all five documents. Make sure that you see the blue check mark on
every single document; this confirms that you have completed highlighting the data points for all the documents. You should be able to see this blue button where it says Next. Click on that, and it will take you to the next step.

This step confirms what you have added: document processing, the name of your model, structured and semi-structured documents, and the document source, my device, with a total of five documents that I uploaded for the AI engine to learn from. I have one collection, which consists of five different documents. The data points, which is the information I want the RPA engine to extract from the PDF files, are the invoice number and the invoice date. I provided an example from every document I uploaded, so with five documents in my collection I have five invoice numbers and five invoice dates highlighted.

Now I go to the bottom and click Train, and the model starts training and learning from the document collection I uploaded. What the AI engine is doing right now is learning by assessing the data points for the invoice date and the invoice number. It analyzes them in every single document so that, for any future invoice document I upload, it will automatically recognize the invoice date and the invoice number and be able to extract them and place them into an Excel file.

Now I'll go to Models. I've got this one, Invoice Processing, and it's still training. It's going to take a few minutes to complete, so I'm going to pause right here and come back when it's finished.

Okay guys, so now the model has finished training, and I want to test it. Let me go ahead and open it. Now I've opened the model, and I want to be able
to test it, to make sure it can recognize the data I trained it on. So I'll do a quick test. You open the model, it shows the name of the model, Invoice Processing, and you click on quick test. Then I want to upload a sample invoice that I have created and make sure it can recognize the data, which is the invoice date and the invoice number. I have a sample document here, invoice number six, and I've uploaded it. It has a similar structure to the other invoices.

Now it's analyzing, and it was able to detect the fields. Let me increase the visibility here; you can see what it selected. If I point at the invoice number, six, it shows that it was able to scan the number. The confidence score is how confident it is that the value matches the model it was trained on, and it's 99%, which is very confident. That's the invoice date, same thing: the confidence score is 99%, which means it's very accurate. Keep in mind that the more data you upload in the collection for it to learn from, the more accurate it will be in its future selections from the documents you submit.

Let me do another quick test. I'm going to submit another document I have here in my documents, invoice number seven. This is another invoice, similar, but what I did was change the location of the invoice number and the invoice date. Actually, what I can do here is open another document: I go to Documents and open the invoice document, just one second. Okay, so this is another invoice document. This is the date here, and I want to maybe change it here, and I'll
go ahead and change the formatting here: I'll put some gaps between the invoice number and the actual invoice, which makes it different from the previous sample documents I uploaded in the collection it learned from. I'm going to save this as invoice 8, and I'm going to save it as a PDF file, because that's what I wanted to test. This is the PDF file, invoice number eight; I'll save it in Downloads, and bam, now it's saved.

Now I go back to my model and upload invoice number eight, the PDF file. It's going to analyze it to make sure it can detect the invoice number and the invoice date. And it actually did. Let me zoom in to see what it did: here's the invoice number, see the confidence score, number six, and here's the invoice date, even though there were gaps in the document.

So, for example, if you have 10 documents and in some of the invoices, or any other PDF documents, the formatting is messed up and not consistent, then as long as you upload a sample of that document and highlight it when training the model, it will be able to recognize it in the future. The more sample documents you upload, the more accurate it becomes at predicting what the data points will be.

So now I have closed that, and I have highlighted the two data points; now I can click Publish. Once you hit Publish, it will publish the model into your table so that it is active and you can use it for workflow extraction.

This video talked about how to highlight the data points. In the next video I'll talk about how to use the model to extract the information and copy it into an Excel file. Okay, I'll see you guys in the next video. Thank you for watching, and if you like this video
please hit the Subscribe and Like buttons. If you have any questions or comments, please leave them down in the comments and I'll be happy to answer them. Okay guys, thank you very much. I'll see you guys in the next video.
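
As a rough illustration of the quick-test results described in the captions, the Python sketch below shows one way extracted values and their confidence scores might be checked before being copied into a spreadsheet. It does not call Power Automate or its AI engine. The ExtractedField class, the MIN_CONFIDENCE threshold, the helper names, and the sample values are all hypothetical and chosen only to mirror the invoice number, the month-day-year invoice date, and the 99% confidence score mentioned in the video.

```python
from dataclasses import dataclass
from datetime import date, datetime

# Hypothetical cut-off for illustration; the video does not mention a threshold.
MIN_CONFIDENCE = 0.80


@dataclass
class ExtractedField:
    """One value returned by an extraction model (assumed shape, not a real API)."""
    name: str
    value: str
    confidence: float  # 0.0 to 1.0, so the 99% seen in the quick test is 0.99


def parse_invoice_date(text: str) -> date:
    """Parse a month-day-year date such as 'December 6 2021' or 'December 6, 2021'.

    Note: %B matches full month names in the current locale (English assumed).
    """
    cleaned = " ".join(text.replace(",", " ").split())
    return datetime.strptime(cleaned, "%B %d %Y").date()


def review_fields(fields):
    """Split fields into accepted values and names that need a manual check."""
    accepted = {}
    needs_review = []
    for field in fields:
        if field.confidence >= MIN_CONFIDENCE:
            accepted[field.name] = field.value
        else:
            needs_review.append(field.name)
    return accepted, needs_review


# Illustrative values only, loosely modeled on the quick test in the video.
sample = [
    ExtractedField("Invoice number", "6", 0.99),
    ExtractedField("Invoice date", "December 6 2021", 0.99),
]

accepted, needs_review = review_fields(sample)
invoice_date = parse_invoice_date(accepted["Invoice date"])
print(accepted["Invoice number"], invoice_date.isoformat(), needs_review)
```

Any field that falls below the assumed threshold ends up in needs_review instead of being written out, which matches the advice in the video to add more sample documents when the model is less certain.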
Info
Channel: TechLife360
Views: 1,864
Keywords: RPA, Power Automate, Microsoft Office, Automation, PDF, PDF File, Extract Data, extract data from PDF file
Id: ixJ9onngdiI
Length: 20min 13sec (1213 seconds)
Published: Fri Jan 05 2024