How to Extract Data from PDF with Power Automate

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
our client today is a burger company they want our help to do some invoice processing that is they received emails with invoices as as attachments and they want our help to extract the data automatically out of those attachments and get them into excel it could be a database too but for simplicity reasons we choose excel let's see how it's done so we want some input data that is the data that's going in through emails here i save them i have my desktop in a folder and they look like this i highly recommend you to download these six invoices to your own disk and do the operations with me that is download the data they are in the description below and then open up power automate in that way you will learn the most so this one is invoice number one it's quite standard we have an invoice number up here we have a date we have some data we have some addresses we have some total taxes whatever and then we have a table here that is we only have one row in our table and then some headers but we could have 10 rows in our table and let me show you the second invoice so i'll have my phone here so the second invoice looks exactly like invoice number one x apart that we just have two rows here so we need five invoices to train our ai model and we have six here that is we will use the first five ones to train our model and then we will use the last one to test our model our flow whenever we are done let me stress that today we are working with structured documents that is that our documents they look the same they are in the same format we can see that they look like each other that is structured data we could also work with unstructured data then there will have a bit more a bit different process i'll show you this one in the coming week so be sure to subscribe and get notified when i make a new video in that way you will not miss the coming cool power ultimate ai unstructured video that was a long word but let's solve it for this today it doesn't have to be invoices today we'll use invoice that sample dataset it should just be structured data that's it so now we inspected our data let's create an ai model and let's use that model in a power automated flow so let me close down these two here and i think i open another folder so the first thing we do we go to ai builder and then we click build that's it so here i need to do some form processing that is structured data so i click here we could name our ai model something i will just call mine invoice processing like this then i click create it's very very easy to use the ai builder in microsoft power automate we need to specify what information we want to extract out of our document and that was the invoice number the total the items in the table and as well so we can click add first we have a field and that is these fields that could be invoice number total whatever then we have a single page and as you can see that could be our table a single page table that could be a table with our items in we just have to specify if our table date spans one page or multiple pages ours is only a single page so we will choose this one afterwards finally we have some check boxes as you can see here but let's specify all fields so i click fields i click next the first one that was the invoice number i click done then i add another one i want another field so click add next the next one that was the date and click done and i'll add our third and final field like this that was the total you're very welcome to take more data out of your invoice but for now this one is sufficient as the intuition will be the exact same thing now we just need our table with data in it and again it doesn't have to be an invoice it could be whatever document you have with a table in either pdf or a png or jpg so click single page table and then click next we can call our table something that doesn't really matter that much but i can just call them bigger items then i need to tell what name my columns you have so if i click the column one rename column here and i would give them the same names as they have on on my invoice so that is item then i had quantity quantity if you're in doubt simply just open the burger invoices here and see that what i do is i'm doing the right thing then i have the rate and finally i'll have the amount so now i added headers for my table then i can click done now we can click next and we need to specify our input data so that is called a collection so if i click new collection here and then i click the plus sign i click add documents we will wait a few seconds and the data source will pop up a data source that could be a sharepoint or it could be asia blob storage i haven't tried with that one so i cannot tell you how to do it here it will be the exact same things but as you have downloaded the invoices to your desktop most likely choose this one and this one is training data so almost always will happen locally but you could have it online here press the first invoice then hold down shift and take number five now we have chosen one two three four five don't choose number six then i click open so here i click upload the five documents in a few seconds my invoices that is the sample data set they are uploaded and we will use these five invoices to train our ai model and i click close now we have a collection you can rename this as well but it's not really that important keep in mind that we could add another collection for another type of invoices say we only have two invoices in then this ai builder could automatically sort which invoice that was you could also create 100 different collections but that would be quite a mess so and that would be this unstructured data we will use a model for that but for now we will only have one connection as our invoice always look the same then i click next so now we need to tag our documents and that is we need to specify where our data is so first one you can see here we have five documents up here we can scroll through them i'll take the first one first thing i do i need to take this little cross here so move it over here hold down your left mouse button and make a selection then we need to say what field is that well that is the invoice number so pick that you can see here it turns green now i have the date and i will intentionally make an error so let's say that i have chosen this date here that is the due date but i wanted to have this one up here so i can just click escape then we're done but say that i went one step further that if i took this one and i even took the date now you can see it turns green simply just clicked in a tree black dots here and remove tag let's choose the right date that will be this one here i choose the date then we want a total like this total now we want to extract a table this is an invoice table but it could be any table you wanted to extract so what i want to do here is that i mark my row again by left holding down the left mouse button and i mark it that is the burger items and then you can see here we can we have no more rows but if we had you can see here there's a ruler here simply just left click and that one will separate our rows but for now we will take the columns and to do so press ctrl n and you can see we can move around so ctrl here control there and control there now we have four columns we need to specify our headers so this one is item this one is quantity this one was a rate and then the amount if this goes too fast simply just pause the video or ride a bit it's designed for that that's it and you can even see here we have our data up here in a table so we can inspect that what's going on is the right thing then i click done that was number one then i take number two and we do the same thing here i'll do it a bit faster there's nothing has changed so you should just do the same as we did with the first one let me take the total i take the table here that's burger items control like this specify the column headers so here quantity the rate the amount click done on to the next document up here that is number three that's invoice number state that is total and again this feels a little bit intuitive as we are automating things and then we need to do these things manually but that's just how it is now we get a little challenge but let's first add the columns so there's one column here one column here and one column there now we just need to separate our rows so go in the middle here and left click that's it we have our rows and columns we just need to rename the column so i click here item quantity then we'll have the rate and the amount that's it then we click done let's take the fourth one and we take the invoice number here check the date we take the total and we take the items from here one of the debug items you can see it automatically detected it now we don't have to do it then i just click done now we have the last document so that one would be the invoice number the date the total again if you have questions you're more than welcome to put them below in the comments and the section is designed for that i will make sure i will get back to each one of you maybe not the fastest so here i just click next that's it now there's a resume of our what we have done then i can train my ai model my model is training go to models here it will take a few seconds to train it will show up here eventually in a few seconds and it will train will probably be taking around one minute we won't sit around and waiting so while this one is training we will go to power automate we actually in power to it but we'll go to create a flow that we can import this one into so go to create we will need to choose a flow and our burger company they get the invoices in by email so we will choose an automated cloudflow that is this one here we will add a name i can just call this invoice processing like this then i want to search a trigger and this is when a new email arrives so that is this one choose the one that corresponds to your outlook mine is office 365. yours are probably the same so choose this one then we click create so we have a trigger that is whenever we get an email and we want to make it a bit more strict it needs to have an attachment and say that we always have invoice in the subject we can specify that as well so we won't have to process all emails the first one take only the emails with attachments yeah that is uh quite important it's not important but it will make it will not make the flow run every time we get an email so now it's only with attachments we want to include the attachment in the flow as we will use them then we want some subjects so say that it always says invoice in the subject we simply just specify the filter here we can specify more strict filters but this one is sufficient for now so i can collapse the advanced options by the way i always recommend you to look what's going on here in the advanced options sometimes you will use it then we want to have a loop that loops each attachment that should be in this email imagine that we could have more attachment to the same email so i click new step then i want to find an apply to each here and i click it so i want to take each one of these attachments from up here so i click in here and i go to the dynamic content and scroll a little bit down well either search for attachments or find this one here there's other a lot of dynamic content that is named something with attachments make sure you pick the right one that is this one so this one is iterating through each attachments now we can start to process them but first we need to pick a place where we want to save our data to that is the data we get out of the invoices we are saving it to excel and to do so we will create an excel book in onedrive so i go to my microsoft office and open up onedrive here i can just create it in my main directory in the root of it but feel free to do it in the subfolder as well you can also create this in excel online but for now this is sufficient so i click new here and then i take excel workbook so that's it let me enlarge it a bit so you can see what's going on here i haven't just a plain excel sheet i can choose to remove it right now it's just named book that's fine for now so let's create a simple table with just four column headers and that's it the first one that should be the invoice numbers this one is just the data that we want to save then we have the date we have the total and then we have our items so that one was the one for the table first we will save it to a sip the same excel cell that will look a bit messy but it will be fine we will have to use some data science on it to get it more beauty and we will do so so now we have our table here we have nothing in it that's fine so mark the four ones here then click format as a table choose whatever format you want that doesn't matter at all so i just pick one random specify that your table has headers as it does these ones are exact headers we don't have any data in it but we will add it so now we have our table we can to to give it a nice name it it's not necessary but it's always best practice so click inside anywhere in it click table design and we have our name up here so here i can say invoice data this one is just a referral so when we use it in power automate we can easily identify that is the right table it's not very important for now again clicking it click table design you will see it up here but it is important when we work with say 10 different tables that they are not named table 1 table 2 and sometimes they even name the same if you use different books so for now this one is good so we can go back to power automate then we go to ai builder and models and we can see that hopefully our model has been trained let's just click it because we need to publish it we can also run a test on it so let's do that first we run a test and then we want some sort of an image and we have that because that was invoice number six so let's test that it just worked the way we want so i put in number six you can see here that we have five rows now in our table so it will analyze and then it will see if it can find data that we specified it looks like it it has this invoice number six confidence score 99 that's very safe we have the date we have the total and to inspect this table simply just left click and that looks about right as well so now we have tested it we can close it and we can publish our model so our model will get published now so we can use it in power automate and power apps so let's just stay here a bit and while we wait for this if this video helps you you can really help me a lot if you give the video a thumbs up you can either choose to use it here but since we are in the middle of a flow we will just find the flow again so i take the invoice processing i click edit open up the apply to each in a few seconds like this click add an action now we want to use our ai model so extract information from forms pick that one so we want an ai model and that one we call invoice processing so simply just click the drop down and choose it here the form type we need to specify what form type it is well here it is pdfs but as you can see you can use jpegs or pngs so we choose the pdf then we will specify a form surprises and that one will be the attachment from up here so i click in here then i go a little bit down now you can see that it's quite clever power automatic it automatically choose the autumn oh attachments content if you want you can see more but there's no reason to it we will take the attachments content again choose the advanced options that is only pages we will just choose the entire page as those ones are just one pages that's it so now we have extracted it we want to add them to excel so take a add row into a table in the add an action like this so add row into a table here like this then choose a location mine is on onedrive for business the document library that is onedrive now i just need to find the file so pick this one here and when because we were in the root of onedrive my book is here but if you have any subfolders just click your way into those subfolders mine is here that is the book and now we will select our table and that's why we gave it a name you can see here invoice data here this one opens that is the advanced options now we need to specify what should go inside each of the five four column headers the invoice numbers well pick the dynamic content and as you can see here now there's a form extractor that won't give us some dynamic information and you should look for the value here because those ones are the value so if i go a little bit down invoice number value this one was our defined string invoice number and the value that is the value what went in there so then i picked a date again choose the date value you can also search for the dynamic content up here but for now it's okay and pick the total that one was here and then we have our items so here we can see the individual column headers item rate amount and quantity but we want to extract everything because we will extract the whole table that one will be our entry in excel a bit later we will see how to extract each one in separate places but here we will choose the burger items entries that's it that was our flow really simple flow almost just drag and drop so i click save let's try to run it because it's a trigger-based flow i need to send an email to myself to see it will work so i go to outlook then i click new message let's send myself an email and i will be sure to in subject put in invoice now i just need to drag in invoice number six i can drag in whatever i want it will work but that's it then i click send in a few seconds it will show up in my inbox and hopefully that one was it and as you can see over here our flow will run in a few seconds without arrows usually there's probably a 15 20 second delay but that's fine this one will work in the cloud we will not see this flow run in in real life that's it now your flow is running it's processing the document and in a few seconds this one will have green check mark as well and that that's it we can inspect our data so if i go to one drive and then the book we have here we have our data here so if i just expand these ones here we will see that we have we have invoice number we have date we have total and then we have some a bit of a messy data but that's fine and another thing you can see that this one has automatically formatted it for my local currency which is chrono to fix this you need to just right click in your data set there's a few ways to fix it but we'll fix it in the easiest way so right click number format currency make sure that your currency is now in u.s dollar so if i say a dollar english united states that's it so and for the next entries it will look like this now well this one was not really pretty you can see that we have actually our data here but a lot of irrelevant data for our row that was not really nice so let's try to fix it if i go over to run history we need to inspect the log before we do anything and if i inspect my data let me scroll all the way down to our items mark everything and then copy it we go to this json mirror this is just a json is just a format that the data is in to transfer them efficiently so if i paste them in here it still looks a bit messy but if i click beautify it looks a bit more nice you can see that we actually have some data that we want to use we have the item here we have buns that one is the value of this one here then we have some we have some key value pairs so if i say here i have the bunch then we have the rate and the rate was uh 0.29 and so on each one of these entries in our table if i click up here that was one item you can see that we now have 160 lines less i can click here here and here so we have four rows that one corresponded to the four rows in invoice six we want to make it a bit more beautiful so to do so we will use a bit of a bit of magic but it will not be that it will not be very very hard to solve we will just go inside this json to get the item and then we will have the rate we will have the amount and then we will have the quantity so let's create it in power automator we can have one entry for each line in the table so i go back here i click edit and instead of this our items click here and now we will write our expression so i'll go to expression and let's uh just talk about what we want out so if i go to my json here i want for each entry i want the item out i want raid out i'm on the amount out and then i want the quantity out and since um let me just do this and here we can see that we have one entry so it's simply just go in we already did the entries i just want item rate amount and quantity and i want to vet the corresponding value out so i take the item and then i want the value that was bonds to do so it's very easy so go to the expression the first one we will take that is the we will take the items from this apply to each and this is a very good approach to learn to white function so i'll say items like this then i'll say parentheses single quotation marks and i'll say apply to each whenever we are referring to what comes out here and whenever there's space we are doing an underscore and since it's applied to each two i'll take another underscore i'll take another underscore and then two and now we can we are in the entry so now we can work our way down in the tree so if i say question mark and the first thing that i wanted that was the uh that was go down to item then value so i'll take the value of this item and the same thing i'll do with the rate if we see here rate value i'll take that one so if i go back those ones are written in hard brackets so i'll say hard brackets single quotation marks item and then we will have the question mark we are going one way down into three hard brackets and then we want the value just see here if i go up so i have the value here so then i just say in single quotation marks value so this one is sufficient this one takes the value of the item of this the first entry so if i click ok here it says that it's wrong it's not that one is it so instead of just writing out the three other ones they have the same structure i can just click here now let me copy all this one here and to and then go back we just needed to copy it so go here and go here and make a space and now we can make the other expression they will look all for expression will look the same so if i go to expression i can paste in this one here that was what was the one from item and then just change item to let me show you here rate amount and quantity that was very easy so it's basically the same approach so what was the first one that was rate and amount and quantity if i want to have it the same way as in my table then we have quantity before but it's really just a matter of excel gymnastic so now i have the quantity here and the same approach will go i'll pick a space then i'll choose expression paste in this one here then i want rate like this click ok it's invalid it's not sometimes it just happens then i click another space click expression paste it and again and this last one that was the amount if i misspelled this one it will give an arrow but we can easily fix it so now we have created our flow and we can we can take each line in their table of our attachment for invoice so i click save here let's test it and i can process each one of the rows to our database which is excel in this place so now i save and test it i'll try to send an email to myself with an invoice in so i go to my inbox click new message i'll say anastasia let's pick the sixth here and i click send so i need to put in a subject and that was one one invoice otherwise this won't pick up that was a filter we made in the beginning just so you know what's going on so now it's centered we got the invoice here we can go to our flow flow will run in a few seconds and then we will hopefully have four lines of new data as it will take each row of the invoice table as an entry in our excel sheet again rewind the video if you think this went a little bit too fast this is not easy stuff however you can easily do it just rewind the video and replicate what i did here it will become very easy to you when you do so our flow range successfully and if we go to onedrive again there you go you can see that we now have uh of course before the the invoice number the date and the total they'll be the same here but now we actually got our nice data out we could also of course also have assigned these ones to separate columns but for now that was fine take my full power automate course by clicking the playlist in the middle that one will teach you everything about power automate you
Info
Channel: Anders Jensen
Views: 70,377
Rating: undefined out of 5
Keywords: Extract data from PDF with Power Automate, power automate pdf, power automate pdf extraction, power automate pdf to excel, power automate pdf ocr, power automate pdf data extraction, power automate read pdf, How to Extract Data from PDF with Power Automate, how to extract data from pdf with power automate, extract data from pdf power automate, how to extract data from pdfs with power automate, anders jensen, power automate, microsoft
Id: GW5iUV2rRGQ
Channel Id: undefined
Length: 29min 29sec (1769 seconds)
Published: Wed Dec 15 2021
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.