Get data from PDFs and send to EXCEL with Power Automate Desktop!

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hello today we're going to be extracting from PDFs using power automate and regular expressions and I am going to give you a little demo of what that looks like first and then we're going to talk a little about it and then we're going to jump into the step-by-step tutorial so I have PDFs here in a folder and they look like this this is public data so I got these off the web and what we're going to be getting today is we're going to be extracting this institution name the rssd ID here and we're going to pull something from one of the tables down here so I think I use like the interest in come in like this box and I chose these because they're kind of a mix of the easy to get things and a little bit more difficult to get things that are embedded in the table so let me show you what the flow looks like I'm going to start this and it'll probably pop up on my other monitor I'll have to move it but so I'm going to run my flow and it is loading all of those into Excel from the PDFs this is um by the way power automate desktop which is not the same as power automate Cloud so what power automate desktop does is it controls things on your computer so you can tell it to go to websites click things move the mouse push keys on the keyboard and extract text from files on your computer this is a tool that's really good for process automation when you're doing things in Excel it's got a whole bunch of excel actions it can run scripts and it can actually send data to SQL server without a premium connector which is a differentiator from Power automated cloud and this thing is free with the most recent version of Windows as of this year so that's pretty cool um essentially if you're willing to be a person who pushes the start button you don't have to get into all of the licensing pieces so we're going to keep it simple today and just use the free version and I'm going to create a new flow and give it a name so the first thing I like to do before I get started is to just make my Excel file template that I'm going to be storing the data in just set up the columns the way I want them that kind of thing so in my case it's a very basic Excel file file so you just got file name institution name the ID and the income as column headers and then blank cells and I want to point out that you don't want to insert a table around this yet because what we're going to be using in our flow is an action that gets the first free row in Excel and if you have a table inserted it thinks that that row is not empty and it'll insert everything below your table so you can make a table as part of your flow you can have your flow insert a table around your data at the end but to put that in until later okay and what I like to do with these Excel files is create a template file that the flow uses and then does a save as on it so that way when I'm working on this I don't have to go back and erase the data every time when I'm testing things if you know what I mean so back to our flow here the first thing we need to do is get our PDF files that we want to be extracting from so that is under this folder section so you can tell it to get files in folder and then give it a path so mine is in my documents folder which is under the C users my name documents this one here and if you want to you can give it a filter so you can um it takes wild card so you can do asterisk dopdf and then it'll only pull PDFs and then toggle whether or not you want it to include subfolders and by the way you can do this text extraction from PDF with PDF that are hosted somewhere in the cloud like on a website you just have to set it up very differently so the the way I've done web PDFs in the past is to have the flow go in and open the PDF file and then copy the text using keyboard shortcuts and then it can parse the text in the clipboard if that makes sense but since we have these files on our computer it's a lot easier for us to access them so the next action we need to do is we need it to open our Excel file so that it can put data in it right so if you search for excel in here there's a ton of excel actions so we're going to use this launch Excel and then we're going to change this from blank document to open the following document and give it our template file that we have saved somewhere and then click on Save and what this does is it stores that Excel instance into a variable and then it can refer to that when you're using your actions so like I said I like to do a save as on that template file so I'm just going to add a save Excel action real quick and have it do save as so you can change this from Save document to save as and then give it the path and the file name and save that and then we are going to Loop over all of those files in that folder and perform actions on them so we need a loop in here so those are under the loops category so we're going to use a for each Loop and the value to iterate here is going to be the output from our get files in folder step so if you click on this x here it'll let you select that variable so that's files list of files and then we can save that and now we need to get our text from the PDFs so there's a whole category of PDF actions here I'm going to do this extract text from PDF and drop that inside the loop and for the file this is going to be really easy because We're looping over those files right so we can just from our variable go and select the current item so that's the current item that we are on in our for each Loop and then it's going to take that text and put that text into a variable here so next we need to parse that text so we need to look for the specific things we're trying to extract because this extract text from PDF is going to get all of the text in the PDF and then what I like to do here is save and run this so that we can see the extracted text because that's going to be how we decide what text it should be looking for to extract what we want so since we've got a for each Loop here I'm going to click on this white space next to the word end and see how it added that red dot that red dot is a break so it's going to stop that act C in the for each Loop so that it doesn't keep going because I just want to see one of them right I don't want it to Loop through everything right now so I'm going to run this and make sure that you've got your Excel files closed because if they're still open it's going to error out when it tries to open the Excel file that's already open just keep that in mind and the flow variables over here if you expand this out you can see where our extracted text is the ellipses menu next to it if I click on that and then click on view it gives me the text that was parsed and we can expand this so you can actually read it so you can see the institution name is right after the words institution name so what we can do is make an expression that looks for these words and gets whatever is after that in the same line so I'm going to close this and we need to stop our flow so this it stopped at this break here but we need to click on stop in order to start editing this again won't let you edit it on when it's running so I'm going to search for the word parse in here to get parse text as an action so we're going to parse the text that we extracted and for the text to parse I'm going to give it our variable of the text that came out of the PDF and for the text to find we have this option of is regular expression and we can turn that on so what this does is it lets us write an expression for the text to be looking for and I'm going to give you the ones that I use that are super common so typically you're looking for a particular word and whatever text comes after that word and what we can do with that is we can refine it once we get it to split off the parts that we don't want so for our text toine we're going to look for the words institution name and our regular expression is going to be a parentheses period plus sign close parenthesis so what this is going to do is it's going to look for this text it's going to store that and whatever comes after after it on the line and if you need to get text that is on the next line down from text which sometimes will happen in PDFs what you want to do instead is add back SLR back sln and that is the code for a new Lon character buts is on the same line so I'm going to leave that there and what you can do too if you have something a little more complicated that you're trying to pull is you can use chat gbt to give you a regular expression it's very good at making regular Expressions the only thing is you have to explicitly tell it not to use look behinds because it'll want to do that and then for the variables produced it's going to want to create a variable for the position of that text so we don't need that so I've turned it off here and then match is going to be whatever text matches your regular expression I'm going to rename this to something specific so that when we refer to it later we know what it is so I'm going to call it institution name and then we need to split this text so this is going to include the words institution name in the return value and we don't want that when we're sending it to excel so we need to split that off so I'm going to use a split text the look behinds is how the regular expression removes the included text and as far as I can tell power automate doesn't support look behinds so um that's why we have to split it out so we're going to split text and we're going to use the variable that we just created here so our institution name text and and we're going to split on custom not standard and then for our custom delimiter we're just going to put in the text that we're trying to remove so that's institution name and then I'm going to leave the space after it because I don't want the space saved either and this one is not a regular expression so don't toggle that on and then for the variables produced I'm just going to reuse the same variables I'm just going to replace the value that was in the variable with this new value and then click on save so now if we run run this make sure to close your Excel file from before because I keep forgetting to do that and what that got is our institution name here in an array so you can see that the first position is blank because it split that text out that's why it's blank and we're going to use that array and refer to that second position to get just the institution name for our Excel file so we're going to stop this and and to get this into Excel the first thing we need to do is we need to tell power automate where we want to put this information in Excel what we use for that is first free row so this there's two actions for this first free column or row or just the first free row on a specific column so I'm going to use this one because it's a little bit easier to set up and that's all we need and the Excel instance it'll automatically grab and the column we want to use is column B so I've got the file name column B and actually we could have used any of the columns that we were putting data in here B will work fine and it saves that to a variable so we're going to use that to refer to when we're adding the data to excel so we've told power automate where our first free row is for it to put the information now we need to tell it to put the data there so we're going to tell it to Excel actions write to Excel worksheet so the value to write if you recall from my template the First Column there was for the file name so I'm going to do that first so the file name if you recall is the thing that We're looping over so it's the current item and for the column that was column A that we were writing the path to and the row is going to be that first free row variable that we just created in the step above that one and we're going to do the same thing for our institution name so the value to write if we select our institution name variable here this has the split values in an array so it's got two positions so we need to do is tell it which position to look in so just put your cursor right before the percentage sign and do a square bracket and then give it the position number we're looking for which is the second position but because this is computer speak um the second position is actually number one and the first position is number zero so I want to tell it number one and then close the bracket so this is the second position in the array and then for the column this was column B and for the row we're going to give it that first free row again all right so let's test this out has a tendency to open Excel behind this window if that bothers you you can tell it to Focus window and bring it to the front so we've got our file path here and our institution name so that's good we still need these two here so let's do those next um the ones in the table are just a tiny bit trickier because we have to be very specific about we're looking for so if I look at the PDF the interest income was right here which looks like it's close to this text but I just want to double check our parse text and make sure so I'm going to copy this particular value that we're looking for so I'm going to go to the extracted PDF text and actually I'm going to so it's next impossible to find things in here so I'm going to put this in a text file so that I can use a search on that particular value I'm going to paste this in here and then I'm going to look for that value with a contrl f so here it is so you can see this one has some extra bits after it on the line so what we want to do here is change our regular expression a little bit so let me show you how to do that back in here I'm going to add another parse text step so we're going to get another value and then for our text to going to select our variable again the extracted PDF text and text to find um the regular expression we're going to use here is a little bit more complicated we're going to look for this particular text value that was next to the number that we're looking for and then we have a parentheses square bracket back slash d square bracket plus sign close parentheses so what this is going to do is it's going to look for a number and include commas after that text and the thing with the commas is that commas are technically text they're not numbers so you have to tell it to get both otherwise it's going to stop at the comma and we need to turn this regular expression on first occurrence only is fine and then for variables I'm going to turn off position and change this to interest income save and we need to do the same thing we did before with the splitting the text so by the way you can copy and paste these things with contrl c contrl v on your keyboard um it's marginally faster I don't because you still have to update everything when you change it but I'm just going to fast forward through this next part Okay so we've got all our text we're going to send it to excel now the same way I'm just going to copy and paste our step here and replace the values all right so let's try running this and it looks like I split on the wrong thing here so this is looking for RSS hyen ID not space ID so that's why failed so make sure your text matches exactly unless we run this all right so that got our values correct let's go ahead and remove this brake step here by clicking on it and then I'm I'm going to do a Focus window so you can actually see this running let's stop and then do this just as an aside the other thing that is useful here is maximizing the window so this is more for when you're interacting with parts of excel like clicking through menu options the menu structure changes depending on the Sid of your screen so you always want to have it maximized so that the buttons are in the same place every time so there's another action for that so I'm going to focus it by window instance handle and select our Excel instance save run it again so that should bring it to the front so you can see it's pretty fast the thing that'll make it even faster is running it without the editor open because when the editor is open it seems to take more time to visually go through the steps versus running it from the launcher menu so the last thing we need to do is save and close our Excel file in our flow so let's do that so we have two options here you can either just use the save Excel option or you can close Excel and then save it as you close save it down [Music] here all right so we are done make sure to save as you're working on this I don't think it autosaves the um the other thing to do with this file when you're done is you could email it so there's Outlook action in powery and desktop you could upload to SharePoint although that's going to be a premium trigger as far as licensing goes anytime you're going from on premise to Cloud you trigger the licensing fees so I hope this was useful to you I'm planning on doing another video on how to do web scraping so how to get data out of websites in the future so keep an eye out for that and I hope you have a great day
Info
Channel: Christine Payton
Views: 4,681
Rating: undefined out of 5
Keywords: tutorial, power automate desktop, RPA, pdf to excel, read pdf
Id: kW2D853JqQg
Channel Id: undefined
Length: 18min 51sec (1131 seconds)
Published: Tue Oct 10 2023
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.