GPT PDF & Image Data Extraction (Power Automate)

Captions
Hello everyone. Now that Microsoft has released some Power Automate GPT actions in the US, I wanted to see whether this enables a better way to access PDF and image data in automated workflows. Although there is not yet an official way to feed multimedia inputs into the GPT actions, there is a way to convert optical character recognition (OCR) outputs for PDFs and images into a kind of text file that the GPT actions can take in. That means we can now get data extracts, summaries, interpretations, and any other GPT capabilities on PDFs and images in our automated workflows.

If you're setting up the template, go to the video description and find the "Extract data from PDFs and images with GPT" page. Once you're on that page, scroll down to the bottom of the first main post and find the file you can click and download. Then go to your main Power Automate page, find the Import button, choose Import Package (Legacy), and upload the zip file you just downloaded. When the import screen comes up, go down to the import setup, select new connections for each of the connections on the flow, and click Import. Once that finishes loading, you should see an Open Flow button, and clicking it opens the template flow for you.

In broad terms, this flow pulls in some file content, puts it through an OCR service, converts the OCR output into a text file, and passes that text file on to a GPT prompt. To try it, I load in my first example: a pretty standard-looking invoice with product lines, Bill To, Ship To, invoice number, and invoice date, just about everything we need. The flow grabs that file content and passes it to the AI Builder action Recognize Text in an Image
or PDF. Once we've gotten that, the flow passes those results on to the section that processes them into a text file. Looking at a previous run of the recognize-text action, the output is a JSON array with a separate JSON object for each individual line of text found in the image or PDF, and along with each piece of text it includes a coordinates piece that we'll be using.

To explain a bit: each of these pairs of X,Y coordinates corresponds to the bounding box that the OCR system draws over the text it finds. So for "Ship To," it puts a little bounding box around those words, and the corners run clockwise: the top-left corner corresponds to the first X,Y pair, the top-right to the second pair, the bottom-right to the third, and the bottom-left to the fourth. Mostly I'll use the first pair in these actions, but the others come into play in the more detailed parts of this processing step.

The first part here takes the average of the Y coordinates from the first and fourth coordinate pairs, so that even if the document is slightly tilted it still gets a sensible Y coordinate. It pulls that Y value for every single piece of text in the results, so Ship To, Bill To, the address, and so on each get a calculated Y coordinate. Then, in the loop below, a union on those outputs removes the duplicates and leaves us with just one integer Y coordinate for each line on the document that has text in it. So it's going to
identify each Y line that has text on it and return only those values. A preceding step also adds a sortX property holding the X coordinate of the top-left corner of each bounding box. We did that because the next piece initially sorts everything by X coordinate, so for a given line it puts, say, Bill To, then Ship To, then the invoice number in order, and it also filters to only the items with the current Y coordinate. In other words, for each line it grabs all the text items on that Y coordinate, already sorted by X.

Once everything is filtered and sorted, we move on to the main piece: a Select over a range from zero to the number of items we filtered to on this line, i.e. the number of text items on the line. The concat inside it returns both the text of the given item and the number of blank spaces that need to precede it. That's where some of the initial steps come in: one of them is just a string of about 200 blank spaces, and this piece takes however many of those blank-space characters we need, based on the calculation in the expression. What the expression is doing, for example, is returning the Ship To text while also calculating and returning the number of spaces between Bill To and Ship To.
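The two steps just described, finding the distinct Y coordinates of the text lines and then padding each line's items into position by X coordinate, can be sketched in ordinary Python. This is a minimal illustration of the idea, not the template's exact expressions: the `text`/`boundingBox`/`sortX` field names mirror the flow's concepts but are stand-ins for the real AI Builder schema, and `chars_per_unit` is a guessed scaling factor from OCR units to character columns.

```python
SPACES = " " * 200  # mirrors the flow's pre-built string of blank spaces


def line_y_coordinates(ocr_results):
    """Return the distinct Y coordinates of the text lines on a page.

    Each result is assumed to look like
    {"text": "Ship To", "boundingBox": [x1, y1, x2, y2, x3, y3, x4, y4]},
    with corners ordered top-left, top-right, bottom-right, bottom-left.
    """
    ys = []
    for item in ocr_results:
        box = item["boundingBox"]
        # Average the top-left (index 1) and bottom-left (index 7) Y values
        # so a slightly tilted scan still maps to a sensible line coordinate.
        ys.append(round((box[1] + box[7]) / 2))
    # The union step: deduplicate so each physical line keeps one Y value.
    return sorted(set(ys))


def build_line(items, chars_per_unit=0.1):
    """Rebuild one line of text from OCR items already sorted by X.

    Each item is assumed to look like {"text": "Bill To", "sortX": 0}.
    The X coordinate is converted to a character column, and enough blank
    spaces are taken from SPACES to pad each item into place.
    """
    line = ""
    for item in items:
        col = int(item["sortX"] * chars_per_unit)
        # At least one space between items; no padding before the first.
        pad = max(col - len(line), 1 if line else 0)
        line += SPACES[:pad] + item["text"]
    return line
```

With items like Bill To at X 0 and Ship To at X 300, `build_line` places "Ship To" at character column 30, which is the same trick the flow's substring-of-blank-spaces expression performs.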
So for each item on the line it returns the text plus its preceding spaces, all at the same time: the first item with its padding, then the next, and so on. Then the flow appends the entire assembled line to the each-line array variable, along with a page number and a line number. The reason we add the page and line numbers is that I run these loops concurrently, so the flow stays really fast even for large files with many pages. With concurrency on, though, lines can be added to the array out of order: the fifth line from the second page could be added before the tenth line from the first page. The page and line identifiers mean that once we're out of the loop we can sort everything by page and line number, join it all together, and get all our text back in order.

That re-sorts the text back into what it looks like on the actual document, and then it's all put together into a single piece of text, like a text file. If I Ctrl+A and Ctrl+C that output and paste it into Notepad, it looks similar enough to the original document that ChatGPT can take it in and answer questions about it.

Here is the actual prompt I'm using for this demonstration. I outline which pieces of data I want it to focus on and pull, and I add a note that this is OCR-captured text, so some formatting may be off from the original file; that gives it a little more leeway to still return results even if something looks slightly off. Then I input the combined text output into the prompt, and I also add a piece specifying how I want the results returned: I've given it a specific JSON schema where I want
it to use specific key labels, so that at a later step I can add a Parse JSON action and not have to worry about it returning different key names each time the flow runs with a new prompt. Okay, now that we've reviewed all that, let's save the flow and run it once. Oh, it looks like it's turned off, so I'll turn it on and then run it. Opening the final action, it looks like it output the JSON I wanted, so I'll Ctrl+A and Ctrl+C, go back to edit mode, and add a new Parse JSON action: Generate from Sample, paste in the sample from that flow run, and for the input I'll use the json() expression on the output text of the Create Text action. That should parse the JSON and get it into dynamic content.

From there I'll add an Excel "Add a row into a table" action. I already have an Excel workbook here with Table1, so I select that and fill the columns in with the dynamic content from the Parse JSON action. Then I'll add one more action, Create Share Link, which grabs the file ID from earlier in the flow so we get a link we can add to the Excel table and easily view the document right from the table alongside its data.

Now that it's set up to parse the JSON and send it on to a data set, I'll run it again. That's one run; then let's go back out and find another invoice. The Stanford Plumbing invoice has a bit of a different format, which shows the flow can handle several different formats and styles. With that loaded, I run it again, and there's the first result and the second. Comparing them, you can see the East Repair invoice got the correct purchase order number and the correct invoice total, and then we can go over to the other one and note the
invoice number, invoice date, and balance; it looks like it correctly picked all of that up. This one doesn't have a PO number, so it correctly returned "not applicable" in that field. That's going all the way from an image or PDF to the final data extract in an Excel file. Of course, the result could be sent to any type of data source, Excel, SharePoint, Dataverse, SQL, wherever you need it to go, just by swapping out the bottom actions, and the input could likewise come from SharePoint, Dataverse, or anywhere else you have PDF or image files whose content you can extract and pass through this flow.

That's a good example with the invoices. Next, let's see how this works with something a little more complicated on the prompt side: resumes. I have a table prepared to extract certain data from resumes: email address, highest position, total years of project management experience, and total years of experience overall, along with the file link. I've also prepared a flow for this scenario. It loads one of the resumes, and I've changed out the prompt: I've changed which data I want to pull, made sure I mention what type of document it is, a resume, and changed the output JSON, which I've also switched out in the Parse JSON action. So here are total years of experience, project management experience, highest position, and email address. If I let this run with the John Smith resume, we'll see it add a row to Excel for him, and then I'll find another resume, save, and test with that.

It seems to pull the email address very well, along with the highest position. The main things I want to note in this example are two data points: total years of experience and total years of project management experience. If I open the actual document, you can see what this looks like for Smith, and then
I can do a similar thing for Brook. Now, we know Smith has been working since 2002, yet it came back with a total of 10 years of experience, which doesn't really make sense for him. The point is that we've asked GPT not only to pull data from the document but also to interpret it, for things like total years of experience and total years of project management experience, and it still can't reliably process and return a great result on this document for that. With that "10 years" answer, I suspect it just saw a 10 near the top of the resume and went with it. So this still needs some work, whether through the model improving or through people finding better prompts for this type of document. However, the run on Brook's document seems to have done a pretty good job pulling out her total years of experience and her total years of project management experience. So it's a bit hit or miss; maybe some people will find ways to improve on it, and I wouldn't be surprised if a year or so down the line we have models trained specifically for this, or the prompts have improved enough that this becomes a strong use case for this type of template and GPT setup.

That's the prototyping and testing I've done so far with this OCR-and-GPT setup. It seems to be quite good at pulling data from documents, as long as the data exists somewhere in the PDF or image. It may not yet be as good at interpreting or processing the data in documents to give reliable estimates or summaries for automated workflows. But I'm really hoping we see people pick up this and other flows like it, develop better and better prompts, and share more of their use cases and the prompts that really work for
them, so that we can really grow this functionality as a community. Again, the link to the thread on the Power Automate Community forum is in the description. Please go there, download the flow, build some interesting things, and share them with us.
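Two remaining pieces of the flow lend themselves to a compact sketch: reassembling the concurrently produced lines by their page and line numbers, and parsing GPT's reply once the prompt has pinned down a JSON schema. The Python below is illustrative only; the record shape and the `InvoiceNumber`/`InvoiceDate`/`PONumber`/`Total` key names are hypothetical stand-ins, not the template's exact schema.

```python
import json


def assemble_text(line_records):
    """Join concurrently produced lines back into document order.

    Each record is assumed to look like
    {"page": 1, "line": 5, "text": "Bill To        Ship To"}.
    With concurrency on, records can arrive out of order, so sort by
    (page, line) before joining, as the flow's sort step does.
    """
    ordered = sorted(line_records, key=lambda r: (r["page"], r["line"]))
    return "\n".join(r["text"] for r in ordered)


# Hypothetical key names that a fixed-schema prompt instruction would pin
# down, so a downstream Parse JSON step can rely on a stable shape.
EXPECTED_KEYS = {"InvoiceNumber", "InvoiceDate", "PONumber", "Total"}


def parse_extraction(gpt_output):
    """Parse GPT's JSON reply and fail fast if the key names drifted."""
    data = json.loads(gpt_output)
    missing = EXPECTED_KEYS - data.keys()
    if missing:
        raise ValueError(f"GPT response missing keys: {missing}")
    return data
```

The key-check mirrors why the video's prompt dictates exact key labels: without it, a Parse JSON step (or `json.loads` here) would succeed while downstream column mappings silently broke whenever the model invented a new key name.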
Info
Channel: Tyler Kolota
Views: 8,866
Keywords: GPT, GPT4, AI, Power Automate, Power Platform, Microsoft, Office365, Power Apps, Dataverse, SharePoint, OpenAI, Low Code, PDF, OCR, Data, LLM, Prompt Engineer, AI Builder
Id: mcQr-JsGj6Q
Length: 22min 15sec (1335 seconds)
Published: Tue Jun 20 2023