Extract Pdf Specific Data To Excel In UiPath | UiPath Pdf Invoice Data Extraction | UiPathRPA

Captions
Hi friends, this is Manish Pandey. Welcome back. In this tutorial we are going to learn about PDF invoice extraction. You can see the image here: we have an invoice, and we are going to search it and extract only the specific data we need, then export that PDF data to Excel in UiPath. That means we will extract specific fields from the invoice, such as the invoice number, invoice date, invoice amount, the person the invoice is being sent to, and a few other details, and then write all of this data to an Excel file. I know you have been waiting for this video; I have received lots of emails and comments asking me to make it, so now is the time to learn this. First I will show you how it works, how it fetches the data, and how it writes the data to the Excel file, and after that I will show you how to build it.

So first I am going to open the invoice extraction project and run it. You can see I have taken three invoices here; you can check them one by one. All of these invoices have the invoice number, date, customer ID, terms, the Bill To (meaning the person, so we will take the person's name), and the total amount. This is the sample invoice format, and all of the invoices are in the same format; only the data is different, like the invoice number, date, customer ID, amount, and person. We are going to fetch all of this data from the different PDF invoices and then write it to the Excel file. You can see I have created one Excel file named invoice data, and the sheet name is Sheet1. I have put in the column names: the invoice number should be written here, then invoice date, customer ID, terms, Bill To (the person's name), and the total amount. For demo purposes I have
taken only three PDF files; you can run it on as many PDFs as you have, more than 5 or 100, it's up to you. We don't have any data in the sheet yet, so I am going to save and close it, and then close all of these PDFs. You can see that in this project I have also created a completed invoices folder. After the data is extracted from a PDF, the file is moved to the completed invoices folder; for example, once the first invoice's data has been extracted, that file is moved there.

Without any delay, let me show you the demo. The project name is PDF invoice data extraction. Let me run it using Ctrl+F5. It opens the PDFs one by one, as you can see. You also saw a prompt appear; I will explain why I am selecting it later, while showing you how to build this. You saw that the first file was closed and moved to the completed invoices folder. So one by one it opens each PDF file, captures the data, and moves the file to the completed folder, and at the end, after completing all the PDF files, it writes the data to the Excel file. Now it is done. Let me open the Excel file: you can see the data here. This invoice number is 2170, this one is 3031, and this one is different again, and all the other details are written alongside. Let me show you the PDF files, which have now been moved to completed invoices. Let me open invoice one: the invoice number is 2170, and you can see 2170 here in the first row. It writes the data in the same order it reads it: 2170, then the invoice date, then 279 is the customer ID, then the terms, which are Net 30 Days, this one,
and then the Bill To, the person's name, this one, and then the total amount, 1725; you can see 1725 here as well. In the same way we have another invoice number with a different value, 3031; you can see 3031 here, and the amount is 650. Let me show you the last one too: 650. Okay, let me close all these tabs. You have seen that the data was extracted and written to the Excel file. Now let me save and close the Excel file so that I can show you how to build this, and let me move the completed files back to the invoices folder so that we can reuse them.

Now, in the UiPath project, let me first comment out the existing workflow and build a fresh one. We take a Sequence, connect it to the Start node, and set it as the start node; the commented-out sequence I just paste to one side. I am going to change the sequence name to PDF data extraction project. In this sequence we first need to open the PDF files, and to open them we need the folder path and the file names. For that we take an Assign activity with a variable for the files; let me check whether I already have that variable. It's not there, so I press Ctrl+K and name it PDFFiles, and then I write Directory.GetFiles. Once you type the opening bracket, it shows the function signature, Directory.GetFiles with path As String, which means we need the location where the files are located. For the path, we take another Assign activity and create a variable, PDFPath, and
then we need to give it the path; or you can pass the complete path of the location directly in double quotes. Currently the files are in this folder. If the PDF files folder is inside the project folder (this is the project folder, PDF extraction), then we can just copy the folder name and pass it in double quotes. If the folder is somewhere else, then we need to write the complete path. Either you write that path directly in double quotes, or you use a variable; I am using the variable here. You can see that PDFPath is currently a GenericValue, but it can be a String, so we change it.

Then, for PDFFiles, I write PDFPath and then a comma. I am putting a comma because I want to pass a search pattern as a String. The search pattern means I only want to pick the files whose extension is .pdf, so in double quotes I write *.pdf, and it will pick only the files with the .pdf extension. Once this activity is executed, we will have the list of files in this variable. That means the variable must be able to store all the file names; it should be a collection, an array of file names. So the type of the PDFFiles variable should be a collection of String, or you can say array of String; we will take array of String. Now we have the list of files in this variable, and we can use it to open the PDF files. For that we take a For Each activity, this one, so it
will give us each file in turn. We take this one and change the name item to file, meaning we will get each file from the list of PDF files. Also make sure you click on it and check that Object is selected as the TypeArgument. Then in the body we add the activity that opens the PDF file. PDFFiles holds the collection of files, and with the For Each we get them one at a time, so file holds one PDF file name. Here we take a Start Process activity, which will open the PDF file. In the first field we type file, the variable holding the file name, followed by .ToString. It is currently typed as Object and we are converting it to a String, because, as the hint says, the text must be quoted, meaning it must be a string. That's why we write file.ToString. Now this will open the PDF file.

After opening the PDF file, we need to capture the data: the invoice number, date, and all these things. Now suppose we read the data, close the PDF, and then open another PDF. You can see that the PDF may open with the page positioned at the bottom, so the fields will not be readable. We have to go to the top, because the invoice number, date, and the other details we are going to read are at the top. So first we need an activity to send the page to the top. Say the page is scrolled down; if you press Page Up, it sends the page up. So we need one activity, and that is Send
Hotkey, so we take that. After starting the process, how do we send the hotkey? First we need to work on this PDF window, which means we need to attach to it, so we take an Attach Window activity and indicate the window on screen. Say we have this PDF file open; let me maximize it and indicate it. Now you can see the selector. One thing we can do in the title is keep just invoice followed by a star, and delete Adobe Acrobat, because if you use Foxit or any other PDF reader, the title will be different. I am currently using Adobe Acrobat Reader DC; if you use something different, this part will be different, and that's why I am putting a star. You can do the same. Save this. You can see that Validate reports the selector as valid, so it will work fine.

Now we move the Send Hotkey into the Do section, because after attaching to the window we need to send the hotkey to the PDF. The key we send is Page Up, which we pick from the list here. We also need to take care of future runs. Sometimes your PDF is at some zoom size, say 47%, when you close it; when you run the bot again and it opens the PDF file, the size is still the same, and that can sometimes throw an error. So we should always set the page to a known size. For that we can press Ctrl+1 (the number 1), which automatically sets the page size to 100%, the actual size. So after sending the page up, we take one more Send Hotkey activity, select Ctrl, and take
num1, so it presses Ctrl+1 on the PDF and sets the page to 100%, so that the bot will also be able to read the data correctly.

Now, if you use an activity to click on the invoice number or date, sometimes it is not able to read the data, because we are not able to select that UI element. For this we have what is called the reading option. In Acrobat you can see that under Edit we have Accessibility, and then Change Reading Options. Using this we can change the reading option of the PDF, and then we can select any element we need. I will show you why I am using this a bit later. We have a list of options here: Infer reading order from document, Left-to-right top-to-bottom reading order, Use reading order in raw print stream, and Tagged reading order. We will use only two of them, the infer reading order and the tagged reading order, because with just these two options we can automate our invoice PDF files.

First I will start with the tagged reading order. It allows you to select the invoice number, the date, the terms, and the Bill To. For that we would take a Send Hotkey activity, press Ctrl+Shift+5, select the tagged reading order, and then press Start. Because we need this every time, I have created a workflow for it instead. Let me delete this Send Hotkey. You can see that in the project I have a tagged reading order workflow and an infer reading order workflow. In the tagged reading order workflow I have an Attach Window, and then a Send Hotkey passing Ctrl+Shift+5, the number
five, which opens this prompt. After that I have another Send Hotkey passing T. If we press T, it selects the tagged option; let me click the list and press T on the keyboard, and now it selects Tagged reading order. After that Send Hotkey, I click the Start button, and then Acrobat lets you read or click on the UI elements. So I have taken these three activities: two Send Hotkey activities and a Click. I will use this tagged reading order workflow here, which means I will invoke it. You can drag and drop the workflow file directly and it becomes an Invoke Workflow File activity, or you can take Invoke Workflow File from the activities panel, click on it, and select the tagged reading order file. It's the same either way, but that way you need to change the name and so on, so I go to the project files and drag and drop the workflow file here, and it becomes Invoke tagged reading order workflow. We don't need to import any arguments, because there are none; we are just passing hotkeys to the PDF. It presses Ctrl+Shift+5, passes T to select the option, and then clicks Start.

Now let me show you the difference between the two options. Say we press Ctrl+Shift+5 and select the infer reading order. Let me take a Get Text activity, just to demonstrate. If I indicate and try to click on the name, we are not able to select it on its own; it takes the whole block, the name together with the address, phone numbers, and email ID. That is the issue, so how can we resolve it? If I click here, it works fine, here it works fine, here it works fine, and if I go a page down, you
can see it captures this amount fine. The only issue is with the Bill To, the person's name. How can we capture it? You can see I am not able to capture just one line of the block; it is only able to capture everything together, and that is the problem. That's why we have the other option: press Ctrl+Shift+5 and then T for Tagged reading order. Now click Indicate, and you can see it lets you select the name separately. You can select each piece separately: this one, this one, this one. But now if I try to select the amount again, it captures from here all the way to the end, and that will cause an error. So the amount works only with the infer reading order, and the name works only with the tagged reading order.

I believe you now understand why I am using both the tagged order and the infer reading order. Sometimes people don't know why they are not able to pick a UI element. And sometimes you try to take it anyway and get a bad result: if you go a page down and try to capture this amount using the tagged order, you will get an error, or rather not an error but the wrong value, like this one, or this one. To avoid this issue we need to use the two options based on the requirement. Let me delete this. First we need to capture these four or five fields at the top, so I am going to use the tagged reading order, because only with it is it possible to extract that data. Now we need to extract the data, and for that we take
the Anchor Base activity. It is best practice to use Anchor Base; otherwise you can get an error on the second or third PDF. First I take a Sequence and rename it Extract Data, and then I put an Anchor Base in it and rename it, since first I am going to fetch the invoice number, so this Anchor Base is for the invoice number. In it we take a Find Element, because first we need to find the anchor element; we find this label, and based on it we get the invoice number. We click Indicate and click on the Invoice Number label. Then we open the selector. You can see there are a lot of things here that we can remove. We keep the first line, the top-level window selector with the app, and we keep the line with the control name Invoice Number and the role text; the rest of the selector we can delete. Validate it, and you can see it is fine. To check whether it works, click Highlight; if it highlights the element, the selector is correct. Click OK and save.

Now we need to capture the number, so we take a Get Text activity, which extracts the text from an element. Click Indicate and then click on 2170. You can see the selector for this one has just the role equal to text, while the anchor has the control name equal to Invoice Number and the role equal to text. In the output Value we pass the InvoiceNumber variable; its type is String, and all of these variables are of type String. Save this. One more thing I am going to show you, which is a best practice and will help you avoid future errors, so watch this
carefully. Say we have the Invoice Number label here and the value 2170 here. I'm not saying the default will not work; it may or may not. If you want to fix the issue permanently, click on the Anchor Base activity, and in the properties you can see Anchor Position, which shows Auto. With Auto it can take the data from anywhere: if invoice number text is written somewhere else, it can take the value from there, or it can take this value, or this one. To avoid this issue, we change the Anchor Position to Top. Now the Invoice Number label is on top and the value is below it, which means it first finds the Invoice Number label on top and then extracts the data beneath it. You can see that layout: the label on top and the value under it. In the same way we have the date and the rest, so we need to set the same anchor position, Top, for the date, customer ID, terms, and Bill To.

Now we take another Anchor Base activity, this one for the invoice date, and again take a Find Element and a Get Text activity. In Find Element we click Indicate and click on the Date label. Again we open the selector and remove these three lines; they are not required, and they can sometimes throw an error. You can see that the title has the star wildcards around the invoice name, and we have the control name Date and the role equal to text. Press OK and save. Now click on the Get Text, indicate the date value, and
then click on the Anchor Base, change the position to Top, click on the Get Text, and in Value select the InvoiceDate variable. Save. So two fields are done. In the same way, take another Anchor Base activity, put it in the sequence, and use it for the customer ID. Note that I am capturing this data while the PDF is in the tagged reading order option, not the infer option. You also have to make sure of this while capturing the selectors: if you capture them in the infer option, it may throw an error. So keep the PDF in the tagged reading order option and then extract the data. For the customer ID: Find Element, Get Text. In Find Element, click on the Customer ID label, edit the selector, delete the extra lines, and validate. In Get Text, indicate the customer ID value and put the CustomerID variable in Value; I already have this variable available. Same as for the invoice date, we change the Anchor Position to Top.

Now we take another Anchor Base for the terms. We take a Find Element, click on the Terms label, and edit the selector, deleting the extra lines; I delete them because they are not required and can throw an error in future. Then take a Get Text activity, indicate the value Net 30 Days, save, and set the variable name, Terms. Then click on the Anchor Base and change the position to Top.

Then we take another Anchor Base activity for the Bill To, meaning the person's name, with one Find Element and one Get Text. In Find Element we click on the Bill To label; this is our anchor, and we extract the data based on it. Modify the selector and validate it; once it is green, it is valid. Save. In Get Text, indicate the element you need to extract, so we
need to extract the name. Save, then click on the Bill To Anchor Base, change the Anchor Position to Top, and select the variable, BillTo. Let me check whether we have this variable; we don't, so I press Ctrl+K and name it BillTo, change its type to String, and set its scope to PDF extraction, same as the others. Let me also make sure we have Terms. Okay, these are all done.

Now we just need to fetch the total amount. For that I have created the infer reading order workflow, so I will drag and drop it here. But one more thing before using it: at first we are at the top of the page, and we need to send the page down. If you press Page Down once, it goes to here, and once more, it goes to here. So we send the Page Down hotkey twice. We take a Send Hotkey, indicate the window at the top, and select Page Down from the list; then we copy it and paste it once more, so we have two Page Downs. After pressing Page Down twice, we use the infer reading order workflow.

Let me show you what the infer reading order workflow does, so that you can also build it in your project. I use an Attach Window on the invoice window and send the hotkey Ctrl+Shift+5, which opens this prompt. Here I have selected the reading order combo box, and I pass a lowercase i; once we press I, it selects Infer reading order from document. Then we take a Click activity and click Start. Once we click Start, the page's reading option is set
to the infer reading order, and then we continue. Let me close this. Now we take an Anchor Base activity to read the total amount; let me rename it Total Amount. I take a Find Element and a Get Text activity. In Find Element, click on the Total label, then open the selector, remove the extra lines, validate, and press OK. Now click on the Get Text, and you can see it captures just the amount, 1725. Earlier, when the option was the tagged order, it selected everything from here to here, but now it captures only the amount. Save, and in the Get Text put the TotalAmount variable in Value. Now click on the Anchor Base. You can see the layout here is different: the label is on the left and the value is on the right, so our anchor is on the left side. So in Anchor Position, instead of Auto or Top, we select Left, so that UiPath always looks for the Total anchor on the left with the value to its right.

Now we are able to capture the data and store it in variables, and we need to write it all to the Excel file. Before that, we close the PDF so that the next PDF can be opened. For that we take a Close Application activity, put it here, click Indicate element, and click on the PDF window. This closes the PDF every time. After closing the PDF, we need to write the data to the Excel file and move the PDF file to the completed folder. To write the data to the Excel file, we first need to read the Excel file that has the column names, so we will
first read the Excel file. Let me take a Read Range activity, the one under Workbook, and drag and drop it here. The workbook path is the invoice data file and the sheet is Sheet1. I remove the range, as it is not required, and for the output data table I press Ctrl+K and name it InvoiceDT. Save, and make sure Add Headers is checked so that the first row is treated as the header, the column names.

After capturing the data, we take an Add Data Row activity, which adds the data as a row in the data table we have read. As input, in DataTable we pass InvoiceDT, the data table we read. In ArrayRow we pass all the variable names, in the order of the Excel format; in invoice data we have invoice number, invoice date, customer ID, terms, and so on. So we write InvoiceNumber, InvoiceDate, CustomerID, Terms, BillTo, TotalAmount, separated by commas and enclosed in opening and closing curly braces. Again: in Add Data Row, click on ArrayRow and write the names of all the variables that receive the values from the Get Text activities. I have written them in the same order: invoice number, date, customer ID, terms, Bill To, and total amount. Save.

Now we have added the row to the data table, and we also need to write it to the Excel file. For that we take a Write Range activity, which writes the data table to the invoice data file, Sheet1, with the data table InvoiceDT. Save this.
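As a rough illustration of the Add Data Row step described above, here is a minimal Python sketch (not UiPath code). It assumes the six columns named in the video; the exact header spellings, the `add_data_row` helper, and the placeholder date and name are hypothetical.

```python
# Column order as described for the sheet; the exact header spellings are assumed.
HEADERS = ["Invoice Number", "Invoice Date", "Customer ID",
           "Terms", "Bill To", "Total Amount"]


def add_data_row(table, values):
    """Append one row, refusing rows that do not match the column count."""
    if len(values) != len(HEADERS):
        raise ValueError(f"expected {len(HEADERS)} values, got {len(values)}")
    table.append(list(values))


invoice_dt = []  # stands in for the data table read from the sheet
# 2170, 279, Net 30 Days and 1725 are values read out in the video;
# the date and the name are placeholders, not values from the invoice.
add_data_row(invoice_dt, ["2170", "<invoice date>", "279",
                          "Net 30 Days", "<bill-to name>", "1725"])
```

The point of the length check is the same as keeping the ArrayRow in the sheet's column order: a row that does not line up with the headers is rejected early instead of landing in the wrong columns.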
After that, we need to move each file to the completed folder. Let me take one more Assign activity for moving the PDF files. We create a variable for the processed folder, ProcessedPath, and assign it the folder name. The folder name is completed invoices; copy it. As I already told you, this folder is inside the project folder, so we just put the folder name in double quotes. Let me check whether ProcessedPath already exists; it does, so in the Variables panel I change its type to String and use ProcessedPath here. Save.

Now we take a Move File activity; let me check that we have it. Okay, we do. We place it before writing the data to the Excel file. The file name is in file.ToString, and the destination is the folder path, ProcessedPath. So it takes one file and moves it to this folder. You can also select Overwrite, so that if the file is already there it is overwritten. Save, and now everything is done. Go to PDF extraction, save, save and close the Excel file, and let me run it.

You can see that the first PDF's data has been extracted and written, and the PDF is closed. Now the second has been opened, its data is extracted too, and it is closed again. Now the third file is opened. Okay, done: the data has been captured. Let me show you whether the data has been written to the Excel file and whether the files have been moved to completed invoices. You can see they have been moved. Now open the final Excel file, and you can see the data is written starting from
A1. So what is the issue here? The data is written correctly, but the header row is gone. If we go to the Write Range activity at the end, we have to select Add Headers; otherwise it removes the headers and starts writing the data from A1. What was in A1 previously? The Invoice Number header. So make sure that, while writing the data to the Excel file, you check the Add Headers checkbox; otherwise the header row will be overwritten. Alternatively, you can put B1 here so it starts from B1. Save this, and you can see the data has been captured: 2170, 3031, 5536, with the different dates, customer IDs, terms, names, and amounts.

I hope you liked this video and that it will be helpful for you. Next time I will cover scanned PDF images: I will take a few scanned images and show you how to capture data from a scanned PDF as well. This one is a regular PDF, which lets us select the UI elements; with a scanned PDF we will see whether we get the option to capture UI elements or not, and how we can capture the data. Please press the like button, because I am getting very few likes on my videos, and please subscribe if you have not subscribed. If you have any questions, please comment on the video, or you can send me an email at uipath are at their gmail.com. I have also shared my LinkedIn URL in the post, so you can click on it, find me there, follow me, and send me a request; I will connect. Thank you so much, friends. Thank you for watching this video.
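The overall flow shown in the video (collect only the *.pdf files, process each one, move it to the completed folder, then write the rows together with a header row) can be sketched in Python roughly as follows. This is not the UiPath workflow itself: it writes a CSV file rather than an Excel sheet, and `extract_fields` is a hypothetical placeholder for the UI-based extraction, which cannot be reproduced here. The header spellings and function names are assumptions.

```python
import csv
import shutil
from pathlib import Path

# Assumed header spellings, in the column order described in the video.
HEADERS = ["Invoice Number", "Invoice Date", "Customer ID",
           "Terms", "Bill To", "Total Amount"]


def extract_fields(pdf_path):
    """Hypothetical placeholder: returns the file stem plus empty fields."""
    return [pdf_path.stem, "", "", "", "", ""]


def process_invoices(pdf_dir, done_dir, out_csv):
    """Process every *.pdf in pdf_dir, move it to done_dir, write rows to out_csv."""
    pdf_dir, done_dir = Path(pdf_dir), Path(done_dir)
    done_dir.mkdir(exist_ok=True)
    rows = []
    # The "*.pdf" pattern plays the role of the search pattern passed to
    # Directory.GetFiles: files with other extensions are ignored.
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        rows.append(extract_fields(pdf))
        shutil.move(str(pdf), str(done_dir / pdf.name))
    with open(out_csv, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(HEADERS)  # keep the header row, like checking Add Headers
        writer.writerows(rows)
    return rows
```

Writing the header row explicitly before the data rows mirrors the fix at the end of the video: without it, the first data row would take the place of the column names.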
Info
Channel: UiPath RPA
Views: 91,915
Keywords: uipathrpa, uipath by manish pandey, uipath, pdf data extraction, uipath pdf extraction, extract pdf data to excel, pdf data extraction to excel in uipath, uipath invoice automation, uipath invoice extraction, uipath invoice processing, extract specific text from pdf uipath, Extract Pdf Specific Data Into Excel In UiPath, invoice extraction uipath, uipath pdf data extraction, uipath pdf table extraction, extract data from pdf to text file in uipath, robotic process automation
Id: 4u48YqgZff8
Length: 55min 0sec (3300 seconds)
Published: Thu Nov 28 2019