How To Extract Data From Multiple PDF To an Excel PIVOT Table Using Power Query

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
[Music] [Applause] today I'm going to use PDF invoices to build a power pivot table in Excel before we start importing any PDF into Excel we have to understand the structure of the PDF if you look at this page it seems we have a table set sections some row items and columns but we not sure what the structure is exactly and I'll show you in a minute what I mean by the structure so let's get started I open up a blank Excel file and as I always I go to the data tab I hit get data from file and not from PDF but from folder because I want to import multiple PDF invoices to my data then I can browse for the folder where I have my invoice is saved for sure it looks empty because this is just a folder s selection I hit open and the Navigator pane is just connected to my folder I can see the attributes and everything related to my folder from there I simply hit transform data and now I can see exactly that what files what type of files are saved in the folder if I would have something different what I don't want to import I can simply just select the extension column and filter out the extension types I don't want to include in my folder this time I will need only two columns the content and the name column I select those I right click and I hit remove other columns and I'm going to name my query table as data source PDF this is the raw data source I'm going to use in my tabl the content column includes the binary information that's basically the PDF file or the data what Excel was able to pull from the PDF as you see if I select one of the cell then it will show me that what's the actual binary data inside that file and I don't want to trans transform this cury table because I will need to make multiple transformation so what I'm going to do I'm going to right click on the original table and I'm going to hit reference and I'm going to name this table as invoice number my PDF files are named properly but I want to pull the actual invoice number included in the file so I'm going to use one table to get those records and now we are going to learn a new M function which is called BDF tables I enter the name of the function and from there I have to enter the name of the column where I have the binary data and that's the content column I select I close the brackets it seems there is no error with my M code I hit okay and voila this magical function is able to extract the data from the PDF F so if I select one of the cell row under the PDF column you can see that this function found tables and pages in that binary data and that's what we want since we want to pull the information from those tables so let's expand this column I'm not going to use the prefix as always and let's see what we have in those four columns I hit okay it expanded the actual content and as you see it was able to identify the type of the content we have how it's named in the PDF structure and what kind of objects we have saved in those PDF files and as we see we got pages and table for sure the tables are inserted on the pages and in order to simplify our life we are going to select from the kind column the table type and now let's see what we got in those tables so I'm going to select one of the cells and I see I got invoice date number customer ID Etc that's the table one let's see what we got in the table two table two includes the actual item list from the invoice but in this table I want to see the invoice number I I want to collect those records and as we see if I go back to the table one my invoice number is included in the table one table so I'm going to select from here the table one and I think we can modify our query a little bit it seems we don't have to use the name. one column I'm not going to apply a new step I just simply go back in my apply steps list and I select expand PDF I hit the gear icon and I say I don't need the name column so I don't want to see that in my Cur table I go back to the last step and here we go it disappeared now I select the data cell one more time to see what I got there that's fine so now we can pull the actual records and data from those tables but before we would expand those tables and we would get the data we can remove the unnecessary columns I'm going to select the content the ID and the kind I hit right click and remove column here we go I left the name columns right next to my table because I will need to join somehow the records and data since the data is stored in different PDFs right and now I just simply hit the Double Arrow and I can expand the content whatever is stored there I want to see all of them so I just simply hit okay and here we go I got the name of the files and what records are stored in there and that's where the structure comes into play now we can see that the first the first track cord is the invoice number Tag then the actual invoice number customer ID Etc and there's a sequence and logic how it's recording and stored in the PDF so we can easily transform the data them and it seems we are lucky because the invoice date is also stored on the same table so that actually means that this curity table not just the number but the date as well so I'm going to just rename it and let's see what we need from here the customer ID is not necessary because that's my company's customer ID what what I do obviously and I do not need the terms either so what I'm going to do now I'm going to select the column one and from there I'm going to remove the customer ID here we go but it's not perfect yet because what we need is the actual invoice number and the date let's see what left still the customer ID and the term comment is there it's the same data everywhere so what I can do again I go back to the filter I remove the customer ID I hit okay okay and now only the data left what I need but I don't need a header of that record so I'm can remove these records as well these row items from the tables so I'm going to select the filter one more time but obviously you can do at the same time when you transform your data I just want to demonstrate how you can transform and find the structure of your table and here we go I got the invoice ID related to the F name and the date and as a last step on this table I'm going to name my columns and saying invoice number invoice date and I can form a D properly I select the column let's say this is going to be text and this is going to be date and it's done let's find the actual items we sold on those tables I'm going to select my data source again hit reference and uh I'm going to name it as sold items and I just simply repeat the process what I did on the previous table I'm going to say PDF content and the cool M function we learn today is the PDF tables I open the brackets I enter the name of the column where I have the binary content close the brackets hit enter I expand the table I'm going to select the ID kind and data this time hit okay I'm going to select the table two because that's where I have the item list stored in the PDF and again and I'm going to remove the columns what I don't need anymore right click hit remove other columns awesome and now I can start expanding the actual tables from the PDF I want to pull all the columns let's see what the structure is there is an end line at the tables which is always the same on the invoice so I can start with removing the unnecessary items for sure we do not need the nules and blanks so if there is no item assigned on the invoice then why would I need that then I'm going to remove this comment thank you for your business I hit okay okay it seems we are getting closer but as you see there is a glitch in the PDFs for whatever reason the last column generated a second column to display the record so basically we got invoices pulling the amount value in column four but in another invoice the amount value is displayed in a different column but it's easy to fix with M so let's trans from the data a little bit more and I'm going to say custom column I'm going to name it as amount and we are going to make a very easy if statement in m so I'm going to say if column 4 equal n which is nothing empty blank in M then I want to get back the column five as a result else I want to use the column four no errors I hit okay here we go I got the Mount value merg together properly and now I can remove the column four and five I don't need them anymore let's double check one of the invoices so we got item quantity unit price and amount so four column let's go back to the Cure editor and as you see 1 2 3 4 the fifth column is the name what we put from the binary data so I'm going to now properly format my columns the item should be text the quantity should be a whole number the unit price should be currency and my amount is going to be currency as well and now let's name these fields properly item name quantity sold unit price and amount and now we got to the point where we have to decide which method we use since there are multiple methods to build that pivot table from now and I'm going to show you one which can be done easily in the same environment in the part cury editor so what I do now I go back to my data source and I'm going to create a new reference cury table and this is going to be my invoice pivot table not the actual pivot table but my source and what I do now I remove the content column I do not need it anymore and I'm going to merge my data according to the file names right because we left the F name on each cury table cuz that helps us to differentiate the records on the table so what I simply do I go back to this table I go to the Home tab and I'm going to say merage ques and I'm going to merge with the invoice number and date first my unique ID is the name name again hit okay now I can merge with the second table which is the actual item list sold items unique key name name and now let's expand the tables I don't need the file name anymore because that already exists on my table expand the old items I don't need a name hit okay and voila all the data merg together on the same cury table without building any data model or linking the tables and from here our job is very easy I go back to the Home tab I hit close and load close and load to don't forget because we don't want to load all the tables back to the user interface of the Excel I simply hit only create connection it opens up the cury page and the only thing I need is the data I built for my Pivot and that's the invoice pivot table and at this point again we got two options we can decide whether we want to load our data into a simple tiot which is under the insert tab on the Exel graphical interface or I want to build a power pivot which is more advanced and I'm going to show you both technique so if I want to build just a simple standard pivot table then I'm going to use this master invoice pivot table right click load to select table it can be the existing worksheet or a new one let's say the existing A1 hit okay and my table has been loaded to the Excel graphical interface and from here I just simply select the table and I hit pivot table on a new worksheet and from here I guess it's very familiar for everybody let's see the second option with power pivot so the second option is to load to a data model but we don't have to build any relationship just to use the power pivot function and that's what I recommend so right click load two only create connection add this to the data model I hit okay then I go back to the data tab hit manage data model and here we go our table has been loaded to the data model and we can use the par pivot feature then if you want you can use Dex to add additional calculation but we are not going to do any calculation this time I just simply select pivot table pivot table existing worksheet A1 that's fine I hit okay I expand my data source and here we go I can start building my pivot table using PDF data sources let's say filter is going to be the file name invoice number can go to the rows and let's say invoice date is going to be a column and item name is going to be under rows again quantity sold unit price amount that depends on what type of table I want to build let's say I want to see the total value what was sold for my company I move it under the values field and from here I can make it more fancy I go back to the design tab I do not need the subtitels I don't recommend them to use I turn off the granted HS and the report layout I prefer is the tabular one I go back to the view tab remove grid lines and from here I can start analyzing my data if you don't like this layout you can still format as you wish I can go back to the design tab hit report layout and repeat all item labels if you want to see the actual value of the invoices and I can pull back the subal at the bottom then I just right click on the invoice number field I select expand collapse then I'm going to select collapse entire field I hit enter and here we go I can see the total value of those invoices by date
Info
Channel: Office Universe
Views: 7,462
Rating: undefined out of 5
Keywords: excel power query advanced, import to excel, pivot table, office 365, power query tutorials, excel for beginners, best excel tutorial, pivot table tricks, free course, learn excel, how to, excel dashboard, excel features, latest excel features, data visualisation, excel pro, pivot slicer, excel table, connect, what is, counting, finance, data analyst, scrum masters, excel job interview, boss, what if, extract data, pivot, invoices, excel for analysts, pdf to excel
Id: l7YyNHhEa_8
Channel Id: undefined
Length: 14min 57sec (897 seconds)
Published: Tue Nov 28 2023
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.