Extract Data from PDF 2 Excel | Power Query | Power BI

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hey guys this is akilesh from datasense hope you are doing fine today we are going to learn how to extract data from a pdf to excel so this is the pdf which we are having and you can see on the first page there is a small table summary table plus some other text headers and some text and in the next page we have a detailed table behind the data set and this is what we want to extract so what normally people will do normally people will copy paste this and then ctrl c then they will go to an excel and they will try to paste it here and what happens that everything comes in one single cell sometimes it may come in multiple columns and rows also but it is not workable you cannot create pivot tables on this data you cannot write functions on this data so what is the best way to do there is another option which we have is that in the data tab of excel 2016 or 2019 or in the power query section this section is known as power query and power query helps us to extract data from various sources and pdf is one of them so when we go to get data and from file section you can find from pdf here so click on that and it will try to ask you that where is the pdf which you want me to connect so this is the pdf which i want to connect i will click on import and once it is able to connect it will show me different sections from that pdf like here it is able to identify certain tables directly in the page 2 also there is another table or detail table and in the page 3 another table which is the remaining part of our detail table page 1 is having everything whatever comments for their additional text page 2 only that data and page 3 also was having some comment at the bottom so it is just showing you a sample okay let's say if you want to connect only the summary table so i can go to this table 0 0 1 because that is what is showing the summary and i can click on load okay and now this will be loaded into one of my sheet as simple as that and it is a very clean data no problem at all and tomorrow if your pdf changes the data this will also be updated by right clicking and refresh now let us say we don't want to go for the summary table we want to go for the detail table which is splitted into two pages so how to extract that one so i will start from data tab again get data from file and from pdf and this time so this is the pdf click on import now this time let me take any table okay and i will not click on load i will click on transform data and i know that my target table is in table 2 and table 3. so i will click on transform data so i clicked on one of the table there okay and this is what we got on the right side you might see that there are certain steps which power query has applied automatically so when i click on source you can see this is listing all the components which it was able to extract page one table one table two table three like that okay and here's the data behind them and all the three steps all the further three steps are keeping in mind that you want to extract only the summary table so what i will do i will right click on this second step and i will delete all the steps until end so this will get rid of all the three steps after that okay so i am starting from source which is able to connect to the pdf and it has given me these six actions using this third column i can say that i want to see only tables and i can also say that i want to ignore the table 1 because i know that that is related to the summary table and i am interested in the detail table and please note that this method will work even if your detailed table is having 10 pages of data okay now this is the column which i am interested because data is here these are all metadata informations about those about those tables so what i will do i will right click here and remove all other columns now i will try to expand this data column using these double arrows i will uncheck this option and it will give me this data set the first line is my header and then remaining is my data set your page 2 and page 3 data set is combined together here so first line is our headers so what i will do i will go to home tab and here in the home tab you have use first row as headers so first row will be taken to headers and now you can see data columns are there and these are the three numeric columns in the data set if you want to change the data type of any column you can just click on this prefix and say what kind of column it is and that's my data set this will be the name of the output table so what i will do and i will call it as detailed data [Music] now once you are done you can click on close and load and close and load to and this will give you some options where to put the data set which you have right now so table means it will create another sheet in the current excel file you can create a pivot chart or a pivot table also let us go for a new worksheet when i click ok you will see a third sheet has been created and it is pulling the data together and your data set is going through all those steps which you have written in the background and this is what you get which is quite a clean version of your pdf file so these steps we can use to extract data from any pdf which is digitally generated please note that if you have scanned some documents and you want to extract data from that it might be very difficult so your pdf should be digitally generated that means that your excel file or word file someone might have saved it as pdf then you can extract it otherwise it won't be possible for power query to extract this data hope you found it useful please share your feedback in the comment section like the video and if you have not subscribed to us please do so thank you very much see you in the next video
Info
Channel: DataCense
Views: 3,784
Rating: undefined out of 5
Keywords: advanced excel tricks, excel online course, excel tips and tricks, excel for analysts, microsoft excel tutorials, microsoft excel, microsoft 365, excel 365, get and transform in excel, convert pdf to excel, import pdf to excel, pdf to excel, extract data from pdf to excel, power query, excel power query, power query import from pdf, properly convert pdf to excel, convert pdf to excel keep formatting, datacense
Id: Pi3ObLfuJX4
Channel Id: undefined
Length: 6min 44sec (404 seconds)
Published: Sun Mar 20 2022
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.