Would you like a free tool to extract data from PDF to Excel? There are PDF-Excel converters
that can get the task done. You can also use copy and paste if you don't have a lot of data. Now, Power Query has a new feature where you can import
data from a PDF document but does it really work though? Does your important information
update when the content of the PDF document changes
and you refresh your data? Let's find out. (upbeat music) We're going to look at
three different examples to see if we can
successfully import our data from PDF to Excel. In the first example, I have this safety report
and on the second page, I have the safety performance data. So let's say I get this
from the safety department. I need to grab this table
and put it in my Excel file so you can do further analysis on this. Well, my first try would be
to just copy and paste this. I'm just going to highlight,
press Control + C to copy, go to Excel and press Control + V. Now, this brings over the information, but it loses all the formatting. So if I had the time, I can organize this into a table again, but unfortunately I don't have the time. So this is what I'm going to do instead. Go to Data, Get Data, From File, From PDF. I want my Safety Data, click on Import. In the Navigator view, I get to select what part of the PDF document I want to import. If it recognizes any tables, it's going to show it to me as table. On the bottom, below the tables, you're going to see the
content of the page. So that can be a table, but together with everything
else that's on that page. So notice on page two where I have my table
in the page view here, I also see the page numbers. So in this case, since I just want the table information, I'm going to go with the table view, click on Transform Data just to make sure everything looks good. I have headers here. This one is missing. Let's also give this report a name. Everything else looks good. Let's send this to our Excel sheet. I wanted as a table and I want
it on the existing worksheet and I get my table in
the proper structure. I can update the formatting of this. I can use this to create pivot tables or use formulas to do further analysis. Now, we also want to check
whether this is dynamic. So let's say I receive new information. That information is in
another PDF document. It has the same name,
it's called Safety Data. But this time it also
includes 2020 information. So I'm just going to drag it
and drop it into my drive. And I'm going to replace
the file in the destination. So now let's go back to our reports. Right mouse click and refresh this to see if we pull over the new information and our 2020 information
comes in automatically. So we've created a connection
to our PDF document. And as long as the structure of our PDF document doesn't change, everything remains dynamic. In the second example, I have this report from
the Tesla web page, it's their Q2 2020 financial update. Now let's say I'm interested
in grabbing this table, which is on page four and I
want to bring it to Excel. So here I can see their
quarterly financial information. As a first try, let's
just try to copy this. So Control + C, go to Excel, and let's do Control + V and
see if it works this time. Unfortunately, it doesn't work. It's going to be very difficult
to work with this data. So instead, let's go to Data,
Get Data, From File, From PDF. That's my PDF. Let's import. This time, I get a lot of
tables and a lot of pages. That's the number of pages
I have in my PDF file. Now, whenever you can just work with the table view, this way you can avoid
all the other information that's on the page. So the table I wanted was on page four. Now, it doesn't look so great anymore. So the quarterly information is here, but it looks like my first column, the text column, was split
into multiple columns. So let's take a look at whether we can easily transform
this to get what we want. It's recognized that
these are number columns. So it's formatted them as decimal type. These last two, they're formatted as text because they have a mix
of numbers and text. So if I want to, I can clean these up or I can get rid of them completely if I don't want them in my report or I can just keep them as they are. Now, I'll just leave the rest as is. But what I want to clean
up are these columns here. I want to get them to
just become one column. So I'm going to highlight
the first column. Hold down the Shift key,
highlight this column five here. Now let's just merge these, right mouse click, Merge Columns. My delimiter should be a space. Call this $ in millions
aside from % values. This is something similar
they had in their report, & millions except percentages
and per share data. Okay, so I'll just keep it like that. Click on OK. And now we have one column. When you use transform to merge columns, you get these additional
spaces in your merged column. Now, you don't get that if we add a column and
merge these together, but it's no problem. What I can do is just to trim this. Right mouse click, Transform, Trim, and we get rid of these extra spaces. What I'm also going to do
is get rid of these blanks. So let's go ahead and filter these out. Now, here's something interesting. Notice the negative numbers here. Let's take a look at our PDF report. The negative numbers are in brackets. So Power Query recognized
these as negative numbers instead of putting them as text, which is great because it saves us a lot of additional steps. My report is done. Let's give this query a name and let's send this to the workbook. Close and load. And we've successfully brought the table from the PDF document to our Excel file. In the third example, I want
to do something different. I want to grab the table
of contents from a manual. Now, this manual has over 200 pages and the table of contents starts on page three and goes to page five. This is the part of the
manual I want to grab and put into Excel. So let's see how we can do that. Let's go to our Excel,
go back to the Data tab, Get Data, From File, From PDF. That's my manual, let's import it. Now, the bigger the PDF file, the longer it's going to
take for it to analyze it, but check this out. Even though this has over 200 pages, it has a lot of tables,
it doesn't take that long. It's already assessed where it has tables and down here, we can see the contents of all the pages. My table of contents was on page three. So it's recognized that as a table. Then page four and page five was the rest. So these are the three
that I want to import. Now, I have the option
to select multiple items, but when I do it this way, it's going to create a separate
query for each of these. And then I can append these. Now, instead of doing it this way, I'm going to select the folder. Right mouse click and transform the data. This way I can do the
appending with one click. What I want are tables.
So this was what I wanted. Then it was this and this. Kind for these is table. So let's add a filter and just
take a look at our tables. Now I have my table of
contents right here. These are in the first three rows. So let's go and keep the top rows and put a three here and OK. Now when I click to the side here, I can see the content of each of these. What I don't need anymore are these three. I'm going to highlight and
delete and now expand my data. Let's click on OK. And I get my table of
contents appended together. Now I can clean this as I want. Let's say I don't need
these null values here where I have the section
number and the section name. So I can filter these out. What I can also do is
create a separate column for my numbering here. The delimiter to use for
this can be this right here. But notice I have it multiple
times for some of the topics. This means I need to be careful and not apply this
delimiter for each time. So let's select the column
Split Column, By Delimiter. I wanted custom and I copied it before, so I'm going to paste it in. I'm also going to add a
space before and after, because that's what I have in each field. So here's the change I need to make. I don't want each
occurrence of the delimiter. I want the leftmost
delimiter and click on OK. So now I have my numbers separately. Let's give this a proper name. This is Topic, and these
are my page numbers. Let's also update the formatting. So this should be a whole number. This is text and that's a decimal number. Everything else looks good.
Let's send this to Excel. I wanted it as a table,
let's put it in A1 and OK. It's loading everything right here. And I have my table of contents together with the page numbers. So these are the different
ways you can import data from your PDF documents into Excel. I hope you enjoyed this video. If you did, don't forget
to hit that thumbs up. And if you aren't
subscribed to this channel, consider subscribing before you leave. And I'm going to see
you in the next video. (upbeat music)
Never even heard of this! Thank you so much for your contribution! This is going to save me a million hours and will give me the time I deserve to sit back and kick my feet up!
I thought you were talking about her in which case I was inclined to agree with you
Where are the instructions for my camera?
This was very helpful! Thank you so much for posting :)