Hi guys, welcome to the first episode of Next
Generation BI Guru. have you ever wondered whether it's possible to view PDFs within PowerBI. Well you are
in for a treat because the answer is resounding yes, and this is exactly what we're gonna do in
this episode. Embbeding PDFs within your dashboard can prove extremely useful. You can utilize them
as references for some of the matrices you're displaying in your dashboard. Additionally as
we'll explore in this episode, you can create a library of PDFs within PowerBI, this not only
facilitates easy navigation but also simplifies share since all the contents will be stored
in one PowerBI file. In the first part of this episode, I'll demonstrate to you how to convert a PDF file into a format that can be consumed by PowerBI, then we'll utilize PDF viewer
to visualize them in PowerBI. In the second part of the episode I'll guide you through
automating this process using Python. we'll process a library of PDFs and import them to
PowerBI. If you have no prior experience with Python don't worry, I've got you covered.
I'll walk you through the setup process and explain the code for you. Furthermore, I'll
share all the contents covered in this episode including Python code and PowerBI file in the link
in the description. having that said, there's nothing else to do, but to pause this video for a moment, grab a cup of coffee and let's get started! Okay, here we go! We need to convert a PDF file, one of those PDF files into a format that can be consumed by PowerBI, so let's for example take
this one. The format we need to work with in PowerBI is a binary format that's called base64. You don't need to know much about it except that this is what the viewer will accept within PowerBI. So let's take one of those. We will use an online tool called ASPOSE, you can find
the link on the top also you'll find in the references in the
description. All you need to do... is drag and drop the file and then select from
the format, make sure it's base64 and hit convert. Couple of seconds... here you go you have your file ready to download.
Click download... it will come as text file. Now we have the base64 code ready
let's keep it here... Open Excel... and we're ready to paste it here. You don't want to use the import from data for a reason I'll explain in a minute, just keep it as
is. Go to the text file and copy them all. Select all control+A, control+C. You have everything selected. And here paste the text. Once you paste it, you'll notice that it doesn't only come in one cell it actually covers 15 cells and the reason why is, because we have a limit for the number of characters you can fit in one Excel cell. I think the limit is ~32,700 something, so obviously we have character numbers much bigger than this, for this reason Excel will divide them into 15 cells. Now if you attempt to do it through data import, from here it will only cover up to the limit and leave the rest behind. This is something we don't want, we need the whole code to be there. Now in PowerBI we don't like this format, we want to convert it to columns format because we want to show all of those as one record. For this
reason, what we need to do as a next step is... again go back home... copy...
put it in a new sheet... paste... transpose. It will convert the rows into
columns, and it will fit them in the 15 columns. Now for simplicity we will add the headers here.
so let's call it from B0 to B14... drag this on... that's it! the CSV file is
ready, save it CSV file. okay, that's it! Now let's move to PowerBI. There's
something I like to do to save some time when starting with a new dashboard, is preparing the
background somewhere else and have it ready to be imported, so that will save me some time starting with the dashboard. Usually I do it inside PowerPoint. You can match the size of of the slide
here to your dashboard... from slide size you can do custom slide size from here. Change the pixels as
you know and from PowerBI, you can do the same from the canvas settings. You can also have custom settings
here, make sure they're the same back to PowerBI save as... browse... now for the format try not
to use PNG or JPEG format because those formats don't give you very good quality
for the background when imported to PowerBI, use SVG... save... just this one... that's
it! Going back to PowerBI will go to canvas background... browse... desktop. You
can find the exported slide here. Import it... make sure the transparency is zero... fit format....
here you go, you have the background ready. Now, this PDF viewer you can find it in
the marketplace, just... from get more visuals just type in the search PDF and you will find
this PDF viewer. This is what I usually use. Add it, and there you go! I already imported it before, I replaced it and here
it is, now drag it to the canvas. Okay! now we have the PDF viewer imported, the
next step is to import the CSV file from get data... text/CSV option... again go to the desktop... this
is the file we created a minute ago, bring it... we go for transform data... so from transform...
we use the first row as headers. that's it! the CSV comes with all the fields as you
see. Now here's one important thing to do, we need to create a calculated column
that will merge all those columns. I know as a first thought you might think
why don't I do from query editor but guys, just like Excel the query editor only
accepts ~32,700 something in each cell as a number of characters limit. So if you do merge for all
those, you will lose all the content except for the first cell. So we need to do it as a calculated
column here. And just to make our life a bit easier instead of creating the calculated column and
keep adding fields, we can automate this in Excel with a quick formula. can say here the name of
the column... drag it over... we have all the columns. Then we do one simple formula and
drag it over. Something like this. Boom! We have our cell ready. Copy... go
here... right-click... new column... PDF content... paste the formula... hit enter... and your
field is ready. Notice here we have so many unwanted fields that we're not gonna
use, you can also select all of them here hold the shift and boom! just hide them. Now
we see that they're hidden from here. We have only the calculated column. just select
the PDF viewer drag it, and that's it! You have your PDF inside PowerBI. This PDF viewer
allows you to navigate from one page to another here, if you have many pages you can navigate from
here, also it allows you to zoom-in and zoom-out. If you don't want to see this control here, from
properties you can hide the control header. that's it guys! here you go, you have your PDF inside
PowerBI. Now let's move to the second Part. In the first part, we tried to process one PDF file and
import it into PowerBI, now what if we have so many files. Imagine you have like hundred files
sitting in a library in a Shared Drive. what you want to do, you want to be able to process them
all together into base64 format and bring them to PowerBI, right? so this is exactly what we're
gonna do in this part. But for the sake of this tutorial, we'll only process those files. Now the
first line of business here is to install Python. Go to python.org/downloads and you'll find a
download link for the latest version of Python. Here, just click it and it comes to your
downloads or whatever folder you choose. Double-click... because I have it already installed,
I can modify it here... just keep the default settings don't try to change anything, this is the
development environment where we'll be writing the codes and run them, just go next and install. Yeah,
it's as simple as this. Now you have your Python installed. You can check from the start, try to open
the development environment here, you'll find it. That's it guys! We're ready now, let's go to the
code itself. I have already prepared the code for you. As I said earlier, you'll find it in the link
in the description so I'll just open the code file. Of course, if you want to start from scratch,
you need to use a new file but I already have it ready, so let's open it... there it is... and this
is our code, I'll go quickly through this code. It's very simple, straightforward but it does the
job. First thing, we need to import few libraries but don't worry about those libraries, they all
are native libraries for Python. So whenever you install Python, they will come with the package.
I created a function to be able to call it later. This function will be processing the PDFs with.
This function requires tow inputs, the directory and the file name for the PDFs. As we go through this
function the first thing, it opens the PDF file from the directory and extract the contents as base64
then there's another string strip the code from the binary reference, this is needed because we'll
be dividing the code into chunks of 32,000. So we need to make sure whenever we merge them
there's no extra string in the code itself. So, don't worry about it just make sure to use it. This
directory will contain the header similar to the ones created in the first part from B0 to B14.
Also we added a source to know the name of the PDF we're getting the contents from. Here we Define
the output CSV file directory, we'll put the same directory and name it PDFs. This line here creates
the CSV file in the directory and add the headers to it, the headers we we stored in a dictionary
above and as you can see here we open the CSV file for writing. The next part starts adding
the base64 code into the CSV file and it opens the CSV file for (Append) just to make sure that we
don't overwrite the file each time it runs. And we add iterative function to divide the string into
chunks of 32,000 characters and save them in different cells. That's it! This is the function
the last part of the code we call this function and we iterate through the files in the directory
and call the function to process the PDF files Here we feed in the directory we're gonna use. By
the way, remember to add double backslashes to the directory otherwise Python will not understand it
as a path, I left a comment, I also commented the whole code for you. And here we iterate through
the directory provided above, list all the files... find all the PDFs and get the file name, then call
the function we defined above and process the PDFs there. That's pretty much it! all we have to do now
is to run this code... done! how easy was that! this is the file here, the CSV file. If you open it just
to have a look you will find it's already created all what we need for us. Added a PDF name and
created all the contents in 32,000 chunks. We added 30 fields, just in case we have bigger files
but I think we don't need all of that. Maybe here we used only 26 cells. I deleted the extra ones we
don't need them and just save. of course you don't have to delete them, this is just an extra
step. We're done here. Now let's go back PowerBI create another page... just like before I'll
import the background we created in the first part... convert to fit... transparency, move it
to zero and we are ready now to bring the viewer. Now let's get data from the CSV file just like before... transform data... and from transfer use the first row as
headers. Here the code created an extra blank row for us, so let's just remove it.
Here you have your data ready, download them... We will use our little trick to create
the calculated column, I'll go back to the same Excel sheet we have. We need to drag this to have 25 fields or 26 fields... right-click... new column... PDFs contents...
paste... you have your calculated column ready. And we'll go back to the relationships and hide
The unwanted fields. Select all of them... hide... done! Now you have your merged contents ready. Insert
it to the PDF viewer. Notice when we added the contents, the viewer doesn't display any
of the PDFs because this visual allows you to see one PDF at a time. So what we need
to do, we need to have an extra slicer that allows us to navigate through files. Let's
have it here we'll use the source as a slicer field and that's it now you can see all the PDFs
came and you can navigate through them easily as expected... and that's pretty much it. As you witnessed we managed to utilize free tools like the online converter, Python and
PowerBI to incorporate PDFs into PowerBI. if you found this episode enjoyable, hit the like
button and if you're new subscribe to the channel for more unique contents. Also I highly appreciate
your feedback so please leave a comment. Remember I included all the contents covered in this episode
in a link in the description. Hang on everyone, this is just the beginning now we learned
how to print PDFs into PowerBI, can we actually perform searches on the contents of the PDFs
within powerBI? sounds promising doesn't it? tune in for the next episode to discover more. Although
as a hint, the answer is a resounding yes!... Peace.