Seamless PDF Embedding In Power BI: Elevate Your Reporting Game | NextGen BI Guru

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
Hi guys, welcome to the first episode of Next  Generation BI Guru. have you ever wondered whether   it's possible to view PDFs within PowerBI. Well you are  in for a treat because the answer is resounding yes, and this is exactly what we're gonna do in  this episode. Embbeding PDFs within your dashboard   can prove extremely useful. You can utilize them  as references for some of the matrices you're   displaying in your dashboard. Additionally as  we'll explore in this episode, you can create   a library of PDFs within PowerBI, this not only  facilitates easy navigation but also simplifies   share since all the contents will be stored  in one PowerBI file. In the first part of this episode, I'll demonstrate to you how to convert a PDF file into a format that can be consumed   by PowerBI, then we'll utilize PDF viewer  to visualize them in PowerBI.    In the second part of the episode I'll guide you through  automating this process using Python. we'll  process a library of PDFs and import them to  PowerBI. If you have no prior experience with   Python don't worry, I've got you covered. I'll walk you through the setup process   and explain the code for you. Furthermore, I'll  share all the contents covered in this episode   including Python code and PowerBI file in the link  in the description. having that said, there's nothing else to do, but to pause this video for a moment, grab a cup of coffee and let's get started! Okay, here we go! We need to convert a PDF file, one of those PDF files into a format that can   be consumed by PowerBI, so let's for example take  this one. The format we need to work with in   PowerBI is a binary format that's called base64. You don't need to know much about it except that this   is what the viewer will accept within PowerBI. So let's take one of those. We will use an online tool   called ASPOSE, you can find  the link on the top also you'll   find in the references in the  description. All you need to do... is drag and drop the file and then select from  the format, make sure it's base64 and hit convert.  Couple of seconds... here you go you have your file ready to download. Click download... it will come as text file. Now we have the base64 code ready  let's keep it here... Open Excel... and we're ready to paste it here. You don't want to use the import from data for a reason   I'll explain in a minute, just keep it as  is. Go to the text file and copy them all.  Select all control+A, control+C. You have everything selected. And here paste the text.  Once you paste it, you'll notice that it doesn't only come in one cell it actually covers 15 cells   and the reason why is, because we have a limit for the number of characters you can fit in one Excel   cell. I think the limit is ~32,700 something, so obviously we have character numbers much   bigger than this, for this reason Excel will divide them into 15 cells. Now if you attempt   to do it through data import, from here it will only cover up to the limit and leave the   rest behind. This is something we don't want, we need the whole code to be there. Now in   PowerBI we don't like this format, we want to convert it to columns format because we want   to show all of those as one record. For this  reason, what we need to do as a next step is... again go back home... copy...  put it in a new sheet... paste...   transpose. It will convert the rows into  columns, and it will fit them in the 15 columns. Now for simplicity we will add the headers here.  so let's call it from B0 to B14... drag this on... that's it! the CSV file is  ready, save it CSV file. okay, that's it! Now let's move to PowerBI. There's  something I like to do to save some time when   starting with a new dashboard, is preparing the  background somewhere else and have it ready   to be imported, so that will save me some time starting with the dashboard. Usually I do it inside   PowerPoint. You can match the size of of the slide here to your dashboard... from slide size you can do   custom slide size from here. Change the pixels as  you know and from PowerBI, you can do the same from the   canvas settings. You can also have custom settings  here, make sure they're the same back to PowerBI   save as... browse... now for the format try not  to use PNG or JPEG format because those   formats don't give you very good quality  for the background when imported to PowerBI, use SVG... save... just this one... that's  it! Going back to PowerBI   will go to canvas background... browse... desktop. You  can find the exported slide here. Import it... make   sure the transparency is zero... fit format.... here you go, you have the background ready. Now, this PDF viewer you can find it in  the marketplace, just... from get more visuals just type in the search PDF and you will find  this PDF viewer. This is what I usually use. Add it, and there you go! I already imported it before, I replaced it and here  it is, now drag it to the canvas. Okay! now we have the PDF viewer imported, the  next step is to import the CSV file from get data...  text/CSV option... again go to the desktop... this  is the file we created a minute ago, bring it... we go for transform data... so from transform...  we use the first row as headers. that's it! the CSV comes with all the fields as you  see. Now here's one important thing to   do, we need to create a calculated column  that will merge all those columns. I know   as a first thought you might think  why don't I do from query editor but guys, just like Excel the query editor only  accepts ~32,700 something in each cell as a number   of characters limit. So if you do merge for all those, you will lose all the content except for   the first cell. So we need to do it as a calculated  column here. And just to make our life a bit easier   instead of creating the calculated column and  keep adding fields, we can automate this in Excel   with a quick formula. can say here the name of  the column... drag it over... we have all the columns. Then we do one simple formula and  drag it over. Something like this. Boom! We have our cell ready. Copy... go  here... right-click... new column... PDF content...  paste the formula... hit enter... and your  field is ready. Notice here we have so   many unwanted fields that we're not gonna  use, you can also select all of them here   hold the shift and boom! just hide them. Now  we see that they're hidden from here. We have   only the calculated column. just select  the PDF viewer drag it, and that's it! You have your PDF inside PowerBI. This PDF viewer  allows you to navigate from one page to another  here, if you have many pages you can navigate from  here, also it allows you to zoom-in and zoom-out. If you don't want to see this control here, from  properties you can hide the control header. that's it  guys! here you go, you have your PDF inside  PowerBI. Now let's move to the second Part.   In the first part, we tried to process one PDF file and  import it into PowerBI, now what if we have so   many files. Imagine you have like hundred files  sitting in a library in a Shared Drive. what you   want to do, you want to be able to process them  all together into base64 format and bring them   to PowerBI, right? so this is exactly what we're  gonna do in this part. But for the sake of this   tutorial, we'll only process those files. Now the  first line of business here is to install Python.  Go to python.org/downloads and you'll find a  download link for the latest version of Python.  Here, just click it and it comes to your  downloads or whatever folder you choose.   Double-click... because I have it already installed,  I can modify it here... just keep the default   settings don't try to change anything, this is the  development environment where we'll be writing the   codes and run them, just go next and install. Yeah,  it's as simple as this. Now you have your Python   installed. You can check from the start, try to open  the development environment here, you'll find it. That's it guys! We're ready now, let's go to the  code itself. I have already prepared the code for   you. As I said earlier, you'll find it in the link  in the description so I'll just open the code file. Of course, if you want to start from scratch,  you need to use a new file but I already have it ready, so let's open it... there it is... and this  is our code, I'll go quickly through this code.   It's very simple, straightforward but it does the  job. First thing, we need to import few libraries   but don't worry about those libraries, they all  are native libraries for Python. So whenever you   install Python, they will come with the package.  I created a function to be able to call it later.  This function will be processing the PDFs with. This function requires tow inputs, the directory and   the file name for the PDFs. As we go through this  function the first thing, it opens the PDF file from   the directory and extract the contents as base64  then there's another string strip the code from   the binary reference, this is needed because we'll  be dividing the code into chunks of 32,000.   So we need to make sure whenever we merge them  there's no extra string in the code itself. So, don't worry about it just make sure to use it. This  directory will contain the header similar to the   ones created in the first part from B0 to B14. Also we added a source to know the name of the   PDF we're getting the contents from. Here we Define  the output CSV file directory, we'll put the same   directory and name it PDFs. This line here creates  the CSV file in the directory and add the headers   to it, the headers we we stored in a dictionary  above and as you can see here we open the CSV   file for writing. The next part starts adding  the base64 code into the CSV file and it opens   the CSV file for (Append) just to make sure that we  don't overwrite the file each time it runs. And we   add iterative function to divide the string into  chunks of 32,000 characters and save them   in different cells. That's it! This is the function  the last part of the code we call this function   and we iterate through the files in the directory  and call the function to process the PDF files   Here we feed in the directory we're gonna use. By  the way, remember to add double backslashes to the   directory otherwise Python will not understand it  as a path, I left a comment, I also commented the   whole code for you. And here we iterate through  the directory provided above, list all the files...  find all the PDFs and get the file name, then call  the function we defined above and process the PDFs   there. That's pretty much it! all we have to do now  is to run this code... done! how easy was that! this   is the file here, the CSV file. If you open it just  to have a look you will find it's already created   all what we need for us. Added a PDF name and  created all the contents in 32,000 chunks. We   added 30 fields, just in case we have bigger files  but I think we don't need all of that. Maybe here we used only 26 cells. I deleted the extra ones we  don't need them and just save. of course you don't   have to delete them, this is just an extra  step. We're done here. Now let's go back PowerBI create another page... just like before I'll  import the background we created in the   first part... convert to fit... transparency, move it  to zero and we are ready now to bring the viewer. Now let's get data from the CSV file just like before... transform data... and from transfer use the first row as  headers. Here the code created an extra   blank row for us, so let's just remove it.  Here you have your data ready, download them... We will use our little trick to create  the calculated column, I'll go back to   the same Excel sheet we have. We need to drag this to have 25 fields or 26 fields... right-click... new column... PDFs contents...  paste... you have your calculated column ready.   And we'll go back to the relationships and hide  The unwanted fields. Select all of them... hide... done! Now you have your merged contents ready. Insert  it to the PDF viewer. Notice when we added   the contents, the viewer doesn't display any  of the PDFs because this visual allows you to   see one PDF at a time. So what we need  to do, we need to have an extra slicer   that allows us to navigate through files. Let's  have it here we'll use the source as a slicer   field and that's it now you can see all the PDFs  came and you can navigate through them easily as expected... and that's pretty much it. As you witnessed we managed to utilize free tools   like the online converter, Python and  PowerBI to incorporate PDFs into PowerBI.   if you found this episode enjoyable, hit the like  button and if you're new subscribe to the channel   for more unique contents. Also I highly appreciate  your feedback so please leave a comment. Remember I   included all the contents covered in this episode  in a link in the description.   Hang on everyone, this is just the beginning now we learned  how to print PDFs into PowerBI, can we actually   perform searches on the contents of the PDFs  within powerBI? sounds promising doesn't it? tune  in for the next episode to discover more. Although  as a hint, the answer is a resounding yes!... Peace.
Info
Channel: NextGen BI Guru
Views: 8,061
Rating: undefined out of 5
Keywords: Seamless PDF Embedding In Power BI, Elevate Your Reporting Game, NextGen BI Guru, Game, Reporting, Seamless, Embedding PDFs In Power BI, Power BI PDF Viewer, Convert PDF For Power BI, Automate PDF Embedding With Python, Power BI Dashboard Enhancement, Power BI Reporting Tips, Power BI Visualization Techniques, Power BI Python Integration, Power BI Advanced Features, Dynamic Power BI Dashboards, Power BI Interactive Reports, Power BI Dashboard Customization, NgB&x%
Id: AU3rthIFLOE
Channel Id: undefined
Length: 17min 35sec (1055 seconds)
Published: Wed May 31 2023
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.