How to Generate an Analytics Report (pdf) in Python!

Captions
Hey, what's up everyone, and welcome back to another video. Before we get into this video, I want to set the scene. As data scientists, one of our biggest roles is often to communicate insights about data to business stakeholders and clients, and oftentimes these people might not be the most tech savvy. It might not make sense to send them large files of code or even Jupyter notebooks; instead, it would make a lot more sense to send them some sort of analytics report that captures all of our visualizations in a nice PDF format.

In prepping for this video, I looked up something along the lines of "how to create an analytics report in Python" and I didn't find much, so, as a classic engineer, I decided to take this into my own hands and figure out a solution that I thought worked pretty well for the task. This is going to be a two-part series. In this first video, we are going to learn how to create a PDF analytics report in Python. For this report, we're going to use COVID-19 data from around the globe to create all sorts of visualizations, and then ultimately package those up in a nice PDF using the Python fpdf library. In the second video of this series, we're going to learn how to schedule these scripts to run automatically using AWS Lambda and CloudWatch, and then send the reports that we create via email to any contacts of our choosing.

Before we get started, I want to give a quick shout-out to this video's sponsor, Skillshare. Skillshare is an online learning community with thousands of classes for creative and curious people. They have classes on topics that range from photography to marketing to creative writing, and all of these classes are oriented around real-life situations, so you can really take away skills and apply them in your day-to-day life. Some topics that I think are particularly exciting for all of you would be classes that they offer on
business analytics, web development, as well as freelancing and entrepreneurship. Recently I decided I wanted to level up my Instagram game, so I took a course called "Instagram-Worthy Photography: Shoot, Edit and Share with Brandon Woelfel", and from that class I took away a lot of information on the camera settings that I should use as well as how to light my subjects in different situations. I hope to apply this to my own Instagram very soon. It's exciting how many different classes Skillshare offers and just the breadth of all the material that you can learn. If you're interested in joining, membership costs less than $10 a month when you sign up with an annual subscription. Also, the first 1,000 people to use the link in the description will get a free trial of Skillshare's premium membership. All right, let's get into the video.

The first thing we're going to want to do is navigate to github.com/KeithGalli/generate-analytics-report, and I'll make sure to link this in the description. Once you're there, you can look around a little bit. Ultimately we'll be creating an analytics report that looks like this, which I think looks really cool. I also left some setup information here, but what I recommend is getting all these files locally. To do that, we can click on this Code button and copy this link. A lot of you will probably use HTTPS to clone this, but I'm going to use SSH, and if you don't know how to do a git clone, you can always download the zip folder here. Next, once we've copied that link, you should open up a terminal window. If you're using Mac or Linux, your default terminal is fine. On Windows, Git Bash works well, Windows Terminal works well, and I'm using a program called Cmder. So I'm going to do a git clone and then paste in that link I just copied. Then I'm going to cd into that new folder we just cloned, so I can cd into generate-analytics-report, and if I do ls
we can see all the files there. What I recommend at this point is running the generate_report.py file. If this works, then great, your setup is complete, but if it doesn't, we'll want to make sure to install some libraries. We see we don't have the fpdf library, so I'm going to do pip install fpdf. Looks like that is good. I'm going to try running that file again: no module named pandas. There are a bunch of other libraries that we need to install, so to speed things up I'm going to install them in one go: pip install pandas numpy matplotlib plotly kaleido. I think these are the last libraries that we'll need, but if I didn't know ahead of time what I needed, I would just install whatever was missing any time an error message popped up. All right, everything looks to be installed, so I'm going to do clear to clean up my terminal, and then I'm going to run generate_report.py again and see if things work.

I see I'm getting a NumPy issue. I did run into this earlier today, and I found out by copying this URL right here that there's an issue with NumPy version 1.19.4. I was able to solve it by installing 1.19.3 instead, so downgrading one version fixed this for me. Not sure if you'll run into this one. Now let's try it one more time. This script will take around 30 seconds to run, if I'm not mistaken, so give it time if it's taking a sec. We see that it has executed, so I'm going to bring in my folder and click on generate-analytics-report. We see right now it is exactly 12:19 p.m., and I see that a new report was created then, so it looks like everything executed properly. I can double-click on this and see my analytics report. Looks really cool. Note that for testing purposes I have some hard-coded strings, which I'll show you, so the date is just set to October 10th by default, but we can change this to read
current data, I guess from the previous day.

All right, now that we have all of our files locally, we can go ahead and open up a text editor and start playing around with things. I'll be using Sublime Text for this. I'm going to open the folder, which I have in generate-analytics-report, select folder, and we're good. When we generated the report, we ultimately ran this file here. You're welcome to look through it, but I think what will probably be easiest is to create a new report file from scratch, so we can start building up the pieces and seeing what we have available in this repo to build a custom report of our choosing. So I'm going to do new file, we'll save it as new_report.py, and this will be our blank slate where we can start using the fpdf library to write code.

I think it's worthwhile to look up the documentation for the Python fpdf library and see how we can really start using things. So here we go, here is the fpdf library for Python. You can create PDFs, as the name implies, using Python through this library. Probably easiest is to go to the Python... oh, what the heck, okay, I guess that Python 3 page doesn't work, but go to the tutorial and start seeing what code examples are here. Let's go ahead and copy this code here to get a proof of concept of creating our first PDF, and then we'll start doing more sophisticated stuff. Copy and paste that. I'll leave the link to the fpdf docs in the description, but right now it looks like we're basically just writing "Hello World" on an fpdf doc. Let's go ahead and run this. It has finished. On the side I will keep a window open so I can easily load this. We just generated tuto1.pdf: "Hello World". Look at that, that's what we just generated using this little snippet we see here. And we can do all sorts of more interesting stuff here, so we could maybe make this an f-string and do "Hello World" or
say something like "Hello, my name is {name}", and then we can set the variable name equal to "Keith" up here. Now if we run this again... tuto1, I'm going to just call this tutorial.pdf, because "tuto", I don't know, I'm not feeling the name. Nothing wrong with the name, but just not feeling it. All right, let's run this again. We're going to delete tuto1 and open up tutorial.pdf: "Hello, my name is Keith". We see that we've dynamically added our variable into the file that we generated, so that is cool.

So let's start getting more sophisticated and figuring out what else we can do. Going back to the documentation, we can keep walking our way through this tutorial. First off, we see that there are a couple of different parameters we can pass in when we're creating a PDF. Right here it says that pages are created in A4 format by default, which is the standard internationally. Here in the US we use Letter, so if I wanted to specify a portrait page in millimeters that uses the Letter dimensions, I could do that by passing 'P', 'mm', and then 'Letter'. I could run that and see we have no errors, and that just creates a similar tutorial.pdf. I could also look through the documentation a little bit further, click on the FPDF object, and see that I could use a keyword argument to say that I just want Letter, so I could also do something like format='Letter'. I'd run that, and that would work just fine too. I'm going to keep the defaults for now, though; I just wanted to show that you can change that. So A4 is what we are using. One thing I think is good to do is to make sure you're clear on what the millimeter size of A4 paper is. We see we have 210 by 297, so a width
of 210 millimeters and a height of 297 millimeters. Real quick, I'm going to define those as variables here, so WIDTH equals 210 and HEIGHT equals 297. It will make sense why I'm doing this in a little bit, but I think it's a helpful reference. I'm going to start cleaning this up; I'll keep this as "Hello World". Okay, so we have the width and the height, we know that by default we're using A4, and we have some basic text on the page. What else can we do?

I think one of the next things that might be useful is to include an image in our PDF. Text is cool, but oftentimes I think images are what really make our PDF more like a report. So let's go back to our documentation and look for image somewhere in here. We see image down here, and how do we use it? We first pass in a name, and then it looks like x-coordinate, y-coordinate, width, height, et cetera, so that's pretty straightforward. One thing to note is that x and y start at the top left of the page, so positive values bring you down and to the right as you add those. So we're going to do pdf.image, that will be the function, and then we need to pass in a file that we want to display. Well, we don't have any files yet, so this is a great opportunity to see what kind of functions are baked into this repo that we cloned. Probably the easiest place to start is the daily_counts file, which has two functions that are useful: plot_daily_count_states and plot_daily_count_countries. If you want to play around and test out what you can do with this, there's a main function at the bottom of the file that you can run. If I run this, we see we get some simple bar charts representing different things, and you can see how those are generated at the bottom of this file. All right, so let's import these functions into our new report. I'm going to go ahead and do from daily_counts import plot_daily_count_states, and probably
plot_daily_count_countries should be useful too. I might just denote some of these up here: these would be the local imports, and these are the Python libraries we need. Right away we can go ahead and do something like plot_daily_count_states, and I'll pass in, how about, New Hampshire and Massachusetts. I currently live in Boston, Massachusetts, and I'm originally from New Hampshire, so that's why I care about these two spots. The day we'll just leave blank so it uses whatever is pre-configured there, and we'll save this as filename equals test.png. Cool, and then all we have to do here is pass test.png to pdf.image. So we're using this function that plots the bar chart for states. Let's see how this works. Looks like it ran; the true test, though, is to check what's in tutorial.pdf, and we see the bar chart is there, but it's not positioned well, so we need to fix that.

How can we do that? Well, as we saw in the documentation, there are these x and y coordinates as well as the width that we can specify, so we really want to take advantage of these. We know the width of our paper is 210 millimeters, so we'll use that to our advantage. Let's start it at x equals zero. For y, our title is at 10, so we want to be below that; maybe we start it at 30 millimeters down. Then our width is what's going to be important here. What we could do is just define it as WIDTH, so that would take up the full page, but maybe we don't want it to take up the full page, so we subtract five millimeters off of that, and that should probably be pretty good. Let's run that again and open up our tutorial PDF again. Look at that, pretty good. It fills up almost the entire page. You could even, if you wanted to, give it a little bit of an offset, so now it pushes this to the right five millimeters, and maybe we want this to be a little bit smaller now, so I'm going to make that minus ten. Run that one more time.
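The placement arithmetic described here can be sketched in a few lines. This is only an illustration, assuming the fpdf package is installed and that an image such as test.png already exists; image_box and build_pdf are hypothetical names, not functions from the repo.

```python
WIDTH = 210   # A4 page width in millimeters
HEIGHT = 297  # A4 page height in millimeters


def image_box(page_width, margin, top):
    """Return (x, y, w) for an image inset `margin` mm from both side edges."""
    return margin, top, page_width - 2 * margin


def build_pdf(image_path, out_path="tutorial.pdf"):
    """Write a one-page PDF with a short title and one image below it."""
    from fpdf import FPDF  # imported here so image_box works without fpdf

    pdf = FPDF()  # defaults: portrait, millimeters, A4
    pdf.add_page()
    pdf.set_font("Arial", "B", 16)
    pdf.cell(40, 10, "Hello World!")
    x, y, w = image_box(WIDTH, margin=5, top=30)
    pdf.image(image_path, x=x, y=y, w=w)  # height is scaled automatically
    pdf.output(out_path)


print(image_box(WIDTH, margin=5, top=30))  # (5, 30, 200)
```

Calling build_pdf("test.png") would then write tutorial.pdf with the chart offset 5 mm from the left edge, 30 mm down, and 200 mm wide, which matches the "WIDTH minus ten" adjustment.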
So we're really just thinking in our heads about the dimensions of the paper and placing our image in a way that makes sense given that boundary. I like the looks of this; looks pretty good. How about if we wanted to also plot some countries right next to it? I could do plot_daily_count_countries with, let's say, the US and India, and in just a minute I'll show you what we can actually pass into both of these functions; I wrote some helper methods to help us more easily know what we can do. I'll call this filename equals test2.png, and then pdf.image with test2.png.

Now we've got to think about this. Ultimately the goal is to have two bar charts side by side, so we can play around with the values here to make that happen. To make it easier, I'm going to start this at 0 again, and I'm going to say that this is the full width right now. But now we want two side by side, so instead of making this the full width, I'm going to do WIDTH divided by two, and, to give it a little bit of a boundary before the next image, I'm going to subtract five millimeters off of that. For this next image, we're going to start it right after that, so it's going to start at WIDTH divided by two on the x-coordinate. The y-coordinate is going to be the same, at 30, and then let's make it the same exact size, so WIDTH divided by two minus five. Let's see what this looks like. Look at that, we have two charts right next to each other. It looks like they're a little bit too close to the left side, so I might just do a 5 here and a plus five here to move them both over to the right five millimeters. Run, refresh; this looks pretty good to me. Cool, so that's two different charts side by side, and really for the report we can do that with a bunch of charts and just change the y-coordinate as we create more and more of them. Before I jump into that, it is probably worth showing all the different states and countries we can
pass into this method, as well as the other methods that are defined in time_series_analysis, so let's look in the helper file. Also worth mentioning, I forgot to do this, but all this data is coming from a GitHub repo that Johns Hopkins University maintains. I'll link this in the description, but I'm grabbing all this data from this time series data here, and it's all coming from the confirmed cases, confirmed global cases, confirmed U.S. deaths, and confirmed global deaths files. Now if I click any of these, you can view raw, and ultimately we see all this data here. This URL can be read directly in pandas. If we look at the helper file, if we uncomment this base path and comment out this path, what this load_relevant_data function can do is read that exact file we were just seeing: it can actually read the URL and pull in these files. So if you ran this script every day, it could pull in the updated CSV, because Johns Hopkins is maintaining that repo and they're updating it daily. The reason I have this commented out right now is that I don't think it's necessary to make a bunch of requests to that GitHub file and maybe overload it if everyone who watches this video is doing that at the same time. So for now I'm using the files that we've downloaded locally in our data folder to load everything.

All right, going back, the whole point of this was that in the helper file we have two functions, get_state_names and get_country_names, which can help us see what we could possibly plot here. So that's get_state_names and get_country_names, and I'm going to just real quick do this at the bottom: print get_state_names. So if you want to use a state method, you can pass in all these different states, and that's how you'll get the data for these places. If you want to pass in countries in the country methods, we can just change this to get_country_names. This is just reading from the CSV; it's just a little helper method to show us what columns actually exist. So these are
all the countries you can look for. It doesn't have every country, but I think most of the countries that track their COVID-19 data pretty well will be in this list. Just something useful as you build out your own reports.

All right, let's keep building out this report, and I'll close this up a little bit so we can see better. I'm going to format this more similarly to the report you saw at the start of this video, so that is this one. We'll maybe have a separate page for global cases, and maybe we'll have more comprehensive looks at line charts over the past so-and-so days, and we can also create this cool-looking map here as well. So let's start building something that looks more like that. Instead of doing daily count countries here, I'm going to do states again. I'm going to copy the same states as before; now I'm going to pass this in as states, and I'll pass this in as states, and you could pass whatever states you wanted here. And now maybe in this one, instead of doing the same exact thing, I'm going to do the death counts. So mode equals... in the helper I think there are two different modes, so I'm going to import Mode from the helper as well, and the mode of this one will be Mode.DEATHS. So now we're doing the death counts for these two states for a specific day. Run that. If I refresh this, we see now we have the death count on 10/10/20 for New Hampshire and Massachusetts here as well.

Now we can just keep building out different charts, so maybe now we do line charts. The line charts are found in time_series_analysis, and the two methods that are going to be useful are plot_states and plot_countries, so I'll import both of those into our new report: from time_series_analysis import plot_states and plot_countries. Once again, you can use that same helper method I showed previously to see what your options are for the states and countries you can plot. All right.
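The repo's get_country_names helper isn't shown in the video, so here is a rough idea of what a helper like that could do, not the actual implementation. It reads the distinct values of the Country/Region column from a CSV laid out like the Johns Hopkins global time series. The inline rows are made-up placeholders rather than real data, and country_names is a hypothetical name.

```python
import csv
import io

# Tiny stand-in for a file laid out like the Johns Hopkins global time series.
# All numeric values below are placeholders, not real data.
SAMPLE_CSV = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20
,India,0,0,0,0
,US,0,0,0,0
Alberta,Canada,0,0,0,0
Ontario,Canada,0,0,0,0
"""


def country_names(csv_file):
    """Return the sorted, de-duplicated values of the Country/Region column."""
    reader = csv.DictReader(csv_file)
    return sorted({row["Country/Region"] for row in reader})


print(country_names(io.StringIO(SAMPLE_CSV)))  # ['Canada', 'India', 'US']
```

Countries that report by province appear on several rows, which is why the sketch collects the names into a set before sorting.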
For plot_states we'll pass in the same states, and we can pass in the number of days we want to track over, so I'm going to say the past week. For the filename, we'll just go down the list, so I'll call it test3.png. Then we'll do a pdf.image again; I'm going to copy the same one as above with test3, and now I'm going to drop the y down to 100 millimeters. We'll play around with the spacing, but I'm going to see if that works. Then I'm going to copy this and pretty much do the same thing as before: this will be test4, and this will use the same parameters as the graph that we added here. Run that. What does this look like now? Oh no, what happened? Yikes, how did we get landscape here? I don't like that. I'm so confused. Oh, what the heck, how did this "F" get here? I don't think I wanted to do that. I don't know how that made it landscape, but whatever. There we go, that looks much better. I might honestly even space this out a little bit further, so maybe I'll make this y-coordinate 110 for both. And I did something again, what the heck is happening? Oh, it's me. Oh, maybe it's still running, I don't know. Cool, that looks good.

Maybe we want to capture all this stuff in a nice function, so I'm going to call this create_report. Maybe we pass in a filename, with tutorial.pdf as the default name, and we'll indent all of this with tab so it's all in the create_report function. And we're just building from here. One thing that's useful to know is that we could do a pdf.add_page() here, and now these two charts that used to be on the same page as these two charts would be on a different page. (Now this is going to be filename.) So if you want to break up different pages into different sections, you can use add_page. So what happens if I run this create_report? Run. We see the same charts as
before, and now on that new page we have those other two charts. Really, right now we have all the building blocks we need to build that final report that looks like this, so now let's tie the pieces together and make something that looks exactly like this. The first thing we're going to do is take this type of content here and turn it into a title. I want to start with a blank slate, so I'm going to add another page down here, so really our first page is starting here. The first thing I want to do is add a title; I'll add a comment so it's clear.

So how would we create a title? Well, we could do something pretty similar to this with the cell method, but one thing that I encourage you all to do is to explore what other types of things you can do with the PyFPDF library. When I was playing around with this earlier, I found a couple of other methods. cell was one way to write things on your PDF, but there was also this write command, as well as a command I saw for line breaks, so I ultimately will create a title using these two functions. Here we go, let's define a new function; I'll call this create_title. It's going to need to take in the same pdf that we're currently editing. What will this look like? Well, we can copy something similar to this, where we have Arial font, bold face, 16. That's probably fine; I'm just going to tweak it to how I have it in that report, so I had it set to this. Then what I ultimately did was create a line break of 60. Right now I'm really just copying stuff from the generate_report.py file, so I might honestly just go ahead and copy this, and you'll see how this plays out in a sec. Okay, here we go. So what does this look like so far? We have create_title, so what we're going to do down here is get rid of this and just call the create_title function, passing in the pdf, and I'm going to run this real
quick. "No page open": oops, I deleted the pdf.add_page() here. So, tutorial.pdf: we see that with what we did right there, we just have a very basic "Covid Analytics Report" title. Next, let's set it to show whatever day we're generating the report for. So maybe we make a tweak to our function here and have it also take in a day; we'll have create_title take in not only the pdf but also the day. Now we can uncomment these lines, and let's set some sort of day down here. Now create_report will take in the day, and what does this look like? Invalid syntax: day equals. Missing one required positional argument, pdf: create_title, day comma pdf, that should work. It runs now. I'm going to refresh this, and now we see that it says "Covid Analytics Report" and the date.

One question you might have is why we started this title so far down. Well, if you remember the final report, it has this nice-looking letterhead at the top. When I was playing around with creating this report, I ultimately was not really happy with just having images and text; I wanted something more to make it look more professional. So I was like, okay, how can we put a letterhead at the top of it? Realistically, we could do this the exact same way we put any of these other images on: if we have an image of some letterhead, we can add it using the pdf.image method just like any other image. What I ended up doing for this was using Adobe Spark to create a nice-looking letterhead for this report, and if you go to the resources folder, you see that there's a letterhead.png and a letterhead_cropped.png. We'll use the cropped file just because it's smaller, and we'll add that to our PDF. So right above the title we will do a pdf.image; then, if we go into resources, it's called letterhead_cropped.png, so we can grab that. We can start it at x equals zero, y equals zero, and then we'll give it a width of
the entire width of our page, and that should put it on nicely. Re-run all of this; it might take a little bit longer to run now because we're putting on a fairly large image. Okay, tutorial.pdf: look at that. With that cropped letterhead added to the top, it really makes it look more like a professional report would. You could play around with Adobe Spark and make something of your choosing up here.

Let's continue onward. Now we're really just going to start copying and pasting things from the finished report. Before I copy and paste anything, though, it is worthwhile to see the last images that are part of this repo that we cloned. In the create_case_maps file, there's plot_usa_case_map and plot_global_case_map, so I'm going to import those into our report: from create_case_maps import plot_usa_case_map and plot_global_case_map. What we're going to do here is use plot_usa_case_map at the top of our first page. So I'm going to do plot_usa_case_map; we'll call the file usa_cases.png, and we'll do it for the day that we've passed into our report, so day equals day. Then we will want to add it to our PDF, so pdf.image with usa_cases.png, and we will give it an offset of 5, have it 90 millimeters down the page so that it doesn't collide with our title, and have it be almost the full width, so I'm going to say WIDTH and subtract 20 millimeters off of that. This is something you just play with. See what that looks like. Finished. All right, let's load this up. Look at that, now we have this nice chart.

A little bit of background on how this chart is generated, because it is interesting to know about the code behind it. This is generated using Plotly. If we go into create_case_maps, we see that we import plotly.express as px, and there's this choropleth visualization that can create all sorts of maps.
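The repo's plot_usa_case_map isn't reproduced here, but a choropleth of this kind can be sketched with plotly.express roughly as follows. The assumptions are loud ones: the DataFrame is expected to have a column of two-letter state codes called state_code, and choropleth_kwargs and plot_usa_map are hypothetical names, not the repo's. Writing the image to a file needs the kaleido package alongside plotly.

```python
def choropleth_kwargs(value_column, max_value=None):
    """Build keyword arguments for plotly.express.choropleth (US states)."""
    kwargs = {
        "locations": "state_code",      # assumed column of codes such as "NH"
        "locationmode": "USA-states",
        "color": value_column,
        "scope": "usa",
    }
    if max_value is not None:
        kwargs["range_color"] = (0, max_value)  # fix the color scale's range
    return kwargs


def plot_usa_map(df, value_column, filename, max_value=None):
    """Render the map to an image file (needs plotly and kaleido)."""
    import plotly.express as px  # imported lazily; only needed for rendering

    fig = px.choropleth(df, **choropleth_kwargs(value_column, max_value))
    fig.write_image(filename)


print(choropleth_kwargs("cases", max_value=3000)["range_color"])  # (0, 3000)
```

Leaving max_value as None lets Plotly pick the color range from the data, while passing a number pins the top of the scale so one very large value doesn't wash out the rest of the map.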
So in this case it is creating the USA map, and you can play around with the color here, and you can play around with the range of the colors displayed in that chart. If I look at that, 3,000 is the max, so some states like Texas and California might be going over 3,000 in a day, but we're capping off this color range like this. If you wanted this to be automatically generated, you could always just comment this line out (I guess you'd have to move this to a non-commented-out line), and that would allow you to have a range that's not predefined. I do want to give a quick shout-out: when I was generating this map, I found this Medium post by Joe Santana Vanitch, I think I pronounced that fairly okay, and it was super useful in learning how to take that Johns Hopkins COVID-19 data and turn it into a nice-looking graph like this. One thing that's really cool about Plotly is that you can make interactive visualizations like you see in this little GIF here. I found this article super useful, so I'll link it in the description.

All right, going back to our report. At this point, because it's just tying the pieces together, I'm going to start copying and pasting in the stuff from the completed report file that we've already seen. One thing I will note before I do that: as we see over here, if you look at the files in our directory, all these test images that we're saving are populating right alongside our other files. So one thing that can be useful is to have some sort of temp directory and save the intermediate images that we're using for our report there instead of directly in our working path. That can keep things a bit cleaner, so I'm going to go ahead and delete some of these things. One thing I also will note that I haven't shown yet is that plot_states and plot_countries also take in an optional end
date parameter, so we'll want to use that when we are generating the line charts to make sure they end on the proper date. One final thing I want to say is that, using the datetime library, we can grab the current date so we don't have to have a hard-coded date. We can do that with day equals... let's see how I did this before. This is something you can always google, and you might just have to play around with it a bit, but what I ultimately did here was datetime.today() and then I used timedelta, so I'm going to import two different things: from datetime import datetime and timedelta. I know that the Johns Hopkins data is updated every new day, so it will have the previous day's data fairly early on in the new day, so I really want to get yesterday when we're generating this report. So I'm going to do datetime.today() minus timedelta(days=1), and then I'm going to string-format that as month, day, and then year. This is all in the datetime library documentation. What we're doing is taking the datetime object and converting it into a date format that we more usually see, and this date format ultimately aligns with what the columns have in the Johns Hopkins data. However, by default this would look like something like 05/03/20, so note that with this format we have a leading zero in both the five and the three whenever the number is less than 10. The columns in our data don't have that leading zero, so the last thing that I did to make it the same format was a replace, where any time I saw a slash followed by a zero I wanted that to just become a slash, and then a left strip, so any time the first character was a zero we just got rid of that. So that's what this line
now does so if i run all this now we're going to get yesterday's data KeyError 11/7/20 and the reason that key isn't found is that in our helper method we want to pull from the url so the actual data source and right now just for testing purposes i have it set to dot slash data so let's now pull from the online copy of our data rerun new report and tutorial.pdf what does that look like see now it's pulling 11/7/20 because we did the stuff with the datetime library and because we set it to pull from the url where this data is hosted it can actually get that data and we see cases are not looking great here in the u.s now the final thing is i'm going to go through and just copy and paste what the final report looked like so this is basically exactly what we had but now i'm just copying and pasting the exact stuff in generate report here so copy and pasting the first page i need to define states okay so this is the first page second page just gonna quickly copy and paste that in and this you know is very similar we're using all the same methods we've already used copy and paste in the second page and once again we're copying this from generatereport.py save that i don't know what happened it's indented too much and then finally the third page delete this and copy and paste in the third page and this time we're using the plot global case map and we're going to plot just us india and brazil but once again you can use that helper method that is get country names in the file to get all the different options you could have there cool and that's ultimately the report that we're going to generate in this so i guess i can quickly run it finished tutorial.pdf and as we can see we get all the data that we did originally and i guess i quickly will just dive into a couple of the things like note for the first graph we're getting the past 250 days because this previous days is set to 250 here one thing that you
might be curious about is why the colors look like they do so if we look at that report the colors are very similar to this peach color that's being used in that united states map and if we look at the time series analysis there's these two lines here that ultimately are getting orange colors from an orange spectrum here so if i uncommented these lines and then got rid of colors right here that would use the default colors so this is using a color map so if you look up matplotlib color maps you might be able to play around with cool colors here but real quick i'm going to just show you what they would have looked like by default turn that up pdf you see this would have been the colors by default the blue so ultimately i use the oranges to get it more similar to this color i'm going to undo that real quick so you can play around with these things play around with what types of visualizations you include but hopefully this gives you some inspiration for reports that you can generate on your own that's the ultimate goal of this video alright that's all we're gonna do in this video thank you everyone for watching if you have any questions about the content that we covered in this video be sure to let me know in the comments if you did enjoy this video it'd mean a lot to me if you throw it a big thumbs up and also don't forget to subscribe to not miss any of the future videos we are gonna do a follow-up to this one where we actually show how we take these reports that we're generating with our python script and can automatically have that script run at a certain time so maybe once a day and then automatically send an email to people of our choosing with the report content so that should be a really cool video subscribe not to miss that once again i want to give a quick shout out to this video's sponsor skillshare be sure to check out the link in the description to start playing around
with the classes that they offer alright till next time everyone thanks again for watching peace out
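The date step described in the captions (take yesterday's date, format it as month/day/year, then strip the leading zeros so it matches the Johns Hopkins column names) can be sketched roughly as below. This is a minimal sketch reconstructed from the spoken description, not the code shown on screen; the function name jhu_date_label is hypothetical and introduced only for illustration.

```python
from datetime import datetime, timedelta


def jhu_date_label(day):
    """Format a date without zero padding, e.g. 5/3/20 (hypothetical helper)."""
    padded = day.strftime("%m/%d/%y")  # zero-padded, e.g. "05/03/20"
    # Turn every "/0" into "/", then strip a leading zero from the month.
    return padded.replace("/0", "/").lstrip("0")


# Use yesterday, since the newest column is usually the previous day's data.
yesterday = datetime.today() - timedelta(days=1)
print(jhu_date_label(yesterday))
```

Note that the replace would also strip the zero from a two-digit year such as 05, which does not matter for data whose columns start in 2020.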
Info
Channel: Keith Galli
Views: 65,305
Rating: 4.9654746 out of 5
Keywords: Keith Galli, python, programming, python 3, data science, data analysis, python programming, data visualization, analytics report, fpdf, data viz, pandas, numpy, matplotlib, plotly, line charts, bar charts, python project, edit pdf python, business analytics, marketing report, python reporting, python automation, data science projects, choropleth, map chart, geography plot, python plotting, pandas library, real world project, machine learning, plotly express, pdf report, pyfpdf
Id: UmN2_R4KEg8
Length: 49min 15sec (2955 seconds)
Published: Wed Nov 11 2020