Visualizing Excel Files Easily With Python

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
what is going on guys welcome back in today's video we're going to learn how to create efficient professional and simple visualizations of excel files using python so let's get right into it [Music] all right so this is going to be quite a foundational video we're going to work with a simple excel file like this one here so data.exel or data.xlsx is the file that i have prepared here is a simple file of a habit tracker so we have a date here on the left and we have then uh some workout stats some water drinking stats some reading stats some supplement stats and some journaling stats so those are the habits now of course you can also add some formatting here like minutes or liters or pages and all that but this is just some sample data you can use whatever data you want i'm going to use this here as an example don't focus on the data focus on the visualization so this is an excel file now and this excel file is deliberately designed uh to cause problems when working in pandas with it so we cannot just load this exo file into pandas because for example we have a heading here that does not belong to the data we have a free row here that does not belong to the data we have a free column here that does not belong to the data so this is the actual data here now preferably when you work with python and visualizations you want to have simple csv files uh where the first row is just a heading and then you have um the individual data rows here however we're going to work with this here to have a challenge and the first thing we want to do here is we want to install the libraries that we need to work with this data so we're going to open up the command line here and we're going to say pip install uh pandas this is the most foundational library that we're going to use here to work with the data frames that are very similar to excel tables and then we're gonna say pip install uh and we're gonna also install matplotlib and we're gonna install also seaborne those are the two visualization libraries seaborne is a matplotlib wrapper so a library that is basically making the usage of math.lib easier and adds some functionality on top so what we want to do here now is we want to import pandas spd we want to import a matplotlib dot pi plot as plt and we want to import seaborn as sns so those are the libraries we're going to use with the respective aliases and now let's go ahead and start with some very basic stuff let's say data equals pd dot read exo to load the exo file into the script so we're going to provide here the x uh xlsx file and we're gonna say now print data just to see how this looks in python when it's printed and as you can see now here we have some problems first of all we have a bunch of things here that are treated like they are the column names but they're not because this is just uh unnamed uh cells and then we have this habit tracker heading and then we have some nand values so we need to clean this up first and for that what we're going to do here is we're going to provide um a parameter and this is going to be the parameter skip rows and we're going to skip the first excuse me we're going to skip the first two rows in order to basically ignore these now of course you can also just go ahead in excel and just remove them but the problem here is of course sometimes you want to keep the excel file but you still want to do the python visualization so we will just do it in python while loading the excel file using pandas now for the column we're going to just drop the column by doing the following thing data equals data and then dot ilock and we're going to select everything but we're going to skip the first column by specifying one colon if we want to keep the first co column we would have to do just a colon or maybe zero colon but by doing one colon we skip the first column and then we can go ahead and see that the data is now properly loaded in python so you can see here at the top we have the column names we have here the individual rows and we have an index now if you want to change the index of the data frame uh to the date for example what you do is you say data equals data dot set index and then you say date and then we don't have an index anymore we just have uh the date here i mean we do have an index which is the date but we don't have a counter we don't have an identifier which is just a range so this is how you load the data into python now what you can also do now is you can go ahead and do some visualization so we're going to start with a very very simple visualization we're just going to say plt.plot and we're going to plot some simple things here like data.work out by the way you can provide a column names here by using a dot or by using square brackets and passing here workout uh and then this is the y value so for the x value we want to have the date now date is no longer a column it's the index so we provide data dot index if you were not doing this here so if you don't set the index to date you can just provide data and then date like this here then what we're going to do we're going to say plt show so this is already going to be quite a simple visualization of the workout that we have here so now here we have the minutes that have been worked out um and here we have the date so what we can do now is we can add some labels we can say plt x label and then this is going to be the date plty label this is going to be the work out time in minutes for example uh and that should actually be it so oh of course one thing that we might want to add is a title we want to say work out stats for example so let's rerun this here and you will end up with a simple visualization here so this can also be done with seaborne so you can just go ahead and say sns.line plot however it's going to have the same results so the only difference here and this is why i want to show you that is in cborn what we used to do is we do uh data and then equals data and then we provide just the strings as the column name so we say x equals date and then we say uh y equals uh workout for example i hope this works now with the index being set to date but it should work so let's just go ahead plt show let's copy all this because as i said seaborn is a wrapper around matplotlib so if you use cborn you still have to use the plt commands to provide all these things and to do the plt show but essentially this is the same graph as you can see here however seaborn makes a lot of other plots easier and um what i want to do next here before we go to the next plot is i want to pre-process some data because when we look at the data here you're going to see that uh we have two uh columns here that are yes and no values obviously booleans so we only have two possible values the problem is though that they're now just strings so the pandas data frame doesn't know that those are booleans so what we can do here is we can map them to zero and one so what we can do is we can say data and then um what was it supplements equals data supplements and then we're gonna map based on a dictionary in the dictionary is gonna map yes to one and no to zero i'm gonna do the same thing here with the journaling journaling there you go so when we print the data now let's just delete this statement up here we're going to see that data has 0 and 1 instead of yes and no and this can be plotted more easily and can also be used for calculations and other things because data is now a numerical value you can of course also use a boolean but this is now just a numerical approach here so what we're going to do next we're going to do multiple plots this is something quite simple we just say the figure and the axes are plt subplots and here we provide now the grid so if you want to have two by two if you wanna have two um rows and two columns you can provide two two uh i wanna have four rows with one column so i just wanna have four uh graphs below each other and because of that i do it like this and then you have access 0 1 2 and 3 and here you can do just the plotting so we can say for example i want to plot the data index so the date and the data work out for example and then i want to say x is zero here we now cannot just say uh dot title dot x label we need to say dot set title and this is gonna be work out stats then axis zero dot set x label this is going to be um the date and then set [Applause] y label and this is going to be the work out time or maybe we can just say time let's just say time and then provide min here as uh as a unit and then we can say i think actually that's it now we can copy all this and we can change this here to one and here now we plot not the workout but what was the other thing uh we had the water water stats how much water we drink on a daily basis oh actually this is i think the name was actually drinking water and then we had down here a stats and then dates and water in liters and we can copy this now and do this two more times so this is gonna be set to two two two and then three three three and here we're gonna say now what do we wanna do reading uh actually yeah reading was the column name then we had reading stats then we had date and we had pages and for the last one i want to do something different i want to do a bar plot so i want to say yes or no either a bar or no bar for the journaling when do we journal when do we not journal and then you can see the stats there so journaling stats and here just yes or no and then we can say plt dot show at the end and we should see a graph with multiple subplots so a figure with multiple subplots as you can see here now probably we can i think there's something called the plt tied layout i hope this is enough to fix this otherwise you need uh to look up more advanced structuring techniques but there you go so now you have the workout stats the water stats the reading stats journaling stats of of course not very detailed you can of course also change the x6 and all that i have a couple of videos on visualization if you want to go into more detail but this is just a basic uh graph with or a basic figure with multiple subplots so then i want to show you one thing that is a little bit more complicated in terms of the graph itself but it's just a one-liner in python so it's not complicated for you it's a complicated sophisticated professional graph but you don't have to do anything complicated to show it and this is the correlation heat map so now we have a bunch of features in there we have the workout we have whether the person has journaled or not we want to see what is the correlation between all these uh values and we can do that in the command line by just saying data.core this is one possibility you can see like that or we can plot a professional heat map and for that what we do is we just say sns dot heat map and we provide data dot correlation we provide anode for annotation equals true and we say cmap and you can choose a color map you can just google what color maps are available but the one we're going to use is the yellow so yl gnbu yellow green blue and then we're going to just say plt show and this is going to create a professional correlation heat map of your data and you can see then that certain things are correlated so supplements seem to be negatively correlated with journaling that might be meaningless might have some some meaning drinking water seems to be very positively correlated with work out which makes sense when you work out you should drink more uh on rest days maybe you don't drink so much uh that makes sense and yeah this is what you can do now one last thing i want to show you here before we end this video is how you can add some additional information based on what you already have and how you can also visualize that so for example we now have a lot of dates what we might want to do is we might want to take them and extract the weekday so is this a sunday is this a more a monday a tuesday and so on and then to visualize that data maybe we can find some correlation between a sunday and not working out so i deliberately designed the data to find such pattern but maybe in your data that was not designed by you artificially you can find some patterns like that and for this what we're going to do is we're going to say data equals data reset index because we need to work with the with the date so we're not going to have it as an index here i'm going to also delete all this here and we're going to say the data weekday is going gonna be uh just the data date and we can treat the date as a date time object so as dot dt and then we can just say weekday here um and then we can say another feature that we're going to create is the is underscore sunday feature this is going to be a boolean feature and it's going to be determined by the weekday being equal to six because we start counting from zero zero is monday six is sunday um so this will be a yes or no whether it's a sunday or not uh we can print the data to see what this looks like and um [Music] there you go true false false false false false so you have monday tuesday wednesday thursday friday saturday and sunday again so this works and now we can go ahead and just plot a simple line plot we can say sns line plot the data is the data the x-coordinate is the date the coordinate is the work out and then we can do the same thing with a scatter plot so we want to have a dot on the line every time and the only thing that we're going to change here is that we're going to specify an additional parameter the hue so the color is going to be determined by the is sunday attribute by the is sunday feature we're going to say plt show and that is going to be our final graph here as you can see there is a pattern you can see on every sunday there is no work out of course there are also days where there is no work out there not sundays but you can see here true means this is a sunday all the orange points mean zero minutes worked out i deliberately deliberately designed uh it like that but we can also do the same thing i hope so for the journaling because the journaling i also artificially designed the data so that the journaling happens on every sunday so this should be also something that can be seen here now this is not the most beautiful graph but you can see every time when journaling happens it's an orange point so it's a sunday but it also happens sometimes when it's not a sunday so that's it for today's video i hope you enjoyed it and hope you learned something if so let me know by hitting the like button and leaving a comment in the comment section down below and of course don't forget to subscribe to this channel and hit the notification bell to not miss a single future video for free other than that thank you much for watching see you next video and bye [Music] you
Info
Channel: NeuralNine
Views: 27,616
Rating: undefined out of 5
Keywords: Python excel, python excel, python excel visualization, excel visualization, python visualize excel, visualize excel files, excel, python, seaborn, matplotlib, pandas, python excel pandas, python excel matplotlib
Id: 4F9jnTDvxv0
Channel Id: undefined
Length: 15min 52sec (952 seconds)
Published: Sun Aug 21 2022
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.