Matplotlib Mega-Tutorial - Data visualization in Python

Captions
Welcome to another Project Data Science tutorial. This time we're learning about matplotlib, the foundational data visualization library in Python. Many of the other popular Python data visualization libraries are actually built on top of matplotlib (Seaborn is one that comes to mind immediately), which is why starting with matplotlib is a great choice even if you want to go on to learn those other libraries as well. And matplotlib is just a hugely powerful library: anything that you want to do, you can do it.

Here at Project Data Science we believe that you learn best through doing, which is why all of our tutorials are hands-on. We walk you through everything step by step, but you should absolutely follow along on your own. So if you're not at your computer, get to your computer and get ready to do some coding.

All right, let's walk through what exactly you're going to learn today. Step number one, we're gonna create a virtual environment using conda. If you don't have Python installed, and if you don't have matplotlib installed, this is where we are going to help you install that. And if you don't want to install it on your computer, we're gonna show you how to use Google Colab, which is a great option for getting started. We're gonna download and load some sample data into our Jupyter notebook, and we're gonna be spending the rest of our time in that Jupyter notebook to make it nice and easy for you to learn matplotlib.

Then we're gonna go through some different sections, which I tried to break down for you as logically as I could. Section number one, we're gonna discuss the basic parts of a matplotlib graph. This can be pretty confusing to newcomers, so I spend a little bit of time here helping you get an understanding of those different pieces of the graph. Section number two, we're gonna go into some basic graphs like line plots, bar graphs, scatter plots, etc. Section three, we're gonna look at how to plot multiple graphs on the same axes, so on
the same chart, essentially, and it's pretty straightforward, it's pretty easy to do, like if you want to do a scatter plot and a line graph at the same time. Section number four, we're gonna look at adding and altering text like titles, axis labels, etc. Section number five, we're gonna look at using subplots. What is a subplot? Well, it means plotting multiple graphs in their own different little chart areas, so we'll talk about how to do that. And then a lot of it actually is gonna be in section number six: we're gonna cover the basics in sections one through five, and then section number six is gonna cover a lot of common questions that beginners to matplotlib have, and some of those common use cases that might trip you up in the beginning. And then finally, section number seven: I don't like to end a video without giving you some opportunities for future learning, some resources, and some ideas for how to do that.

So let's get started. First things first, I'm going to open up a terminal (let me drag this terminal over here) and I'm gonna create a new directory for us to work in for this project. This is how I like to start new projects. Let me do a little ls here to take a look at my home directory. You'll see that I have this folder, project data science, so I'm gonna go ahead and change in here. If I do ls, you'll see these are the directories for the other courses and the other videos that we have recorded that are online. I'm going to make a new directory; let's call this matplotlib tutorial. Then I will cd, change directory, into matplotlib tutorial. If I do an ls, you'll see that there is nothing in here, and that's because we haven't created anything in here; we just created this directory. So that's looking good.

Now, we're going to be using some data that we create in this tutorial, but we're also gonna be using a little bit of data on heart
disease, a little bit of data from Kaggle here. So I'm going to open up a new Chrome window (let me drag this Chrome window over here), and if you search for Kaggle heart disease, let's see, we have the heart disease data set, we've got predicting heart disease. Let's go here, predicting heart disease, and actually, if we click this Heart Disease UCI link right there, this is the one that we're gonna want. So kaggle.com/ronitf (r-o-n-i-t-f) and then heart-disease-uci. I'll make sure to post this link so that you have it, but this is where we're going to be downloading our data from.

This is a data set on heart disease. We can take a look at the description here: the database contains 76 attributes, but the published experiments for this data set only use 14 of them, and those are the 14 that we're gonna have in this data set. In particular, the Cleveland database is the only one that's been used by machine learning researchers to this date, etc., etc. This data set originally comes from UCI, the UCI Machine Learning Repository. If you come here, you'll see that these are some very popular machine learning data sets. We're gonna be using this heart disease one, but we're just gonna get it from Kaggle.

So let's go up here and click this download button. If you don't have a Kaggle account, you'll probably need to create one. I will click on this zip file that gets downloaded, and it opens up this heart.csv. I'm gonna rename this really quickly to what the zip file was named, which was heart-disease-uci. And now, just to show you how to do this in the terminal, I'm going to come back over here to our directory, our matplotlib tutorial (if we go to project data science and matplotlib tutorial, you see that this is right here), and I'm going to mv, from our Downloads, heart-disease-uci.csv, and I'm going to move that to just a period, which means this directory. So I move that and do a little ls to show that we have our heart-disease-uci.csv file in here. Okay, so
now, if I go and double check that our CSV file really is there, I'll go ahead and double click this, and we can pull it up in Excel really quickly just to take a look at what it looks like. All right, just a bunch of numbers here, and we have our column headers across the top. We have the age of the person and the sex of the person; these are ones and zeros, and the Kaggle data set page over here gives us the mapping, so one equals male, zero equals female. Then we've got these other numeric columns as well, and this is what we're going to be plotting for some of the tutorial. So, coming back over here, let's go ahead and close this, and go ahead and close this.

Now we get to the part where we need to set up our Python environment. We need Python, we need matplotlib, in this case we're gonna need pandas to read in the data sets, and we're gonna need Jupyter for our Jupyter notebooks. There are two different options. Number one, I'm going to be creating a conda virtual environment using the conda create command. If you want to go this route as well, then I suggest, if you don't already have Miniconda or Anaconda installed, going over here to the Miniconda page. Conda is a virtual environment manager and installer for Python, so we are going to use conda to manage our virtual environments. If you don't know what virtual environments are, feel free to do a little bit of research there, but essentially it's a best practice for wrapping up our Python coding environment into a single directory, so that we don't screw up our computer and so that we don't have conflicting packages when we try to install things. So if you want to go this route, install Miniconda, and we will install everything that we need into a folder; it just takes a second here.

If you don't have Python on your machine and you don't want to go through this installation process, which I do recommend if you're interested in data science or data
analysis, since having Python on your computer is very helpful so you can actually run some Python and stuff, you can also go to Google Colab. Colab notebooks are essentially Jupyter notebooks run by Google in the cloud, through your browser. With these you would not have to install anything. You can just come here to Colab and create a new notebook; this is going to create a notebook for you, and it's going to have everything that you need installed, so it's gonna have matplotlib, it's going to have pandas, etc. So if you want a nice, easy way to get started, you can just go to Colab and use a notebook in the cloud. That's pretty cool. I'm gonna go ahead and move this to the trash, because I'm gonna be doing this in Jupyter notebooks on my computer. Leave this site.

All right, we've got this Miniconda page here, so let's come over here to conda create, and I'm going to create a new virtual environment. I'm gonna call it matplotlib tutorial, and this -n flag here is the name, the name of the conda virtual environment we're creating. And now I'm going to install the Python packages (let's go full screen with this, why don't we): jupyter, matplotlib, pandas, numpy. I think that should be it. If we run this, it's gonna spin for just a second. It's going to look at all these packages that we're trying to install, figure out if we can install them, and then we will hit the letter y right here. This is going to tell us everything that conda needs to install in order to install the four different packages that we want in this virtual environment. So I type a little y for yes and hit enter. This is going to download and extract these packages, and we will give it just a minute to finish.

Perfect. So we've just created a conda virtual environment called matplotlib tutorial, and everything finished installing just fine. One way to see which virtual environment you're using, or
rather which version of Python you're using, if you're in a terminal here on Linux or macOS, is the which command. This command is going to be a little bit different on Windows: I believe it's called where if you're in cmd, the Command Prompt, and in PowerShell it might be a little bit different, like where.exe or something like that. So you type which python, and this is gonna show you, if I were to run Python right now, the executable that's getting called when I run python: anaconda3/bin/python. So I run Python, and you'll notice that this is Python 3.7.3, etc., etc. I have access to all of the packages that are installed in my base Python installation through Anaconda.

Now, if I exit out and I run conda activate with our matplotlib tutorial environment, just like it says to do up here to activate the environment, you will notice that this changes over here on the left to show us that we're in a different environment now. Your terminal might not have that, and that's okay. I have the terminal zsh, which does some fancy stuff like this for me, but that's fine if yours does not. Now I type which python, and you'll notice that we point to a different location: under anaconda3, in this matplotlib tutorial environment, bin/python. This shows that we are now inside of our wrapped-up virtual environment, and we only have access to the packages that we explicitly said we wanted to install, in addition to the standard Python packages.

So I type python here, and you'll notice that we're actually running a different version of Python in here, 3.8.3, because whenever you create a new virtual environment, I think conda just pulls the latest stable version of Python. So we're actually running a newer version of Python. If I import pandas, well, we installed that, so pandas can import just fine. But we did not install Seaborn, which is another data visualization library. Seaborn is installed in my base installation, but it's not installed in my virtual environment, because we are using a totally separate Python installation.
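
The same check can also be made from inside Python itself. The following is a minimal sketch rather than something shown in the video: the helper name is_installed is hypothetical, and it uses only the standard library (sys.executable and importlib.util.find_spec) to report which interpreter is running and whether a given package can be imported from it.

```python
import importlib.util
import sys


def is_installed(package_name: str) -> bool:
    """Return True if the current interpreter can find `package_name`."""
    return importlib.util.find_spec(package_name) is not None


# Path of the interpreter running this code. Inside an activated conda
# environment this should point into that environment's directory.
print("Interpreter:", sys.executable)
print("Python version:", sys.version.split()[0])

# Packages installed into the environment should be reported as installed;
# anything that was only installed in the base environment should not be.
for name in ("pandas", "matplotlib", "seaborn"):
    status = "installed" if is_installed(name) else "not installed"
    print(f"{name}: {status}")
```

Run in the base installation and again in the new environment, this should print two different interpreter paths, which is the same information that which python gives in the shell.
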
We're using totally separate packages here. All right, so what I'm going to do now, since I'm in my matplotlib tutorial folder, is run jupyter notebook, and we're going to do the rest of this matplotlib tutorial inside of our Jupyter notebook. Let's go ahead and close out of this other Chrome window. So this is our Jupyter notebook interface. I'm going to open up a new Python 3 notebook, let's name it matplotlib tutorial up here, and we are good to go.

The first thing we're gonna do here is create a markdown cell by typing the letter m, make a heading one, and say Set up notebook. The first thing that I like to do in a notebook whenever I'm doing data visualization using matplotlib is this percent sign matplotlib inline. We don't have to spend too much time on this, but essentially there are different ways that you can render visualizations with matplotlib in Jupyter notebooks, and matplotlib inline is going to be one of your most popular, most common choices. I would just set this up, put it as one of the first cells in the notebook, and then kind of forget about it.

So let's get closer to diving into some actual data visualization, because the thing about matplotlib that you will very quickly learn is that there is so much power, and there are so many options, that it can be extremely overwhelming. I'm going to try to start you off with some of the simplest stuff that we can do, and then we're gonna add on complexity from there, but I definitely want you to limit the number of things you're thinking about at the start.

So let's go ahead and get started: import matplotlib.pyplot, and this is the main plotting submodule that we're going to use. Matplotlib has various other submodules, which you can see by hitting the tab key, and then you can see all of these different submodules that matplotlib has. There's a ton of stuff here.
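
As a rough sketch of the setup cell being described, the block below imports pyplot under its conventional plt alias and lists matplotlib's public submodules. Using pkgutil for the listing is an assumption of this sketch, standing in for the tab completion used in the video; the %matplotlib inline magic appears only as a comment because it is valid only inside IPython or Jupyter.

```python
# In a Jupyter notebook, one of the first cells would normally contain:
#   %matplotlib inline
# It is left as a comment here because magics only work in IPython/Jupyter.
import pkgutil

import matplotlib
import matplotlib.pyplot as plt  # the conventional alias for pyplot

# List matplotlib's public submodules, roughly what tab completion on
# "matplotlib." shows in a notebook.
submodules = sorted(
    module.name
    for module in pkgutil.iter_modules(matplotlib.__path__)
    if not module.name.startswith("_")
)
print(len(submodules), "public submodules, including:", submodules[:10])
```
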
But anyway, pyplot, and then we import it as plt, and this is the way that you're pretty much always going to import matplotlib. So let's go ahead and run this.

Now I'm going to create a new markdown cell here with a new heading one, and I'm going to say Section number one, pieces of a matplotlib graph. To start off with, I want to be nice and explicit here, so that we see exactly what we're plotting. Let's just create some data to plot. I'm going to say x equals, and I'm going to create a list here: negative 3, negative 2, negative 1, 0, 1, 2, 3. We could also do this with the range function in Python; that's totally fine. And I'm going to import numpy as np while we're at it, because matplotlib usually expects a NumPy array. A NumPy array, if you haven't worked with NumPy much before, is something you can basically think about like a list or a matrix. You can have a one-dimensional array, which is just like a list, or you can have a two-dimensional array, which is more like a matrix with rows and columns, and you can have three-dimensional arrays and things like that. So I'm going to create x equals np.array of negative 3 up to positive 3, and now for our y, let's create x squared. We're just going to do a simple y equals x squared thing here. If we look at x, we have our array, negative 3 up to positive 3, and if we look at y, well, we just squared every single number here, so 9, 4, 1, 0, 1, 4, 9.

To create a plot in matplotlib, this is the basic formula that I am going to recommend you always use when you're starting out. There are other ways to do it, but we're not going to talk about those right now. I'm gonna give you a formula that you can use to plot, and I recommend that you always do it this way for now. To start off with, you're going to say fig, ax equals plt.subplots(), and then you're gonna do all of your plotting code here, so do your plotting code here, you know, dot dot dot, so
this is where you're actually gonna say what you want to plot, and at the very end, do plt.show(). So this is the formula; this is your plotting formula. I'm gonna even create a little heading-three header down here and say Your plotting formula.

All right, so what we're going to do is say fig, ax, and this stands for figure and axes, a-x-e-s: not axis, but axes. This is kind of a confusing thing that we'll talk about in just a minute. So fig, ax equals plt.subplots(), we'll do plt.show(), and our plotting code that we're gonna do first is just ax.plot(x, y). And here we go, our first matplotlib graph. Beautiful, isn't it? It's actually a little choppy, it's actually kind of ugly, but with just this little bit of code we're able to produce graphs, and this is pretty cool. From here we're gonna add on complexity layer by layer, but this is the foundation of plotting in matplotlib. You always instantiate your figure and your axes, and you always want to end with plt.show(), unless you're saving out a file rather than displaying it here. Then in the middle, this is where you're gonna do all your plotting, and you could have one line here to plot this, or you could have a hundred lines to do all kinds of crazy formatting.

This is a good place to tell you what these terms actually mean, and to do that we're going to go to the handy-dandy matplotlib parts of a figure, anatomy of a figure. Let's see, here we go, this is exactly what I wanted to look at. This is what a matplotlib graph, or a matplotlib figure, is, and if you understand what it is, this is gonna help you with plotting. So let's talk about the main pieces here.

The figure. You notice that this circle here is pointing to what looks like a blank patch of nothingness. That's because the figure, the matplotlib figure, is this whole graph. So it doesn't matter how complex your graph is, it doesn't matter how many different lines you have in
here and how many different points, and you can actually have lots of little different graphs in here as well: the whole thing is called the figure.

Now, within a figure, you'll notice right beside it here, axes, a-x-e-s. This is what we might think of as being the graph itself. So the figure is where you're going to put all of your graphs, and then the axes is going to be kind of the plot itself, with an x-axis and a y-axis and some lines and some points and all this kind of stuff.

To make this a little bit more clear, let's come back over here, and I'm going to show you what it looks like to have a single figure with two axes. So subplots, and in this case I am going to do one row of two columns of axes. I'll explain this in just a second, but here we go. Now we have a single figure, a figure being just like this image here. You see how I can kind of click and drag that image? This is like a single image file that you could download; that whole thing is your figure. Now your axes, a-x-e-s, this is going to be this plot. This plot right here is one axes, and this plot over here is another axes, so we have two axes. And this is what I created here: I passed one, which is the number of rows of axes that I wanted, and two, which is the number of columns of axes that I wanted, and then I took each one of those axes and I plotted on it. This is why up here we had one, and we plotted on that single axes x and y, so x versus y: negative 3, negative 2, negative 1, that's your x, and then 0, 1, 4, 9, etc., that's your y. Down here we have a single figure still, but now we have two axes.

All right, now within each axes, with an e-s (and this is very confusing; they probably could have done a better job naming things here), you have your x-axis and your y-axis. This might be what we typically think of as an axis. And then for each axis there, you can have an axis label, you have axis tick marks, you have gridlines on
that axis, and then the same thing for your y-axis up here. And then you have your actual plots: you can have a line plot, and you can have markers, like for a scatter plot. So in our case over here, we have axes one, which has its own x-axis and its own y-axis, and maybe to make this clear, let's plot x cubed on the second one. So x cubed is now going to be in this axes over here. You see that each one has its own x-axis and its own y-axis, each one has its own line objects in here, and this is all wrapped up in a single figure.

All right, so I hope that's helpful. First off, we create the figure and we create the axes, then we plot things on the axes themselves, and then finally, whenever we're done getting all of our stuff to plot, we show the graph. Hopefully this is somewhat clear for you. I think that people honestly use matplotlib for years without ever looking at this figure here and seeing what all the different pieces are. So remember that the whole graph is called a figure. Within each figure you can have multiple axes, which are where you actually do the graphing. Each one of those axes has an x-axis and a y-axis, and that's where you're gonna do your line plots, your scatter plots with your markers, etc.

And just to really drive this home, I'm going to copy this down here, and instead of making a 1 by 2 grid of subplots, I'm going to make a 2 by 2 grid of subplots. Let's show this before we do any plotting on the axes themselves. I'm going to create a comment here and say: create a figure with four axes, two rows of two graphs on each row. And here we go. You'll see that we have a single figure (if I drag this, you see that it's kind of just like a single image there) and we have four different axes. We have a 2 by 2 grid of axes, each one of those axes has its own x-axis and its own y-axis, and we can plot whatever we want on these axes.
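
The plotting formula and the figure-versus-axes distinction above can be sketched roughly as follows. This is a reconstruction rather than the exact notebook code: building x with np.arange(-3, 4) is an assumption (the video types the values out), and the Agg backend line is there only so the sketch also runs as a plain script on a machine without a display; in a Jupyter notebook, that line would be dropped in favor of %matplotlib inline.

```python
import matplotlib

# Assumption: running as a script without a display. In Jupyter, omit this
# line and use the %matplotlib inline magic instead.
matplotlib.use("Agg")

import matplotlib.pyplot as plt
import numpy as np

x = np.arange(-3, 4)  # the seven integers from -3 to 3
y = x ** 2            # each value squared

# The plotting formula: create the figure and axes, plot on the axes, show.
fig, ax = plt.subplots()
ax.plot(x, y)
plt.show()

# One figure holding two axes: 1 row, 2 columns.
fig, axes = plt.subplots(1, 2)
axes[0].plot(x, y)       # x squared on the first axes
axes[1].plot(x, x ** 3)  # x cubed on the second axes
plt.show()

# One figure holding four empty axes: 2 rows, 2 columns.
fig, axes = plt.subplots(2, 2)
print(type(axes).__name__, axes.shape)  # the axes come back as a 2D array
plt.show()
```

With a non-interactive backend such as Agg, plt.show() only emits a warning instead of opening a window; fig.savefig could be used in its place to write the figure to a file.
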
And this object, the axes that it returns, in this case, since we have a 2 by 2 grid, is actually also going to be a 2 by 2 NumPy array for us to plot on. So to plot, I'm going to use two-dimensional indexing to show you how we can plot things on these graphs. I'm plotting things on the zeroth row, zeroth column, so this one, and then the first row and the first column, this one over here. We could also plot things on 0, 1 and 1, 0, so let's maybe do an x to the fourth and, what do you think, like a log of x or something like that? Let's try this. Okay, well, we got a divide by zero in the log, so let's do an exponential instead. Here we go. You see we've got four different graphs now in four different axes, and they're all inside of a single figure.

The reason why I'm really trying to drive this home right now is that everything else that you do in matplotlib is going to rely on you understanding these axes objects and the fact that you're plotting on these axes objects. So if you want to plot two images side by side, well, you now understand that these are two different axes on a single figure. If you want to plot two different graphs on the same axes, well, even though we haven't done this yet, you might be thinking to yourself, okay, I know I'm gonna use that same axes to do that plotting. And this is something that we're actually going to get to in just a minute.

Before we move on, I want to show you one last thing while we're here, and that's the fact that you might notice that these graphs are pretty choppy, right? You're like, well, this doesn't look like a very pretty parabola, it's kind of choppy; are all the graphs gonna be this ugly? And the answer is no. The answer is that the graphs are choppy because these are the only points we gave to matplotlib to plot. So, to
make this very clear, I'm going to introduce something that we haven't looked at yet, and that is the scatter plot. With a scatter plot you can really see that we have only defined x and y for seven data points total: we've got three in the negative, we have zero, and then we have three in the positive. That's why we have so few data points, and when you connect these points with a line, you end up getting a kind of choppy-looking graph.

To show that, let's see what it would take to create a smooth line. To create a smooth line, I'm going to use the NumPy function arange, and if you've never used arange, you can use one of my favorite Python tricks, the question mark, to get the documentation. So np.arange, and here's the docstring: you give it a start, you give it a stop, and you give it a step, and then it returns evenly spaced values within a given interval. So the start number, the stop number, and then the spacing between values. Let's, for example, create a range from negative 5 to 5, and let's space the values out by 0.01. If I run this you will see, boom, wow, tons of values. That is a lot of values. How many values is that? Well, that is a thousand values. Perfect.

So let's create this as our x instead, and now let's do the same thing that we did before, y equals x squared. Now if we plot x versus y like we did before, you see we have a very smooth graph. And why is that? Well, that is because, if we plot the scatter plot again of x versus y, you see each one of these is a point, so we now have a thousand data points here defining this graph rather than a measly seven. This can be something that confuses people whenever they first start using matplotlib. They're like, hey, why does my graph look the way that it does? And, well, it's because of this: it's because of how many points you passed to matplotlib,
asking it to plot. So in this case we're only passing seven points, and in this case we are passing a thousand points, and we get this really beautiful graph here.

Actually, let me show you: let's create a y_damped_oscillation. Is that how you spell oscillation? I think that is oscillation. You always forget your spelling with things like this. I'm just doing this because it's fun and it's pretty, so I'm going to show you really quickly what a nice, pretty graph looks like. So we do e to the negative x, times cosine of, with np.pi in there, whatever; you don't have to know any of this, by the way, this is just for fun. All right. Now let's do fig, ax equals plt.subplots(), let's do ax.plot of x versus y_damped_oscillation, plt.show(), and look at that. Isn't that pretty? Isn't that nice? And we have a thousand data points here that are helping to show this graph, which is why it looks so nice and smooth.

All right, that's enough of the introduction, I think. I hope that you have gotten a better sense of the pieces of a matplotlib graph. I'm going to copy this under this section header, just as a nice little reference. Let's save our Jupyter notebook, and let's move on.

I'm going to create a section header here in markdown, and I'm gonna call it section number, I guess it's section number two now, isn't it? Section number two, basic graphs. All right, so let's do some of the basic types of graphs that you might want to work with. And actually, you know what, we haven't started using our heart disease data yet, have we? So let's scroll back up to the top. I'm going to create another header here, and I'm going to say Load our data. We're gonna load our data using pandas, and if you have not used pandas before, and if you haven't seen the Project Data Science pandas mega-tutorial, that will walk you through pretty much everything you need to know about pandas, so I would highly recommend watching that at some point.
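
The effect of point count on smoothness can be sketched as below. This is an illustration under stated assumptions, not the exact notebook code: the variable names x_coarse and x_fine are introduced here for clarity, and the damped-oscillation formula exp(-x) * cos(2 * pi * x) is a guess at what is typed in the video, since the captions are garbled at that point. As in the earlier sketch, the Agg backend line is only for running outside a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: script without a display; omit in Jupyter

import matplotlib.pyplot as plt
import numpy as np

# Seven points versus roughly a thousand points for the same parabola.
x_coarse = np.arange(-3, 4)      # start, stop (exclusive); step defaults to 1
x_fine = np.arange(-5, 5, 0.01)  # start, stop (exclusive), step
print(len(x_coarse), "points vs", len(x_fine), "points")

fig, axes = plt.subplots(1, 2)
axes[0].plot(x_coarse, x_coarse ** 2)     # the line looks choppy
axes[0].scatter(x_coarse, x_coarse ** 2)  # the only points that were defined
axes[1].plot(x_fine, x_fine ** 2)         # the line looks smooth
plt.show()

# A damped oscillation, just for fun (the formula is an assumption, see above).
y_damped_oscillation = np.exp(-x_fine) * np.cos(2 * np.pi * x_fine)

fig_damped, ax_damped = plt.subplots()
ax_damped.plot(x_fine, y_damped_oscillation)
plt.show()
```

Because np.arange excludes the stop value, the fine grid runs from -5 up to just below 5. When a specific number of points matters more than the step size, np.linspace is a common alternative.
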
Let's go ahead and just import pandas as pd, and we're going to use pandas to load the data. Let's type in ls, which the handy-dandy Jupyter notebook runs as the shell command, and it lists out the files that are in the current directory. We see our heart-disease-uci data set, so I'm going to load this with pd.read_csv into a data frame. Let's read that in.

If we look at df.head(), this will give us the top five rows of our data. We see that we have our age, sex, this column called cp, etc., etc., and then target is whether or not the person had heart disease. If we look at df.shape, this will tell us how many rows we have and how many columns. If we look at df.describe(), this will tell us some descriptive statistics for each column. For example, our average age is 54 in this data set, and if we go over to target, the average prevalence of heart disease is about 54%, so the majority of the people in this data set do actually have heart disease. Finally, if we look at df.info(), this will tell us the number of non-null rows. In this case we have three hundred and three rows, it looks like none of our columns have any nulls, and all of them are numeric data types: one of them is a float, and the rest of them are ints, integers.

All right, so we've loaded our data, and we're ready to start plotting using this data now. We'll come back down here to our section number two, basic graphs. Let's start with a scatter plot. If we take a look at our data frame, maybe we want to take a look at the scatter plot of two of these columns plotted against each other. Maybe we're interested in whether there is a relationship between age and, let's say, your cholesterol levels. That's the question we're interested in answering, so I'll actually type that in here: is there a relationship between age and cholesterol level? So if we
look at df age here, this is going to return a pandas Series. We have a bunch of values, and we can convert this to a NumPy array, which is used in plotting with matplotlib. Actually, I think matplotlib can also take pandas Series as well, but it never hurts to convert it to a NumPy array first. Then let's look at the first ten values here, the first ten values of our NumPy array, and let's also look at cholesterol, the first ten values here.

All right, so we want to make a scatter plot following the same formula that I told you up there, the formula for plotting in matplotlib. Let's do fig, ax (for axes) equals plt.subplots(). We have been doing ax.plot, but all we have to do is change it to ax.scatter, so instead of doing a line plot, we're now going to do a scatter plot. We'll pass in what we want our x to be, so x is going to be the age, and then we'll pass in what we want our y to be, so the y is gonna be our cholesterol, and then finally plt.show(). And here we go, here's a scatter plot of our age versus our cholesterol. At a glance, that doesn't appear to be too strong of a relationship. There might be a slight positive relationship here, but nothing that jumps out and slaps us in the face.

So maybe we're not interested in every data point that we have in our data set; maybe we're interested in the average by age, and this will actually give us fewer points to work with as well. So I'm going to create a data frame that is grouped by age. Essentially, what we're gonna do is df.groupby: we're gonna group by the age, and then we're gonna take the average of all the other values. I explain this more in the pandas tutorial video, but essentially what's going on here is that we're grouping all of our data by the age, and then, let's say for forty-year-olds, or maybe for like forty-five-year-olds, we'll have ten different
data points so we're gonna take the average of those ten data points to get each one of those values so this cholesterol value here rather than being the cholesterol for a single individual is now going to be the average cholesterol for all 45 year olds so let's take this group by I will set this equal to df_groupby_age let's just call it something so that we can use it and then if we look at this data again and ask ourselves okay now we want to plot age versus cholesterol well our age is now our index over here on the left side so this is kind of a special feature of pandas data frames and then our cholesterol is still just a column so if we look at df_groupby_age.index here are all of our ages so this is gonna be our new x and if we look at the cholesterol column of df_groupby_age dot values and we just look at the first ten of those all right and maybe we can look at the first you know ten from the index as well all right so let's create a second scatter plot so we'll say scatter plot number two average cholesterol by age all right so we can take this exact same kind of scatter plot formula we'll put it down here and now instead of df we're gonna have df_groupby_age and rather than calling the age column for this one we're gonna pass in the index and this cholesterol column will stay the same so let's plot this and see there you go so now we just have a single data point for each age and we get something that actually does look much more like a positive correlation here so that is scatter plots in a nutshell ax.scatter and we're gonna go into more depth with all of this graphing as we go along but we're starting with just the question of how do you plot some basic graphs so first up scatter plot now let's talk about a line graph so what if we wanted to take this same plot here but rather than plotting it as a scatter plot what if we want to connect the dots we want to plot a line graph similar to how we did up top with the you know y
equals x squared well we can just copy this down here and rather than ax.scatter we've done this before ax.plot and there we go whether or not it makes sense to represent this data this way you know that's your judgment call to make but to plot a line graph all you have to do is call ax.plot ok so let's move on let's do a bar graph what if you want to plot this as a bar graph lots of things look really good as bar graphs so it's very easy to tell relationships between different categories in a bar graph so this is a very common type of graph how do we do that in matplotlib once again let's go ahead and copy our formula for graphing down here so we create our figure we create our axes using plt.subplots and now rather than ax.plot we just do ax.bar so you can see that we're changing the type of plot here by changing the method that we call on the axes object so we have our axes here where we want to plot something and then to actually plot something we've got to call some sort of method on it and to do a scatter plot you call dot scatter to do a line plot you just call dot plot and dot plot is more powerful by the way you have a lot of different options actually for each one of these methods but we're just going through the basic method right now and then for a bar graph you just do dot bar and every time your first value here is going to be your x and your second value is going to be your y that is going to change slightly with a horizontal bar graph which we're going to do next so let's do a horizontal bar graph here and so instead of calling bar now we call barh so we call barh and now so this is where it mixes up a little bit this first argument that you're gonna pass in you know you call it the exact same way here but now this first argument that you pass in is technically going to be your y-axis and the second argument that you pass in is going to be your x-axis so now you
might be thinking okay well hey how do I you know just how do I see how many methods I can call on this axes object so if you go out to a new cell and I believe we still have this ax object from the previous cell here if we do ax we type dot and then we hit tab so this will show us all of the different attributes all of the different methods that we can call on our axes object and you'll see that there are actually quite a lot so here's bar here's barh you can see that we've got a box plot there you can see that we've got various things for dealing with labels and zooms we have error bars here so you have all kinds of different things and this is you know partially so I'm just scrolling through this massive list right here this is partially what I meant whenever I started the video about how matplotlib can be overwhelming there are so many options because you can tweak this graph to look however you want it to look so there are a ton of options here which is why we're starting kind of nice and simple and building up complexity as we go and there will still be a ton to learn after this video so you know it's all about just how do you want the graph to look and how do you get matplotlib to do that so let's keep going with simple types of graphs our last one is a histogram and for a histogram you just need one variable so you know a histogram buckets your data into different bins and then shows you how many of your rows of data fall into each bin so in this case maybe we want to look at let's go ahead and copy our kind of plotting formula down here maybe we want to look at our age distribution so how many people in our data set fall into which age buckets so in this case we just call ax.hist so dot hist is the method that we're gonna call here for a histogram and by the way if we wanted to look at that docstring if we wanted to look at the documentation for hist we just put the question mark
after it we run that cell and then you see okay what do we pass in here well we pass in x x is the only thing that's required and all of these other parameters are optional they all come with defaults so bins is one of the parameters that we can use we can plot a histogram ok compute and draw the histogram etc etc bins here's the description for that parameter so if you want to do something like plot a histogram and you want a little bit more control you know slap this question mark here run this cell get the documentation and see what all your options are and you can set some of these right whenever you first plot your histogram so alright so we want to look at our histogram for our age so in this case we're gonna go back to our original data frame that has all of our rows that has one row for every person in this data set and let's go ahead and get the values here for the numpy array and here we go here's the histogram of how many people in our data set fall into each age category so you can see here that we have you know most of the people in the data set kind of fall into this cluster right actually we've got you know we've got two big clusters here one of them is from what is this like 53 up to 67 or something like that and the other group here from you know late thirties up to 50 53 early 50 something like that so that is how you do a histogram and with that you see we have a number of different chart types here that you can access all just by calling these different methods and if you remember scrolling back up to these four different subplots that we had these four different axes so if you wanted to do a different type of plot in each one of these you could and you know I could just show you what that looks like right now actually so what if we change the x to the fourth plot to a scatter plot there we go well so we have a lot of data points here oh it's because we changed our x and our y we changed our x and our y to
have a thousand different points down here so that's why this graph is now going to look pretty different which might actually not be super great for these next types of plots but let's try it anyways let's try a bar plot so this is gonna try to do a bar plot with a thousand different bars which yeah it's not the greatest but there you go and so this is how you would do different types of plots in your different subplots here it's as easy as changing the method you're calling okay so that wraps it up for these simple graphs here we have a scatter plot we have a line graph we have a bar graph a horizontal bar graph and a histogram and with those plots I think you're gonna be able to do most of the types of plots that you want to do especially when you start kind of mixing and matching these and learning how to use color and all of that so let's move on now to the next section so let's create a new section right now let's call it section number three and so for this section we're gonna look at plotting multiple graphs on the same axes so remember fig ax equals plt.subplots() and then plt.show() this is gonna give us one figure so one image you know and one axes so a-x-e-s there so one place where we're gonna plot something so for this one we're gonna look at multiple graphs on the same axes so we're not gonna be looking at multiple axes like this we're going to be plotting on the same axes okay so let's try a simple plot first let's go back to our simple mathematical equations that we were using earlier so we'll do ax.plot and let's take a look at our x so our x is still there our y is still there let's take a look at x and y here what is this this is a parabola all right beautiful now if we want to plot another thing on here let's do ax.plot let's go back to our damped let's see y damped oscillation remember and there you go so our parabola
got very squished here you can see because our y axis suddenly got very big so what if instead of a parabola here what if we do like x cubed let's try that alright so that looks a little bit better there but either way you can now see that we have two graphs on the same axes we have y equals x cubed and then we have our damped oscillation graph here and matplotlib went ahead and plotted these in two separate colors for us now whenever you start plotting things on the same axes you very often want to add labels and this is very easy you just add a label parameter inside of your plot call here and so we can say you know for this one maybe this is just x cubed and maybe the label for this one is damped oscillation so we add labels but if we run this nothing else happens so this is where our handy-dandy ax.legend method comes in so this is something that you'll probably use all the time anytime you add labels to your graphs and then you want to show those labels in the graphs just add this ax.legend() we run this matplotlib automatically puts a legend down here for us with the color and it'll put the style of the graph too so you know if this was a scatter plot these would be little dots rather than lines and we see that our blue plot here is x cubed and our orange plot is the damped oscillation so that's a simple overview of how you use the same axes object to call the plot method multiple times you call it multiple times with different data and that plots your data on the same axes here now let's go back to our heart disease data and let's say that we want to have a scatter plot so we want to go back to a scatter plot of plotting one numeric variable against another to look for a relationship but we also want to color by labels so why might we want to do this well let's say that we want to look at the relationship of let's go back to our data frame let's say we want to look at the
relationship between trestbps which if we go back to our Kaggle data here trestbps this is the resting blood pressure and then cholesterol this is the serum cholesterol in was this milligrams per deciliter or something like that so let's say that we want to look at the relationship between the resting blood pressure and the cholesterol and we want to see if there's any difference in that relationship as far as people who have heart disease and people who don't so we want to color by people who have heart disease let's take a look at kind of what we mean here and how we might do that so we'll start with our basic formula so we've got our subplots here we're creating our figure creating our axes at the very end we call plt.show() to show that plot and now if we just wanted to do a normal scatter plot we might say you know df trestbps dot values and we want to plot that against cholesterol dot values and here is our plot we've got that resting blood pressure on the x-axis we have our cholesterol on the y-axis but we want to color this by whether the person has heart disease or not so looking at our original data here you can see target so target of 1 and let's just go back to our data to verify target let's see all right experiments with the Cleveland database so values 1 2 3 4 all indicate heart disease and a value of 0 indicates no heart disease I think in our data here we just have nope sorry value counts yeah we just have ones and zeros in our data here so one indicates presence of heart disease zero indicates absence of heart disease so we want to color by this column so we want to color each point by whether it's heart disease or not so using this we can use the same principle of plotting multiple graphs on the same axes so in this case let's take a look at our target let's say we want to get only
the rows where this equals zero so we set this equal to zero that creates a boolean mask here so essentially you know does this row equal zero and then it says false or true we pass this boolean mask back into our data frame in order to index down to just these rows here and then we could call this you know we could call this like our df_subset or something like that so let's try to go through this whole thing up here so df_subset equals this and now rather than passing in the whole df maybe we just want to pass in this subset so these are just gonna be the rows where the person does not have heart disease so let's say first plot the data where the individual does not have heart disease so let's do this scatter plot first and there you go you see that the scatter plot thinned out pretty nicely there so we're only plotting about half of our data points now so let's copy this whole thing go down and now maybe we set df_subset equal to where the person does have heart disease and then we can plot again so second plot the data where the individual does have heart disease and look at that matplotlib already colors these points differently based on whether or not the person has heart disease and now all we have to do is like we did before let's add a label so let's say label equals we'll call this maybe healthy and this label will say heart disease and remember that we need to add our handy-dandy ax.legend() down here and there you go so essentially what we've done here is we've plotted two different scatter plots on the same axes the first one these are the x values and these are the y values of the healthy patients the second one these are the x values and these are the y values of the patients with heart disease matplotlib automatically does the coloring for us and we add a label and we add a legend and there you go we now have a scatter plot where we're coloring
by labels that's pretty cool now let's say that we had a lot of values here and we wanted to do this same kind of coloring scatter plot for a lot of different values well this is a lot of redundant code here and it is very clear because we're explicitly spelling out hey you know plot this first and then plot this second but there is another way that we could do it and I want to show you that really quick so I'm going to copy this down to the second cell so in this case we're going to loop through our different values so basically the only thing that changes in this code other than the label that we set is the way that we filter our data down based on the value of target here so if we look at df target again if we want to get the unique values in this series then we just do dot unique and this shows us that we only have two unique values 0 and 1 and then we can iterate through these so what does that look like well we can say for target_value in df target dot unique so we're gonna loop through each target value through zero and one let's go ahead and indent this and now rather than setting this equal to zero explicitly we're gonna set this equal to the target value that we're iterating through and I'll delete this comment and while we're at it I'm going to remove this label for right now and we can delete this second plot here and then maybe actually I'll pass in label equals target_value and then we'll keep the legend here and there you go so we just did the exact same plot except you'll see that now because of this df target unique it has switched the order that we're plotting which means that the colors are flipped here so zero you know or healthy is now orange and heart disease is now blue just because of the ordering that these values come out of df target unique and our label is now just this target value so if we
wanted to have a more descriptive label here this is where you could start doing something like this so maybe we have a dictionary called label_mapping or something like that and we want to map a target value of zero to healthy and we want to map a target value of one to heart disease and now rather than our label equaling the target value we'll have our label equal label_mapping of the target value so I run this cell so that we get our dictionary and now if I run this again you'll see that we're passing in our label which is mapped from this dictionary and this is a kind of trick that you can use in a lot of different situations we could have a color mapping dictionary here where maybe you know if the label is zero we want that to be blue and if the label is one we want that to be red or something like that so you can do all different kinds of mapping tricks with dictionaries like this but this is an example of how you could take two explicit calls if you wanted and turn them into one for loop that iterates over all of the unique values now while we're on this scatter plot let's say that we have one specific point that we want to call out somewhere in this scatter plot this is something you might want to do for various reasons maybe you want to call out a single bar in a bar chart or you want to call out a single point in a scatter plot this is a pretty common thing to want to do so let's talk about how to do it with this scatter plot so I will say scatter plot calling out a single point in this case the way that we're going to call it out is using color so if we do ax.scatter question mark let's see what options we have here so in addition to passing in x and y which we've been doing we have an s parameter a c parameter we've got a bunch of other parameters here so s is going to be the marker size here and c is going to be the marker color so we're gonna take
advantage of that marker color in just a second so let's start with our normal template here plt.subplots let's do ax.scatter we're gonna do all of the points here so let's do df trestbps df cholesterol let's take a look at this all right now let's say that we want to call out a single value so first off let's put all these other values as kind of a light gray so use the c parameter c equals lightgray so we just turned all of these points light gray now we want to call out a single point here and maybe the point that we want to call out is just the first patient in this data set so what we can do is we can just have a scatter plot of and actually I'm just gonna copy this whole line down let's have a scatter plot of just the 0th index person so just the first point in our data here and now let's change the color to red and there you go you see now that we plot all of our points in a light gray color and then we just take that first point the 0th index here and we plot that specific point in red so this is a very nice way and you can do this with bar graphs you can do this with other types of graphs this is a very nice way to call out specific points in a graph you just you know just plot another scatter plot right on top of it do it in a different color whatever color you want okay so our last example in this section is going to be let's do a bar chart with a line graph and maybe a horizontal line as well so we'll start off with our same template plt.subplots and plt.show down here so what data do we want to plot well maybe I want to plot let's see maybe I want to plot a line graph of let's go back to our group by age so let's plot age along the x-axis and maybe let's plot our cholesterol levels along the y-axis so here's a nice line graph showing your age and your average
cholesterol per age group so now maybe I want to add a bar graph on this axes here and I want to show the trestbps the resting blood pressure so now let's do ax.bar and once again you know just to kind of reiterate we're using the same ax object here so we're taking the same object calling different methods on it here so that we are plotting different plots on the same graph the same axes so we'll also use the age as our x-axis and now let's do the trestbps as our values and there you go look at that we now have this bar graph which represents one value and the cholesterol line which represents another value so that's not explicitly called out here so we could change the color of these different graphs so maybe I'll pass in a c equals let's try orange and we can leave this color the same let's pass in a label and see what that does so this will be cholesterol and this label will be trestbps and remember that we need to call ax.legend() in order to get that legend to show up and so there you go this is what I was talking about earlier as far as the graphs will show up as different shapes to indicate what you're looking at here so this is a bar graph that's telling us this is the resting blood pressure and the orange line is this line graph and that's the cholesterol that we're looking at here alright that's looking pretty good let's move on to our next section now so you'll notice that we haven't added any titles we haven't added any labels we haven't messed with any of the text we've just been doing the graphing with the exception of our legend here which gets placed automatically so let's have a section now and what section is this gonna be I think number four right yep section number four moving right along section number four adding and altering text all right so adding and altering text this is a very important part of graphing because you want to
give people information about what they're looking at you know you want to label your x-axis like what is 50 what does this number even stand for you want to label your y-axis you know what are we looking at and then you want to give the graph some helpful title so that people are able to understand the context of the data and what you're trying to communicate so before we dive into text I want to show you a very helpful matplotlib resource so if you search for matplotlib text API here let's take a look at this so there's one specific thing that I want to show you actually and I'll just go ahead and copy in this URL it's down here on the page here we go class matplotlib.text.Text so the reason why I'm showing this to you is because the matplotlib API documentation is very good for looking at all of the different properties that you can use when you're using something like a text object so our titles are text our labels are text etc these are some of the things that you can change about the text so you'll notice you can change the font family so you can change the type of font you can change the font size one of the ones that we're going to use is you can change the rotation of the text so you've got all of these different options for manipulating the text so this is a very helpful link I'll make sure that you have this and whenever you need to do something special with your text you can come here and reference this so I'll actually yeah I was gonna paste this into the Jupyter notebook here okay so first let's look at axis labels and this is axis with an i as opposed to axes with an e like that so let's get some graph that we'll work with for this text section here so let's see here maybe what we want to do is we want to look at a bar graph of cholesterol by age so I'll copy this plot down here let's change this to a bar graph of age versus
cholesterol I'll get rid of the color and the label so first things first our audience has no idea what the x axis and the y axis are supposed to mean here so the way that we fix that is I'll actually show you on the object first so ax I'll hit dot hit tab let's look at all of our different methods here so set so you'll start seeing all the different things that you can set come up here as I type in set underscore x so you can set the bounds the limit we're gonna set the x label so we're gonna use set_xlabel here and what is our x well that's the age so let's go ahead and copy this up here and there you go you see the word age printed down here now as our x axis label if we want to do the same thing for the y axis well we just change this to set_ylabel and what is this this is going to be average cholesterol there and there is our y-axis label so now what if we want to give this plot a handy-dandy title so I will go ahead and copy this down let's create a new section header here called plot title and the way that you do this is let's go back to ax dot set and let's just see if there's a set_title there is right there look at that I already knew that but you didn't know that and so it was still a surprise for one of us ax.set_title and let's say average cholesterol by age and there you go average cholesterol by age now as the title and we're doing this on the ax object so this is the title of this axes and this will actually come into play in just a second here so these are two of the main things you're really gonna want to know for adding and altering text and we're gonna do some more with text later in the common questions section but for now let's go ahead and just leave it as the axis labels and the plot title since these are gonna be some of the main things that you will use alright so let's get to our last section here
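As a rough sketch of the axis label and title calls just described (not the exact notebook code): the tiny data frame below is made-up stand-in data, and the column name chol, the variable name df_groupby_age, and the Agg backend line are assumptions added only so the example runs on its own.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line inside a Jupyter notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in data (assumption): the real notebook loads the heart disease CSV instead.
df = pd.DataFrame({
    "age": [45, 45, 52, 52, 60, 60],
    "chol": [210, 230, 240, 260, 250, 290],
})

# Average every column by age; age becomes the index of the grouped frame.
df_groupby_age = df.groupby("age").mean()

# Same formula as the rest of the tutorial: figure, axes, plot, show.
fig, ax = plt.subplots()
ax.bar(df_groupby_age.index, df_groupby_age["chol"].values)

# Axis labels and title are all set on the axes object.
ax.set_xlabel("age")
ax.set_ylabel("average cholesterol")
ax.set_title("average cholesterol by age")

plt.show()
```

Because the labels and the title belong to the axes object rather than to the figure, the same three calls should work unchanged on any single axes when a figure holds several of them.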
before we get into common questions and like I said we're gonna get back to text here quite a bit in the common questions so we're gonna look at things like changing these ticks and rotating the x axis and y axis labels and changing the font size and all of that good stuff in the common questions but let's do one more section first of basic functionality before we get into some of the nitty-gritty here so section number five using subplots so I've already shown you this earlier in the Jupyter notebook here but I want to make it explicit and spend a little bit more time on it so you know how we usually do plt.subplots() and then we do plt.show() and that creates a single axes here that we can plot things on well if we pass in a number of rows and a number of columns then we get multiple graphs here and let's go over it again so let's take a look at a 2 by 2 here and remember that usually we'll change this to axs if this has multiple axes in it and if we look at axs this is a 2-dimensional numpy array so notice that here we have a list of two matplotlib axes here we have a list of two matplotlib axes and these are both wrapped inside an outer list and we can just look at the dot shape here and see that we have two by two which matches what we asked for and also matches the graph that we are getting so then to plot on each one of these essentially what you do is you do axs and then you can specify the coordinates within this axs object or the indices the index of where you want to plot so let's plot something in the first row the zeroth row and column one let's try that and maybe we want to plot here let's take a look at our group by age again so maybe we'll do all of these as looking at our age and let's go back to looking at our cholesterol values and here you go you can see that we now have a line graph in the first row second column of our average cholesterol by age now what if we want
to look at in this first one maybe we want to look at the so 0 0 for the zeroth row zeroth column maybe we want to look at a bar graph of the average resting what was this resting blood pressure right let's do a bar graph there so now you'll see that we have a bar graph in that first plot there and this is something that's going to come up in the common questions as well but I'm going to go ahead and make the figure bigger so remember that the figure is this whole image right and so now we're trying to plot four different graphs inside the same image whereas before we were only plotting one so that's why this looks pretty small so let's change the figsize here and maybe we'll make this let's just pass in a 10 by 10 and see what that does or maybe you know let's try 12 by 8 there we go that looks a little bit more proportionate so now you'll notice that the figure here this is still just a single image but the single image is now bigger so we have more space for each axes that we're plotting on ok let's take this let's create two more of these we need to plot something down there in that second row first column and something in the second row second column so maybe let's take a look at our group by age what columns do we have let's take a look at this one for our third graph and maybe let's take a look at this for our fourth so we've got thalach here and oldpeak what are those so this one thalach or however you pronounce it is the maximum heart rate achieved and oldpeak is the ST depression induced by exercise relative to rest I'm not entirely sure what that means but if you do medicine you might know what that means or if you're in healthcare so let's make this third graph here a scatter plot and maybe we'll make the final one a bar graph again so let's try this and here you go so you see that first we're plotting a bar graph and we can do anything that we want with this axes just
like we would with just a single axes so we can set the labels we can set the title we can do all of that stuff the second one we have a line graph then we have a scatter plot and we finish it off with another bar graph here so let's actually take a minute to just show you that you can indeed set the x label and the y label and the title just like normal using this axes object here so let's look at the first graph and then we'll do the second graph and here's the third graph and here is the fourth graph all right so we maybe don't need to set everything for all of these but so rather than ax I'm just changing it to the axes location inside of this numpy array here and the x label well this is gonna be age the y label this is going to be average what was this this was your resting blood pressure right so average resting blood pressure and if we want to set a title here we can say average resting blood pressure by age let's take a look at this and see how that looks all right so our graphs are getting kind of tight here which you know getting your graphs to look exactly the way that you want them to definitely takes some time depending on what you're trying to do but notice that we do have an x axis label a y axis label and a title just for this graph and we did that just by using this same axes object here now let's say that you had taken the time to populate all of these graphs with their appropriate labels so I'm going to go through and just make sure that this is indexing into the correct axes object here so let's say that you know we're just gonna leave all these as the same for right now but let's say you took the time and went through and you know maybe I'll change cholesterol and let's see what was this one again this one was heart rate achieved average maximum heart rate achieved and remember this is the average of all these because we're looking at the data grouped by age and then averaged and then oldpeak well
this was the ST depression induced by exercise, so we'll just say average oldpeak here. All right, so let's say that you took the time to label all of these. Just like I was saying a second ago, the age x-axis label and the title here are too close. They're overlapping, it doesn't look good, and you can't read it, so we need to fix this somehow. There are ways of moving the text up and down and moving the graphs up and down, but there's also this very handy-dandy plt.tight_layout function. It's not a magic bullet, and it's not always going to work exactly how you want it to, but if you call tight_layout right before plt.show, then matplotlib does its best to make sure that things are not overlapping. You'll see here that the x-axis label and the title of the graph below are now not overlapping, and everything looks pretty good.

OK, so I think that's it for section number five, using subplots. It really is just this easy: you create however many subplots you want, then you do any kind of graph that you want on each axes, and finally you call plt.show. If things are overlapping, try plt.tight_layout and see if that works. If not, you might need to do some manual tinkering with the positions of things, but hopefully this works for you.

OK, so now we're going to go on to our last section, section number six, which is a little bit of a longer section. That's because when we're talking about common matplotlib questions, there are a lot of them. We've walked you through the basics. You now have all the information that you need to take your data and do some basic plots, line graphs, bar plots, scatter plots, etc., and to set the titles, to set the x- and y-axis labels, and to do multiple plots. You have all that knowledge, but typically
whenever people are doing plots, they want to do something specific, and there are a lot of different specific visualizations and specific ways to alter a visualization that you might want to do. So this common questions section is going to go through a lot of the most common things you might want to do with your plot, starting with: how do we change the figure size?

By the way, we're going to go back to our normal x data and our y damped oscillation data here, just to have something consistent, and I'm actually going to set y equal to the y damped oscillation so that we can just plot x and y for all of these different plots. So this is something that we did up above. Let's start with our normal template here: plt.subplots, ax.plot of x and y, and then plt.show. Run this. OK, so how do we change the size of this figure? We can do that right here, whenever we create the figure and the axes to begin with. We just pass in figsize. It's as if you're specifying the size of a fig fruit, but you're not; it's the size of this figure. Then you can pass in whatever you want. Let's pass in 10, 10. There you go, we get a nice big square figure. If we pass in, let's say, 12 and 8, you'll see that the first number controls the width of the image and the second number controls the height of the image. So that's how you change the figure size. Let's make this a little bit smaller. We'll go 6 and 4, and we're actually pretty close to our original image size, so let's try 8 and 6.

All right, moving on to our next question: how do we save our image to a file? Well, you could just right-click here in Jupyter Notebook and say save image as. That's a very easy way to do it if it's just a one-off thing. But if you want to save it to a file using
matplotlib, let's get our basic plot down here again. I'll remove figsize. So rather than calling plt.show here at the end, let's do plt dot, and I'm going to hit tab to see what all of our options are, and start typing in savefig. There we go. Let's actually look at the documentation for that using our handy-dandy question mark, question mark syntax in Jupyter Notebook. So we have savefig, and then it just says args and kwargs, so arguments and keyword arguments. Basically we can pass in anything to this. OK, here's the call signature: fname, or file name. We can pass in the dpi if we want, we can pass in the orientation, a bunch of different stuff, but the main thing we want to look at here is the file name. If format is not set, then the output format is inferred from the extension of fname. So in this case, let's say we want to save this as a PNG. We can say my_plot.png, and it will save it as a PNG because matplotlib sees the extension and knows to save the file as a PNG file. Let's go ahead and run this, and you'll see that the plot pops up here as well. But let's go back over to our terminal and do a little ls, and there we go, my_plot.png. Let's open my_plot.png, and here is that image. If you blow it up really big you can see it does get a little fuzzy; that's where the dpi argument can come in handy. But we just saved our image out to a file. I'm going to save this notebook really quickly. Make sure to save as you're going along so you don't lose your work.

All right, let's go to our next question: how do we change the font size of the title, axis labels, and tick labels? We've got our normal plotting formula right here. I'll remove the figsize. First let's set our title and our axis labels. I'll do ax.set_title, remember, and we can just say, you know,
damped oscillation. And we can do ax.set_ylabel, and let's just call it y label for now, and ax.set_xlabel, and we'll just call this x label. All right, so we do that and you'll see that matplotlib goes ahead and puts in a title, an x label, and a y label, but they're in the default font size. You can do this a lot of different ways. The way that I'll show you is right here: you can just pass in some font parameters. So if we do fontsize equals 20, let's say, there we go, our title is now bigger. We can also pass in fontsize for these labels, let's maybe do 15, and there you go, you see that our x and y labels are now different. Then for our tick labels, this is going to be a little different. Usually you can just follow this kind of formula, but for the tick labels we're going to do something different, and that's using ax dot, and if you start typing t and then hit tab, this tick_params here. Let's take a look at this. So tick_params changes the appearance of ticks, tick labels, and gridlines, and you tell it which axis you want to apply it to. In this case we can say ax.tick_params, let's say axis equals both to apply it to both axes, and the parameter is actually going to be labelsize. Let's set it equal to 30 at first, just to show you that it's working. OK, so that's nice and big, way too big. So let's set it equal to like 7, and they're pretty small now. I just want to remind you right now that the documentation is going to be your best friend. In the case of text, for example, here are the different things that you can pass in. We passed in fontsize for the title and the x and y labels. We could also pass in color; let's just show you that really quickly. We can pass in color equals red or whatever.
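The font-size steps described above can be sketched roughly as follows. This is a minimal sketch, not the exact notebook cell from the video: the x and y arrays are stand-ins invented here for the damped oscillation data, and putting color on the x label is just one place to try the color keyword.

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in data (an assumption): the video's real x and damped-oscillation
# arrays are created earlier in the notebook and are not shown here.
x = np.linspace(-5, 5, 1000)
y = 100 * np.exp(-0.3 * (x + 5)) * np.cos(4 * x)

fig, ax = plt.subplots()
ax.plot(x, y)

# Title and axis labels accept text keywords such as fontsize and color.
ax.set_title("Damped oscillation", fontsize=20)
ax.set_xlabel("X label", fontsize=15, color="red")
ax.set_ylabel("Y label", fontsize=15)

# Tick labels are sized through tick_params (labelsize), not fontsize.
ax.tick_params(axis="both", labelsize=7)
```

In a notebook, calling plt.show() after this cell displays the figure.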
Let's just Google matplotlib tick_params, and this will take you to this documentation page right here, where you can find a little bit more information and some examples, like this example usage right here. This documentation is going to be very helpful for you, and so is the documentation that you can find in Jupyter Notebook just by doing the question mark. Most, if not all, of the online documentation for some of these things can probably be found right here, like this usage, which gives you a nice example. So remember, if you have a question, consult the documentation, because there can be a lot of different ways to do this.

I'm going to drop this as a reference here. We're not going to go into it too much right now because this could be a whole other conversation, but there is this idea of your rcParams. Basically, this sets various parameters for different objects in matplotlib for your entire plot or for your entire Jupyter Notebook session. You can store these parameter defaults externally in a file and load them as well, so you don't have to constantly specify what font you want, what size you want, etc. For example, here's a font dictionary with the various parameters that we want for our font: we want the family to be monospace, the weight to be bold, and the size to be larger, which is kind of funny. Then you essentially say, I want to apply these parameters to all font objects, and this is how you would do that. If you're interested in learning more about how to set these defaults and how to create your own matplotlib parameter file, then definitely do some more research into that. We're not going to be going over that today, but I will drop this link right here for more research, and you'll be
able to find all of these links outside of the video, by the way, in the video resources.

OK, moving on. A very common question, and this is something that we did not use in this way, but you might see something like fig.add_subplot with something like 1, 1, 1. What the heck does that mean? Well, let's just try it and see what happens. So we'll do fig, ax equals plt.subplots and then fig.add_subplot 1, 1, 1, and plt.show. This will actually be a little bit more clear if we take away our axes. So let's create a figure first, fig equals plt.figure, but let's not create an axes to start, and I'm not going to use add_subplot either. With just creating a figure and then doing plt.show, you see that we have a figure here but there are no axes. Remember that the axes are where the plots actually occur, so we don't have anything that we can do any plotting on. If you have a figure and you want to add an axes to it, that's where this add_subplot command comes in. So now if I set ax equal to this and run it, you'll see that we have an axes. Just to be very explicit about it, let's print ax, and you'll see that this is an AxesSubplot. We can even print the type of this object, and it's of type matplotlib.axes._subplots.AxesSubplot. So basically what add_subplot does is add an axes to the figure. That's step one here, so I'll say: first, fig.add_subplot adds an axes to an existing figure. Then there's the question of what this 1, 1, 1 thing means. I will admit this is very confusing. Let's just look at the documentation for add_subplot. Once again, kind of unhelpfully, we just have these args and kwargs here, but let's look at the call signature. This is
where the useful information is going to be. For add_subplot, it looks like you have different options for calling the method, and you do in fact. The first way you can call it is: how many rows do you want, how many columns do you want, and then what's the index of this plot. Let's use that really quickly. Imagine that I say I want two rows, I want three columns, and this is going to be at index number two. It's hard to see, so let's create a basic axes to work with first. We'll do plt.subplots here, and then we'll create a second axes on top of it. There we go. Hopefully this makes a little bit of sense. Essentially, we're creating our first axes, which is this graph back here, and then with add_subplot we imagine that we're creating two rows and three columns' worth of subplots, and we only create an axes at index number two. To demonstrate this a little more clearly, let's also add axes number three, and we're going to add this at index, let's say, four, so we're going to add it in this bottom corner here. And there you go. So add_subplot is basically a way of saying, hey, I have a figure, I want to create an axes on it, I want it to be in this specific location, and I want it to appear as if it's in a grid of two rows, three columns, etc. That's one of the ways we can use it. Going back to the documentation, you'll see that we can also pass in just pos or ax, so let's take a look at what those are. pos, position, is a three-digit integer where the first digit is the number of rows, the second is the number of columns, and the third is the index. This is why I think this is kind of confusing, because you can basically take these arguments and squish them down with no commas, and it's exactly the same thing. So we're saying, hey, I want to create
subplots as if there were two rows and three columns, and I want to create an axes at index two. We run this and we get the exact same thing. So if you see something like fig.add_subplot 1, 1, 1, what does that mean? Let's go back to just creating our figure. So fig.add_subplot 1, 1, 1 is saying I want one row, I want one column, and I want to create an axes at index one within that grid. This is basically saying I want one graph, nothing too tricky there. And actually this is exactly the same as calling it with no arguments. I think this whole thing can be rather confusing. Personally, whenever you have the option to create subplots at the very beginning with plt.subplots, and whenever that makes sense for you, I would say do it this way. Then you get your list of axes objects and you can manipulate them individually. If you really like using add_subplot, that's fine, but just know that there are a lot of different ways you can do this, and it can get very confusing. So try to pick one way and stick with it, and I would recommend sticking with plt.subplots to begin with. By the way, just to show you that we can in fact plot on those axes just like we would plot on any other axes, I'm going to do ax equals fig.add_subplot, then ax.plot of x and y. Let's create another one: ax2 equals fig.add_subplot, with 2 rows, 3 columns, and index 3 this time, and then ax2.plot of x and y. There you go, we've got our big graph, which is the first one that we created, and our small graph, which is the second one.

OK, let's move on. Another common question is this one: how do I change the tick frequency on the x or y axis? Let's go back to our normal way of doing things, so plt.subplots, ax.plot of x and y, and then plt.show. You'll see here that the current x ticks
are basically just where matplotlib thinks they should go. Sometimes this will work, and sometimes we'll want to change it. So if we go up here to ax dot, I'll hit tab and take a look at what we've got, then start typing in set, and you can see all the things that we can set. Scroll down and you'll see set_xticks; this is the one we want. We can look at the documentation if you want. We basically just pass in ticks, a list of x-axis tick locations. So let's try that. Let's change this from even numbers to, let's say, negative 5, negative 3, negative 1, we'll add 0 as well, and then 1, 3, 5. Let's see what happens. There you go, we have now changed the x ticks from even numbers to odd numbers, plus 0. And we can just leave out some of these ticks if we want. Let's take out 3, and there you go; maybe this is how you want to present it for some reason. You can even do stuff like add in negative 10, and there you go, you've now expanded the graph to the left and added an x tick label there. It's just that easy, and you could do all kinds of different patterns with this if you want, but that's how you do that.

All right, this next one is one that I use pretty regularly: how do you set the y-axis limits? Whenever you have a normal graph like this, matplotlib just tries to determine what kind of y-axis you want, and sometimes that'll work and sometimes it won't. So the way that we can change that is ax dot set, and then let's look for set_ylim. There you go. With set_ylim you can pass in the bottom, you can pass in the top, and you've got some other options, but the way I typically like to use it is you just say ax.set_ylim and pass in a minimum and a maximum. So maybe we want to look at this from 0 to 150, and there you go, that's what this graph looks
like from 0 to 150. Or maybe we want to look at it from negative 100 to 100, and here's what that looks like. So you can use set_ylim to shrink or grow your y-axis.

OK, these next two I think are pretty cool because they're a nice way of cleaning up your graph by removing things you don't want, and I do use these sometimes as well. So, how do you remove ticks and tick labels? Let's get our plot here, and in this case we're going to go back up to the set_xticks and set_yticks methods. For set_xticks we're just going to pass in an empty list, and there you go. We're basically saying, hey, we don't want to apply any x ticks. We do the same thing with y, and we can make this graph totally contextless, which is admittedly not very helpful, but it does look nice and clean, doesn't it? So if you want to do this for any reason, there you go.

Related to this, how do you remove the axis lines, grids, etc.? Let's say that we want to remove these lines around the edge. Let's get our normal plot up here and delete these. If you come in here to ax dot set, there's set_axis_off. Ta-da. So let's try that and see what happens: set_axis_off, and there you go, no axis lines around that now. By the way, we didn't have any gridlines here, but if we did, this would affect the gridlines as well. Let's show that really quickly. If we do ax.grid and pass in True, we get this nice grid at the tick marks, and now if we do set_axis_off, this not only turns off the lines around the graph but also turns off those grid lines.

OK, next question, rolling right along. This is a very common one: how do you rotate axis text? This isn't something that we've had to use, because our axis text has looked fine, but a lot of the time you might have very long labels that all run into each other so you can't read anything, and so you're
like, well, I need to rotate these so that I can see them. So I'll show you how to rotate your axis tick labels, and we're going to go back to our friend the tick_params method. So ax.plot of x and y, plt.show. All right, let's rotate these x-axis labels. We go back to ax.tick_params, and if you look at the documentation, or if you remember, we can pass in which axis we want to apply it to, and then we can pass in any parameters that apply to these labels. I'm going to pass in the parameter rotation equals, let's try 45, and there you go, now they're nicely rotated, and if these labels were very long they would not run into each other.

OK, we're getting closer to the end here, but we've got a few more questions to go through, so let's keep rolling. How do you change the scatter plot marker size? Let's get our graph here, and in this case we need to change this to scatter. Look at that. Remember that we had like a thousand points here that we're plotting, so that's why you see so many points. For the size, let's just look at the documentation for ax.scatter. This is a great habit to get into, by the way: look at the documentation, check out what the parameters are. This will help you a ton. OK, so we're going for this s parameter, and if you look down here under parameters, s is the marker size in points squared. OK, cool, let's just tinker with it and see what happens. You can also pass in an array-like, so you can pass in a size for every point, but let's just start by passing in s equals 10. 10 makes it smaller. We can go down to 5, or even down to 1, look at that, and then we get these really tiny little points. So we can change the size that way. What if we wanted the size to vary with the x-axis here, so as our points have larger x
values, maybe they get bigger? Let's try that. Hey, look at that, they start off really small. I think we do have a point at zero in here, which is probably why we get this invalid value encountered in sqrt warning, so let's try adding a very small value to this, like 0.1. There we go, that fixes it. You can see here that our sizes start out very small and then get bigger as we go along. This can lead to some cool effects. I don't know if they'd be useful for what you're trying to plot, but they can look pretty cool at least. Let's even take x plus 0.1 and multiply it by 10 or something, so we get a bigger effect. There we go. Or we could pass in y, for example, and now, because we can't have negative sizes, it looks like we're only showing the points for the positive y. So maybe we make this the absolute value of y. That looks pretty funky. Or, what does this go down to, like negative 100? So we could do y plus 105. There you go, and this is kind of big. We're just tinkering around at this point. So there you go, you've got your size option. By the way, this is how you would do this across a lot of different graphs, and in fact in some other plotting libraries in Python as well, so that's why I wanted to show you that you can pass an array here for the size. You can also do the same thing for color, so let's show you that really quickly, and this will be our next question.

How do you color and size every individual point? Well, you just pass in arrays. Let's take this back down here. Let's get our x, and I'll do x plus 0.1 to avoid that whole error that we encountered, and multiply that by 10. And then we've also got this c parameter.
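The marker-size experiments above can be sketched like this. It is a minimal sketch under assumptions: x and y are stand-in arrays invented here, the two-panel layout is only for comparison, and shifting x by its minimum is this sketch's way of keeping every size positive rather than what is typed in the video.

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in data (an assumption): roughly a thousand points with x from -5 to 5.
x = np.linspace(-5, 5, 1000)
y = 100 * np.exp(-0.3 * (x + 5)) * np.cos(4 * x)

fig, (ax_small, ax_grow) = plt.subplots(1, 2, figsize=(12, 4))

# A scalar s gives every marker the same area, measured in points squared.
ax_small.scatter(x, y, s=1)

# An array s gives one size per point. Sizes must not be negative, so x is
# shifted to start just above zero before scaling.
sizes = (x - x.min() + 0.1) * 10
ax_grow.scatter(x, y, s=sizes)
```

Passing np.abs(y) as s, as tried in the video, is another way to guarantee non-negative sizes.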
Let's try passing y in for that, and you'll see that we get something kind of interesting going on here. I'm going to pass in a cmap value. This is a color map, a way that you let matplotlib know that you want to use a particular group of colors. So we've got coolwarm going on. This actually is not doing what I would expect it to do; I would expect the colors to change with y here. Let's take a look at ax.scatter. So you can pass in size, and you can pass in c, which is a color, a sequence, or a sequence of colors. There we go. All right, it looks like I was able to fix it. Some of these x values, if you look at x here, go down to negative five, so if we add 10 to our x to make sure all of our sizes are positive, then things work as we would expect. OK, figured that out. So this is coloring by every point and also sizing by every point. This might not be very helpful for this specific graph, but let's say you've got a hundred different points on a scatter plot and you want to color them by one thing and size them by another; this is how you would do that. And as we just learned, make sure that all of your sizes are positive, because you can't have a negative size.

All right, here's one that comes in handy for some data science problems: how do you plot using a logarithmic axis? Sometimes you might have values that grow very large very quickly, that grow exponentially, or you might have a histogram where everything is clustered over on one side. These are a couple of the use cases where you might be interested in using a logarithmic axis. I'm going to copy over some code that I have over here. Our x exponential is just going to be an arange from 1 to 10 with a step size of 0.1, and then for our y values we're going to use the exponential function, so e to the power of x times 30. Now if you plot this
you'll see a graph that looks like this. Now, this is not particularly helpful, right? Down here it looks like we're very close to zero, and then suddenly it just shoots up. But you'll see that the scale of our y-axis is actually, what is this, like 10 to the 129, so this is a monstrously large number. It's just that our y values grow so large towards the right side of the graph that the rest of the curve doesn't really show up on the plot. So how might we change the y-axis to be a logarithmic axis? This is a classic example of when you might want one: whenever some of your values are in wildly different ranges than others. In this case we come down here to our axes, ax dot, and we use set_yscale and just pass in log. And there is a beautiful straight line that shows the linear relationship between x and log of y. Let me copy this up here just so you have a reference. It's a nice thing to do for graphs like this, and as I said before, it's also nice for histograms where you have a lot of values clustered on one side. This is a pretty common thing that comes up in data science as you're looking at different types of data.

OK, rolling right along: how do you plot two histograms on the same chart? This is a pretty common question, and it's something you should know how to do because it's a classic example of plotting two graphs on the same axes. We're going to need some data that we can plot in a histogram, so I went ahead and created a couple of normal distributions. Let's go ahead and throw those in here. So I'm just creating normal distributions.
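As a reference for the logarithmic-axis question covered just above, here is a minimal sketch. The data follows the description in the video (an arange from 1 to 10 in steps of 0.1, and e to the power of x times 30); the variable names and the side-by-side layout are assumptions made for this sketch.

```python
import numpy as np
import matplotlib.pyplot as plt

# Exponentially growing data as described: y = e^(30 * x).
x_exp = np.arange(1, 10, 0.1)
y_exp = np.exp(x_exp * 30)

fig, (ax_lin, ax_log) = plt.subplots(1, 2, figsize=(12, 4))

# Linear y-axis: the curve hugs zero and then shoots up at the far right.
ax_lin.plot(x_exp, y_exp)
ax_lin.set_title("Linear y-axis")

# Logarithmic y-axis: the same data appears as a straight line.
ax_log.plot(x_exp, y_exp)
ax_log.set_yscale("log")
ax_log.set_title("Logarithmic y-axis")
```

The straight line on the right-hand panel reflects that log(y) equals 30 times x for this data.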
They have a certain mean and a certain standard deviation, and I'm creating a thousand data points for each. So if we do ax.hist of dist1... let's create these outside here so that we're dealing with the same data, because this would create a new data set every single time. Here's our first histogram, and now you just do another histogram on that same axes. Ta-da, there you go, two histograms in the same chart, on the same axes. It's as easy as that. Now, whenever we've got something like this going on, I'll show you another one of my favorite tricks, which is using this alpha parameter. Alpha basically controls the transparency. Let's do alpha on the second graph first and take it down to 0.7. Now you can see through the orange graph to the blue graph. If we do an alpha on both of them, then we get this pretty nice coloring effect where both are a little bit muted and we can see the distribution of both graphs. Just because I think this is helpful, I'm also going to throw labels on here, dist1 and dist2, and remember, this doesn't do anything until we call ax.legend. And there you go, that's how you do two histograms on the same chart.

All right, just three more questions. This is a marathon, isn't it? So, how do you draw horizontal and vertical lines? You might want to draw some reference lines for people to look at in various places in your graph. Let's come up here, get our little template, and do ax.plot of x and y. Let's say that we want to draw horizontal lines at 50 and negative 50, and then a vertical line at negative 4. How do we do that? With these nice little ax.hlines and ax.vlines functions. Let's look at the documentation for hlines. You pass in y, which is a
scalar or a sequence of scalars: the y indexes where to plot the lines. So we're going to say we want them at negative 50 and 50. Then you also have to pass in the xmin and the xmax. Do you want the line to go from negative 4 to 0? Do you want it to go all the way across the graph? In this case we want it to go all the way across, so I'm just going to say give me the minimum of my x data and the maximum of my x data. There you go, we're going from the minimum of our x up to the maximum of our x. If we wanted this to be shorter for some reason, we could have it go from 0 to 5 or whatever. And then it's the exact same thing for ax.vlines. In this case we just want a line at negative 4, and let's go from the minimum of y to the maximum of y. There you go. If you want, you can pass in colors, so you can pass in colors equals red, or you can pass in a list of colors, let's say red and blue, and now we're coloring our lines with different colors.

OK, two questions left. I saved two of the cool ones for the end: we're going to talk about images and we're going to talk about colors. So, how do I display an image using matplotlib? This is pretty common. You might want to do this especially with neural networks and deep learning; if you're dealing with image data, you might want to look at some samples of your data in a Jupyter notebook. So let's go get an image that I like and think is very beautiful. I'm going to get an image from Unsplash. Unsplash has a lot of really nice photos that you can download, and they list the photographer who took the photo. You can go to this photo if you want, which has this funky-looking URL here, or you can find your own photo on Unsplash; that's totally fine. Let's go ahead and download for free. Thank you to [inaudible] on Unsplash for sharing
your work with us. We're just going to use it for a minute. All right, let's close this out and go to our terminal. We need to move our image file from Downloads. If I do ls, tilde for home, and then Downloads for our downloads folder, you'll see that here is the JPEG that we just downloaded. So let's mv it from Downloads into the current folder where I'm at. If I do a little ls, you'll see that I just moved that image here. Perfect. So how do you display an image in matplotlib? This is going to use plt, the pyplot module: plt.imread, for image read. We have an image, it's a JPEG, so let's read it in, and we'll just call it image. All right, so it looks like we can only use PNG files without Pillow installed, but if we install Pillow then we can handle more image formats. I'll show you how to do this in Jupyter Notebook; you don't even need to leave the notebook necessarily. Let's say which pip, and our pip is the one from our virtual environment, so let's do pip install pillow. There we go, it's that easy, and very quick too. I'm going to go ahead and delete this cell. Let's try reading it in again, and there we go, we read it in successfully this time now that we have Pillow installed. So what's the type of this object? This is a NumPy array. If you have an image file that you want to read in, you can use plt.imread and that will convert it into a NumPy array. If you already have a NumPy array where your image is stored, then you're good to go. Let's take a look at our image data here. You can see that we've got some nesting of these square brackets, so this is going to be a multi-dimensional array. Let's look at the shape. So here we go, we
have one dimension of our image, we have another dimension of our image, and you'll see that our third dimension is going to be our RGB values; this is going to be the color, essentially. So let's show you how to plot that. Let's do our normal template here, our normal formula: plt.subplots and then plt.show. And then in this case we're gonna use a special method called imshow, and we are going to pass in our image there. And look at that: it's kind of small, and it has your x-axis and your y-axis there, but this is our image. That's pretty sweet; I think that's pretty cool. So I'm going to turn off the axis, and I'm also going to specify a larger figsize. Now, in this case I want the same dimensions as the original image, so I'm going to say image.shape[0], and then, since this is gonna be this large number here, let's divide it by, like, 300; then image.shape[1], and divide that by 300. And there you go: we have a much bigger image, and the axes, the y-axis and the x-axis, are turned off. So that's how you display an image. Now I want to show you a couple of other cool tricks really quickly while we're here, and that's how you can display just one of the RGB channels using matplotlib. So let's go back up here, copy this, and come down. Now, instead of plotting the whole image, let's plot all of the rows of pixels and all of the columns of pixels, but let's only plot the first of the RGB channels. So I think this will probably be the R channel, the red channel. Let's take a look at what that gives us. Well, look at that: this adds a color map to it, by default. These are actually just single-valued pixels, so you can think of it as kind of a black-and-white image here, but then matplotlib chooses to add a color map to it, the default color map. So what if we actually wanted to see this as a grayscale image? Well, that's where this nice cmap
parameter comes in again. We say cmap equals gray, and now we have that nice grayscale image. And remember, this is just using the single channel, the red channel, I believe. So let's try another channel, and this is where you can get into some really cool image analysis stuff. So this is a different channel of the image, and you can see a lot of different detail based on what the original color was. In particular, notice how this is very dark up here on the top left, and then up here you've definitely got a lot more light going on. And in this first image the land here is much brighter than it is in this second image. So that's how you can start doing some cool image analysis using matplotlib. And, you know, this is just a NumPy array; you can go in there and edit this however you want. You could feed this through a convolutional neural network, you know, do some machine learning, do some deep learning on that, and plot your results using matplotlib. All right, our last question, and that is: how do I find what colors are available in matplotlib? This is a great question, because this whole time, you know, we've just been doing cmap equals gray and passing r and b, or you could type in red and blue, and it can be very confusing. So from matplotlib let's import colors, and we're gonna import it as mcolors, for matplotlib colors. So if we pull up the documentation for mcolors, you'll see that the colors module is a module for converting numbers or color arguments to RGB or RGBA, and it also includes functions and classes for color specification. And if we do mcolors dot and then hit tab, you'll see some things that look like lists of colors here. So let's check out BASE_COLORS. Look at that: we've got our b, we've got our g, we've got our r, and k is black over here. If you keep scrolling down: CSS4_COLORS. So this is a huge list of colors here that you can use; you can use any
of these in matplotlib. If we scroll all the way to the bottom, you'll see one of my favorites: XKCD_COLORS. Check that out; this is from a great xkcd comic. If you have not seen it, just google xkcd colors and you will find what that is talking about. But if you go to cnames, now this is going to be kind of the huge list of a lot of the standard colors in matplotlib. And there's actually someone on Stack Overflow who did a really nice visualization of these different colors, and I'm going to paste this link here so that you can check it out. But this is the very last thing in the video, very last thing. We're done, yay, we're done. I'm gonna show you this beautiful representation of a lot of the named colors in matplotlib. So let's copy this code over here; this is kind of a lot of code, which is why I'm not gonna type all this out for you. And check that out: all of these colors, with their names, that you can use in matplotlib. Pretty beautiful, I would say, pretty beautiful. And that's it, we're done. What a marathon of a tutorial; that was a lot of information. Don't feel bad if your brain feels like it cannot hold all of it. I'm gonna leave you with a little bit of homework. Section number... are we up to section number six? Is that where we are? Does it even matter at this point? Let's go here: section six. So we are now at section seven, actually. Section seven: here's some homework for you if you want to keep learning about matplotlib. So number one, get your own data set to play around with. Oh, we've got to change this to a Markdown cell, so change that to Markdown. Or actually, let's just use the one that we've got going on right here. So, you learn better whenever you're interested in what you're doing, so get your own data set that you're interested in and try plotting some things. That will be a great way for you to learn. And I'll say here, as kind of a little sub-bullet of this: with your data set, try going back through all of the things we did, and that will help solidify a lot of this
knowledge in your mind. The next thing I would recommend is to read through some of the matplotlib documentation, and I'm going to point you to two specific places here. The first one is the matplotlib tutorials; those are really good, and they can help reinforce a lot of the stuff that we just went through. The next one is the examples, so just looking at some examples and the code that generated those examples. And the final thing which I will recommend is learning about the plt style, or the state-based style, of using matplotlib. This is a way of using matplotlib that we did not go into. I don't necessarily even recommend that you use this style most of the time; I think that the way that we've been doing things is how I would recommend doing them. But other people are going to use this, and you're gonna see code examples that use this, so it's good to at least learn a little bit about what it is. And with that, I want to say thank you so much for going through this tutorial. I hope you've learned a ton and are able to apply this to your own data science work. If you need additional data science tutorials, definitely check out other Project Data Science videos and courses, and leave a comment: let us know what you want to learn and we'll help you learn it. All right, that is it. Happy learning, project data scientists, and we will see you later. Bye.
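As a recap of the image and color steps walked through above, here is a minimal sketch rather than the exact notebook code from the video. It assumes a synthetic random array in place of the downloaded JPEG (plt.imread would return a NumPy array of the same kind), a hypothetical helper named show_image, the same divide-by-300 figure sizing used in the video, and a non-interactive Agg backend so it runs without a display. Note that figsize is (width, height), while image.shape is (rows, columns, channels), so the width comes from shape[1].

```python
import matplotlib

# Assumption: headless backend so the sketch runs anywhere.
# Delete this line in a Jupyter notebook to see the figures inline.
matplotlib.use("Agg")

import matplotlib.colors as mcolors
import matplotlib.pyplot as plt
import numpy as np

# Stand-in for `image = plt.imread(<your image file>)`.
# Shape is (rows, columns, RGB channels), like the array shown in the video.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(600, 900, 3), dtype=np.uint8)


def show_image(img, cmap=None, scale=300):
    """Hypothetical helper: draw `img` with the axes hidden.

    The figure size follows the image proportions: pixels divided by `scale`.
    """
    height, width = img.shape[:2]
    fig, ax = plt.subplots(figsize=(width / scale, height / scale))
    ax.imshow(img, cmap=cmap)  # cmap is ignored for RGB input
    ax.axis("off")             # hide the x-axis and y-axis
    return fig, ax


# The whole RGB image.
fig_full, ax_full = show_image(image)

# All rows, all columns, first channel only (red), drawn as grayscale.
red = image[:, :, 0]
fig_red, ax_red = show_image(red, cmap="gray")

# The named-color dictionaries in matplotlib.colors.
color_tables = {
    "BASE_COLORS": mcolors.BASE_COLORS,        # single letters such as 'r', 'b', 'k'
    "TABLEAU_COLORS": mcolors.TABLEAU_COLORS,  # 'tab:blue', 'tab:orange', ...
    "CSS4_COLORS": mcolors.CSS4_COLORS,        # 'red', 'rebeccapurple', ...
    "XKCD_COLORS": mcolors.XKCD_COLORS,        # names prefixed with 'xkcd:'
}
for table_name, table in color_tables.items():
    print(table_name, len(table), sorted(table)[:3])
```

In a notebook you would end the cell with plt.show(), as in the rest of the tutorial. Swapping the channel index from 0 to 1 or 2 gives the green or blue channel for the comparison described above.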
Info
Channel: Project Data Science
Views: 1,643
Id: axSTGczvYIE
Length: 146min 1sec (8761 seconds)
Published: Sat May 23 2020