Seaborn countplot | What is the countplot? | Seaborn countplot vs barplot

Video Statistics and Information

Captions
Hi everyone, welcome or welcome back to this introduction to seaborn series. Today we're talking about the seaborn count plot. So to start off, what is the count plot? Well, the count plot is a way to count up the number of observations you have per category and then display that information in bars. You can kind of think about this like a histogram, but for categorical data. It's a very simple plot but potentially very useful, especially when you're doing exploratory data analysis. So let's check out the count plot in the seaborn code.

Let's get started coding up the count plot using seaborn. By the way, all of the code I'm about to demo is available on my GitHub page. First I want to import the seaborn library, and then I'm going to load in some data from the seaborn library itself about diamonds. Each row of this data set contains information about one particular diamond. I'm also going to narrow down to just clarity equal to SI1 or VS2, and I'm really just doing this so that later on I have a category with only two options. Once I narrow everything down, I've got about 25,000 different diamonds in this data set. I'm going to set my styling to be darkgrid, and now I'm ready to create my first count plot.

To do that, I'll reference the seaborn library and call up the count plot. Then I just need to pass which column I'd like to plot. I'm going to be plotting the color column, and these data come from our diamonds data frame. Basically, what seaborn does with this plot is count up the number of observations we have for each category that it finds in the color column. For example, seaborn found about 1,500 different diamonds with color equal to J. If you're familiar with pandas, this is really just plotting out what we would see if we applied value_counts to this column, so these numbers are exactly what we're plotting when we do a count plot.

One really nice thing about the seaborn count plot is that we can very easily switch from vertical bars to horizontal bars. All we need to do is switch this x into a y. Now we'll see that color column along the y-axis, and we have horizontal bars instead of vertical ones.

At this point you may be thinking that the seaborn count plot looks very similar to the seaborn bar plot, but there is one really big difference. With the seaborn count plot, we are literally just counting up the number of observations per category. With the seaborn bar plot, however, we're getting an estimate for some summary statistic per category. For example, you might see the average for each category, and we're also getting confidence intervals that are created using bootstrapping. So they're really used for two different things. However, the coding options available to you for the seaborn count plot are very similar to those of the bar plot. Let's check out some of those options in the seaborn code.

For our first option, let's talk about the order in which these bars appear. If I take a look at my count plot for the color of these diamonds, you'll see that the bars are not currently sorted from most popular to least popular. They're actually lined up alphabetically from D to J. But if we take a look at another column, let's say cut, you'll see that the bars are no longer arranged alphabetically. It can be very confusing at first to figure out how seaborn is actually arranging these bars, so I wanted to walk you through the process a bit.

If we take a look at the data types of this data frame, so diamonds is the data frame and we look at dtypes, you'll notice that we have several floats and integers, and then we have three columns that are considered category data types: cut, color, and clarity are all categories. This is a special data type, and what it means for us is that we can check a property of these columns. Let's check color. It has a property called categories, and this is what seaborn is actually using to line up those bars. Typically, category columns are going to come with this property called categories, and seaborn is going to use it to figure out how it should line up those bars. So in the first plot we're lining up alphabetically, but in the second one we're lining up from the best diamonds first all the way down to the worst diamonds. This is a property that's been set up beforehand by the creators of this data set, and seaborn is going to try to leverage it. If it can't find a property called categories, it will sort your strings based on the ones that appear first in the data set, or, if we're talking about numbers, seaborn will sort those in order.

Okay, but what if that category order is not how you'd like these bars to appear? The seaborn count plot has an argument called order, and you can just pass in a list of how you'd like to order these bars. Here I'm passing in a list that starts with J and goes up to D, and that's exactly how we see the bars in the plot. But oftentimes we might want to sort these bars either ascending or descending. Since this is a pandas data frame, I would recommend using the value_counts method, which sorts the categories from most popular to least popular. If we go ahead and grab the index from here, we see the most popular category is E, all the way down to the least popular category, which is J. Then we can just use this index when we create the order for our bars, and now we'll have these sorted descending. If you prefer to have them sorted ascending, all you need to do is reverse this index, which you can do with two colons and a negative one. That will switch the index completely around, and now you'll have ascending bars instead.

If you've used seaborn before, you know that most of these plots have an argument called hue, which allows you to show off another categorical variable. Right now we have the color going along the x-axis, but if we'd also like to know something about how color changes with clarity, we could just use the hue argument to pass in that other column called clarity. An interesting thing happens here, and I wanted to walk you through what's going on and how you can update it. At the very beginning of this video, I filtered my data set down to just clarities which were VS2 or SI1, but here I'm actually seeing a legend for all of the different clarity categories. What's happening is that clarity is a category data type, so seaborn is pulling this legend from that categories property we talked about earlier. If this happens to you and you don't want to show off all those different categories, you can leverage one other argument called hue_order. Here we can pass in a list of exactly how we'd like these clarities to be ordered. I'll do SI1 and then VS2. If you pass in a list like this for just those two categories that you have in your data, the plot will be updated: only those two will appear in your legend, and there will only be space for those two along your axis. In this style of plot, we're counting up how many diamonds we saw for each color and for each clarity subgroup, so it can be really useful to see that breakdown when you are doing exploratory data analysis.

Like usual, there are tons of styling options available to you for the seaborn count plot, so let's check them out in the Python code. When it comes to styling the count plot, probably the first thing you'll want to update is the colors. Having a different color for every single bar might be a little jarring to someone who's trying to look at this visual. So my recommendation here, unless you have one particular bar that you're trying to highlight, is to switch all of these bars to the exact same color. You can do that through the color argument by passing in a string that represents the color you'd like to pick. Of course, if you would like each bar to have its own color, you can use the palette argument, which will switch your bars over to a seaborn palette, and seaborn has over 100 different palettes that you can choose from.

The other nice thing about the count plot is that other keyword arguments you pass into it will get passed on to the matplotlib bar plot. So if you have other arguments that you know about from the bar plot, you can use those here as well. For example, we could increase the width of the line around the outside of each of those bars, we could change the edge color of that line, and we could even add a pattern to these bars by leveraging the hatch argument. Those keywords all go straight through to the matplotlib bar plot.

So I hope you enjoyed learning all about the seaborn count plot. If you want to see more about the seaborn bar plot, you can check out my past video about that, and if there are any other videos that I should include in this intro to seaborn series, be sure to let me know about them in the comments section below. Thanks so much, and I'll see you next time.
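The code shown on screen is not part of the captions, so the following is only a minimal sketch of the steps described above, not the presenter's actual notebook. It assumes a small, hypothetical toy data frame (`toy`) in place of the diamonds data, which the video presumably loads with `sns.load_dataset("diamonds")` (that call needs a network connection, so it is avoided here). The variable names, the specific colors, and the headless `Agg` backend are all illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data standing in for the filtered diamonds data set.
toy = pd.DataFrame({
    "color": pd.Categorical(list("EEEEGGGDDJ"), categories=["D", "E", "G", "J"]),
    "clarity": pd.Categorical(
        ["SI1", "VS2", "SI1", "VS2", "SI1", "SI1", "VS2", "VS2", "SI1", "VS2"],
        # Unused categories mimic a category column that was filtered down.
        categories=["I1", "SI1", "VS2", "IF"],
    ),
})

sns.set_style("darkgrid")

# value_counts sorts from most to least frequent; its index gives a bar order.
descending = toy["color"].value_counts().index.tolist()
ascending = descending[::-1]  # "two colons and a negative one" reverses it

fig, axes = plt.subplots(1, 3, figsize=(13, 4))

# 1) Vertical bars, sorted descending, one color for every bar. The extra
#    keyword arguments (edgecolor, linewidth, hatch) go to matplotlib's bar.
ax_sorted = sns.countplot(
    x="color", data=toy, order=descending,
    color="steelblue", edgecolor="black", linewidth=2, hatch="//",
    ax=axes[0],
)

# 2) Horizontal bars: pass the column as y instead of x.
ax_horizontal = sns.countplot(
    y="color", data=toy, order=ascending, color="steelblue", ax=axes[1],
)

# 3) Split each color by clarity; hue_order limits the legend and the bar
#    slots to the two clarity levels that are actually in the data.
ax_hue = sns.countplot(
    x="color", hue="clarity", hue_order=["SI1", "VS2"], data=toy, ax=axes[2],
)

fig.tight_layout()
```

In an interactive session you would drop the `matplotlib.use("Agg")` line and call `plt.show()` to view the three panels. Swapping `toy` for the real diamonds data frame should only require changing the `data` argument, assuming the column names match.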
Info
Channel: Kimberly Fessel
Views: 17,225
Keywords: seaborn countplot, countplot, seaborn countplot order, countplot order, seaborn countplot hue, python countplot, seaborn python countplot, seaborn count plot, seaborn countplot color, seaborn barplot vs countplot, barplot vs countplot, what is countplot, countplot hue, horizontal countplot, countplot vs barplot, seaborn countplot vs barplot, seaborn countplot palette, countplot seaborn, seaborn countplot order by count, countplot order alphabetically
Id: 8U5h3EJuu8M
Length: 9min 3sec (543 seconds)
Published: Mon Apr 26 2021