Bar Charts or Bar Graphs | Matplotlib Tutorial Part 3 | Analysing data from a csv file

Video Statistics and Information

Captions
Hello all, welcome back to the Matplotlib lecture series. Today we will be discussing bar charts. Bar plots, or bar charts, are used to visualize a continuous variable against a categorical variable. They provide a great way to visualize the magnitude of a quantitative variable in terms of a qualitative variable. We are talking about two variables here, continuous and categorical, which means bar charts are bivariate in nature: we need at least two variables to plot them. Depending on the language, software, or tool that you or other people use to create bar charts, the height of a bar can represent either the maximum value or the average value of the continuous (numerical) variable. What that means will become clear once we start creating bar charts, so let's create our first simple bar chart. I've imported some libraries, and if you're following along from the previous videos you already know how to import Matplotlib, but let me recap briefly: we import matplotlib.pyplot as plt. I have imported NumPy as well, and I'm going to use a different style. Matplotlib has a lot of built-in styles available; I'm going to use plt.style.use with ggplot. I have also created four dummy variables, which we'll use to cover the different types of bar charts, and a list of hex color values that I will use for colors. Let's say I want to represent the salary of an analyst on the basis of age. For that I'm going to create a simple bar chart, and we use the plt.bar function to create one. What do we want on the x-axis? I want employee_ages, I mean I want 
the ages of the employees on the x-axis and analyst_salary on the y-axis. Now suppose I want to change the color of my graph. I use color= and can pass any color from the list of colors I have created, let's say the one at index 3. Let's also say I want the borders of the bars to be visible, in black. The borders are called edge colors: ec is the short form of edgecolor, and k means black. Next I label the axes: plt.xlabel, since I'm showing ages on the x-axis, and plt.ylabel, since I'm showing salary on the y-axis. Then I write plt.show. As you can see, this is my first simple bar chart. We can also call it a vertical bar chart, because the bars are vertical in this case. Now let's say I want to compare the salary of an analyst and a scientist. I want both bar charts on the same figure so that I can compare them. Let's try creating two bar graphs on the same figure. If you're following along from the previous videos, you have seen that we can draw two charts on the same figure by simply writing another plotting statement: plt.bar with employee_ages and analyst_salary, and then another plt.bar statement with employee_ages and scientist_salary, so that they end up on the same figure. Then we call plt.show. We haven't customized anything, so these are very simple bar plots. As you can see, only one color is visible. This is 
because the bar charts are overlapping each other. What you can do is make one of them partly transparent; let's say I do that for one of the salaries and keep it at about 60 percent. As you can see, the analyst salary is hiding behind the scientist salary, which is why I'm not able to see the analyst salary. So we cannot put two bar charts on the same graph this way; we have to take a different approach. First we need to create indices, basically the positions on the x-axis: indices = np.arange, and I create an array whose length is the length of employee_ages. Then I fix the width of each bar in the bar chart: width = 0.25. You can use any width you want, but let's keep it at 0.25, which is neither too wide nor too thin. Now that I have the indices and the width, let's say I want to represent the salary of the analyst. In this case, instead of passing employee_ages for the x-axis, I pass indices, and then analyst_salary. Then I pass the color I want, color equal to colors at, let's say, index 2, then the edge color, and then also a label saying what this bar represents: this bar represents the analyst. Now I copy this and paste it. I'm going to put the bar for the scientist salary on the right-hand side, so I have to shift my indices by the width: indices + width. One more parameter we need to pass is the width parameter, set to the width we created, and we have to do the same in the first call, width=width. Then we need to change the color, so let's say I choose another color, and this bar represents the 
scientist. Then let's label the axes: plt.xlabel again shows the ages, and plt.ylabel shows the salary. Then plt.show. Okay, I didn't change this here; it should be scientist_salary. We also call the plt.legend function. Now this is good to go. You can see we have created two bar charts. These types of bar charts are called grouped bar charts, and with them we can easily compare the salary of an analyst and the salary of a scientist. Now let's say I want to compare the salary of an analyst to that of a scientist and to that of a developer, so I want three bar charts on the same graph. We can do that as well; we just need to adjust the indices of the bars. But before going to that, have a look at this graph. As you can see, the x-axis shows the numbers 0, 2, 4, 6, 8, 10, but we want ages. To fix this we use plt.xticks; the values you see on the x-axis are called xticks. The ticks parameter accepts the positions, which are the indices, and the labels parameter accepts the labels that you want to be visible on the x-axis, so the labels will be employee_ages. Now you can see the ages are visible: 24, 25, 27, 26, 29, 28, 31. This grouped bar chart is complete. Let's move on to the next grouped bar chart, where I was talking about comparing the salary of an analyst to that of a scientist and a developer. As you can see, the salary of the analyst is on the left-hand side and the salary of the scientist is on the right-hand side; as I said, it shifted to the right because I added width to the indices. So I'm going to place the developer salary leftmost, and that way the analyst will be in the middle. How do I do that? plt.bar with indices 
minus width, then developers_salary, then color equal to colors at any index, let's say 5, then ec equal to k, label equal to Developer, and width=width. I guess the code looks fine to me now. Now you can see we can easily compare the salary of an analyst, a developer, and a scientist. We can increase the figure size as well, with plt.figure and figsize=(8, 6). As you can see, we have created three bars on the same figure, so this is again a grouped bar chart. We have now covered the simple bar chart and the grouped bar chart. There is another bar chart type I want to show, which is the horizontal bar chart. First of all, to create a horizontal bar chart I need to import data that is suitable for one, so salary = pd.read_csv. This is the data we'll be working on for the horizontal bar chart. Why do we need a horizontal bar chart? Let's first create a normal bar chart with occupation on the x-axis and age on the y-axis. With this data you will also get to know why I said that, depending on the software, the bar chart shows either the maximum or the average value. So let's create the bar chart with plt.bar. I want occupation on the x-axis and, okay, age on the y-axis. Let's say I don't label anything and just show the graph. See, the x-axis is really congested, because the labels are too long to be clearly visible there. This kind of data is suitable for a horizontal bar chart: use one when you have long labels that cannot be clearly displayed on the x-axis. To convert this into a horizontal 
bar chart, we just need to add h after bar, so it represents bar horizontal. Then we change the color, color equal to colors at, let's say, index 0, and we always label the axes: plt.xlabel. Notice that the axes swap, x becomes y and y becomes x, so the x label will be age and the y label will be occupation. Now you can see the horizontal bar chart, and the labels are clearly visible. Now let's talk about why I was saying the height of the bar shows the maximum value. Let's say I slice two columns from this data, age and occupation, then group the data by occupation and calculate the maximum age for each occupation. Administration is 59, back-end developer is 45. Let's sort this in ascending order with sort_values by age. So back-end developer is 45, and which is the maximum age? Professor has the maximum, 68. If you look at the chart, the longest bar is the one for professor, and it ends very close to 70, at about 68. So the height of the bars represents the maximum value; this is what I meant earlier when I said the height of the bar shows the maximum value of the data. That is the horizontal bar chart. Now let's move to another category related to grouped bar charts: stacked bar charts. Earlier, when we created grouped bar charts, we placed the bars side by side. In stacked bar charts, if you know what a stack is, we place things one over another, so we place each bar on top of the previous bar. How do we actually create it? I will be creating bars for all the arrays, for the developers', the analysts', and the scientists' salaries. So plt.bar with employee_ages and analyst_salary, and 
then we give it a label, label equal to Analyst, and a color, color equal to colors at index 5, and that's it. Now I want the salary of the scientist as well, so I create one more bar chart and change the label and the color. I want the scientist bar to be on top of the analyst bar. What is at the bottom? The analyst salary is at the bottom, so we have to state that: bottom=analyst_salary. In the same way I create the bar chart for the developer salary, with the label Developer and a different color. The developer salary will sit on top of the analyst salary and the scientist salary. So what is at the bottom? The bottom is analyst_salary plus scientist_salary. Again we label the axes, as always: plt.ylabel with salary, and then plt.show. I didn't call the legend function, right, so plt.legend. This is a stacked bar chart: we place the bars one over another. That's how you can create stacked bar charts; they are a kind of modification of grouped bar charts. The last kind of bar chart we will create in this lecture is the error bar plot. Error bar plots are useful when you want to see the variability in the data, that is, how much the data varies compared to the mean value. For that I'm going to import one more data set, the very popular iris data set. I import seaborn as sns and then load the data set with sns.load_dataset and iris, then iris.head. This is the data I will be working with. Let's say I want to create an error bar plot for sepal length, for only one column. How do we create an error bar 
plot? There is no special function for it; we use the plt.bar function again. First you pass a label that you want to be visible on the x-axis, which will be sepal length. This can be anything; it does not have to be this particular name, it is just a label that will be visible on the x-axis. Next, what do you want the bar to show? I want the height of the bar to show the average value of sepal length (you could show the median instead), so sepal_length.mean. Then comes the yerr parameter. It accepts an error value that represents the variability in the data, and I will use the standard deviation of the iris sepal_length column. Let's also give it a color, color equal to colors at index 1, and then plt.show. As you can see, this is what we call an error bar plot, and this particular line is the error bar. Another name for the error bar plot is the dynamite plot, because it looks like a stick of dynamite. What exactly is it showing? The black line shows the standard deviation, and the height of the bar is the average value of sepal length. We can calculate the average, that is the mean value of sepal length, by simply slicing the column and calling .mean: it is 5.84. As you can see, the height is approximately 5.8, very close to 6. So the height of the bar shows the mean value of sepal length and the black line shows the standard deviation. When the standard deviation is large, this black line will be longer. Or, in very easy language: the longer the 
black line, the higher the variability; the shorter the black line, the lower the variability. Let's create one more error bar plot next to this one. I'm going to create another bar for sepal width, or let's say petal length: the label petal length and again the petal_length column, and I use another color. That's it. As you can see, the black line, the line that shows the standard deviation, is very long for petal length, so there is high variability in that column. What do we mean by that? I simply mean that the other values vary a lot from the average value. Let's create the error bars for the remaining two columns as well, sepal width and petal width. I'm going to change the color, let's say index 4, and change the color over here too. Okay, I did something... oh sorry, I didn't change the label over here. As you can see, the highest variability is in petal length. That means the values are far away from the average value; the values vary a great deal from the mean. So these are the types of bar charts you can draw to visualize and analyze your data. What did we cover? We covered simple bar charts and how to label them, grouped bar charts, then horizontal bar charts, and then stacked bar charts, before moving on to error bar plots and finishing the lecture. The Jupyter notebook and the data set will be available for download in the description below. I hope you enjoyed the video, so please like, share, and 
subscribe, and have a good day. Thank you so much.
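The grouped, stacked, and error-bar steps described in the captions can be sketched roughly as follows. This is a minimal sketch, not the notebook from the video: the data values, the helper names (grouped_positions, stacked_bottoms, draw_all) and the three-panel layout are assumptions made for illustration, and the position helper centres the bars on each index, so it only reproduces the video's indices - width, indices, indices + width layout when there are three series. The standard library's statistics.stdev stands in for the pandas standard deviation used in the video.

```python
from statistics import mean, stdev

# Made-up sample data: the video's real arrays and CSV are not in the captions.
AGES = [24, 25, 27, 26, 29, 28, 31]
SALARIES = {
    "Developer": [40, 42, 47, 45, 52, 50, 58],
    "Analyst": [35, 37, 41, 39, 46, 44, 51],
    "Scientist": [48, 50, 56, 53, 62, 60, 70],
}
SAMPLES = {
    "sepal_length": [5.1, 4.9, 6.3, 5.8, 7.1],
    "petal_length": [1.4, 1.3, 6.0, 5.1, 5.9],
}


def grouped_positions(n_groups, n_series, width=0.25):
    """Return one list of x positions per series, centred on 0..n_groups-1."""
    offset = (n_series - 1) / 2
    return [[g + (s - offset) * width for g in range(n_groups)]
            for s in range(n_series)]


def stacked_bottoms(series):
    """Return the `bottom` list for each layer: the element-wise running total."""
    bottoms, running = [], [0] * len(series[0])
    for values in series:
        bottoms.append(list(running))
        running = [r + v for r, v in zip(running, values)]
    return bottoms


def draw_all(ages, salaries, samples, width=0.25):
    """Draw grouped, stacked and error-bar charts side by side; return the figure."""
    import matplotlib
    matplotlib.use("Agg")  # off-screen backend, so no window is needed
    import matplotlib.pyplot as plt

    names, series = list(salaries), list(salaries.values())
    ticks = list(range(len(ages)))
    fig, (ax_group, ax_stack, ax_err) = plt.subplots(1, 3, figsize=(15, 4))

    # Grouped: each series gets its own shifted positions and an explicit width.
    positions = grouped_positions(len(ages), len(series), width)
    for name, values, xs in zip(names, series, positions):
        ax_group.bar(xs, values, width=width, ec="k", label=name)

    # Stacked: each layer starts where the sum of the previous layers ends.
    for name, values, bottom in zip(names, series, stacked_bottoms(series)):
        ax_stack.bar(ticks, values, bottom=bottom, ec="k", label=name)

    # Replace the 0, 1, 2, ... tick values with the ages.
    for ax in (ax_group, ax_stack):
        ax.set_xticks(ticks)
        ax.set_xticklabels([str(a) for a in ages])
        ax.set_xlabel("Age")
        ax.set_ylabel("Salary")
        ax.legend()

    # Error bars: bar height is the mean, yerr is the sample standard deviation.
    for column, values in samples.items():
        ax_err.bar(column, mean(values), yerr=stdev(values), ec="k")
    return fig
```

With Matplotlib installed, draw_all(AGES, SALARIES, SAMPLES).savefig("bars.png") should write all three charts to one image. Note that the stacked bottoms are element-wise sums; adding two plain Python lists with + would concatenate them instead, which is why the sketch sums item by item (NumPy arrays would add element-wise directly).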
Info
Channel: GNOSIS
Views: 448
Keywords: matplotlib, python, data visualization, bar charts, bar graphs, data science, data analyst
Id: CESwnns0VSA
Length: 24min 7sec (1447 seconds)
Published: Sun Jun 06 2021