Excel Statistical Analysis 06: Frequency Distributions, Visualizations & Skew for Quantitative Data

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
welcome to excel business statistical analysis video number six and in this video we got to talk about histograms and frequency distributions for quantitative data now in this video we have data with one variable and it's going to be quantitative that just means number for tabular we'll do frequency distributions and graphical it'll depend on whether it's discrete then we use column and bar or continuous then we use a histogram now on the discrete quantitative data sheet we have a data set here this is from isaac's italian restaurant which has four banquet rooms that are used on the weekends the data collected number of rooms used in a day over the past year so on january 4th one of the four banquet rooms was used january 5th to reuse january 11th four were used and our goal is to create a unique list from zero to four as our category count how many times each one of those number of rooms was used create percent frequency and then a column chart all right we click in a single cell and we're going to use our keyboard to create a pivot table alt n v t i want to put this on the existing sheet cell f8 click ok number of rooms when we drag it down to rows we get a unique list those are the five possibilities for any given day we can use none of the rooms all the way up to using all four banquet rooms now we drag the same field down to values and you got to be careful when you're dragging a number down here and your goal is to count you got to remember that the default calculation for a number is sum now i say this because i have forgotten many times and then when i right click show values as and make some calculation from here the base aggregate calculation is wrong so summarize value by and we want count now with a frequency distribution eyes you can see that two rooms are used most often and the frequency distribution can be used to help him plan employee schedule up at the top i'll type frequency and enter now i'm going to copy the whole pivot table control c and four rows below control v now i want to right click always making sure i have count for frequency distributions and then show values as we'll do percent of column total all right so 43 of the time only two rooms are used about two percent of the time we don't use any and almost eight percent we use all four rooms now let's visualize and i'm going to use this pivot table insert charts up to the column we'll do clustered we only have one series of numbers here scrolling down and moving it right click hide all field buttons there's some chart junk i'm using the delete key i want to click on the title i see the solid line equal sign and there's the title i've already created and enter i definitely want some labels so axis titles i want to link this so equal sign frequency and enter equal sign and number of rooms used in a day now that chart's looking pretty good but i'm going to rename this label here f2 and change it to percent frequency now let's learn a different trick than we learned last video because this is similar to what we used last video last video we're using categorical data here we're using discrete quantitative data and we definitely need gaps because we're never going to use 1.5 rooms only one and then jump to two now i built this table here because i want to put these percentages at the top of each column and we have an option for that so click the green plus and instead of clicking data labels i can click the arrow more options and you could see right now it put the actual frequencies but i want to come up to value from cell check and look at that i can highlight these percentages and now when i click ok oh wait a second they're not at the top uncheck value check values from cells now i probably want to come here right click number formatting probably we could go down to zero it'll be less cluttered that way click ok and so now isaac has these frequency distributions and visualization which can help him to plan employee staffing and in chapter 5 we'll learn and treat this as a probability distribution and consider these as probabilities to help estimate how the restaurant is going to staff in the future now i gotta show you a cool trick because what if isaac says well what about these numbers here but give them to me separately for saturday and sunday control c come a few rows down ctrl v and i actually want to add frequency so i'm going to drag rooms used and i'm very carefully going to drop it above frequency i see it says sum so right click summarize values by and we want the correct aggregate calculation count i'll type frequency and now watch this i'm simply going to drag day and very carefully drop it above and when i see that green line in rows i drop it and bam now we have a frequency and a percent frequency for all the rooms but by the particular day so it looks like sunday we have one and one with zero rooms but look at this sunday four rooms were used one time but on saturday seven times so by dragging this extra variable in here isaac gets a little bit more information to make better decisions now let's scroll up now if we scroll over on the same sheet here's a data set with discrete data if we're considering dick's hamburgers on 45th avenue in seattle oh man they make great hamburgers and fries the data collected this is the number of arrivals to dick's hamburger at 45th avenue in seattle at saturday noon lunch during a one minute period that means one observation we saw zero people come in a one minute period down here there was lots of instances where one person arrived control down arrow since i already sorted this lots of eights and all the way down to during one of the one minute periods it was observed that 14 people arrived during that one minute period well this is discrete so we create this pivot table here's the unique list from 0 to 14 here's the frequency percent frequency and then here's the chart now the importance of this is that later in chapter 5 we'll learn that this is a poisson discrete distribution and we can use certain numeric measures to calculate and predict using probability in this chapter we're just learning how to create the frequency distribution and the visualization so back on this sheet now you're going to try it yourself now after creating your dicks drive in tabular in graphical summaries we want to go over to the sheet monthly report and i'm going to start on the answer sheet and we want to talk about dates as a variable now dates are numbers they're discrete because you have one day two days and so on you can also group dates into categories like month and quarter but the thing about dates is that sometimes this quantitative variable can be considered a continuous variable like the time series chart we talked about in chapter one or illustrated here as a line chart or we can treat it as a categorical variable and so we want to see how to do both of these we're going to go over to the sheet monthly report in addition in this example here for tabular and for our visual we're not going to be counting we're going to be summing sales and cost of goods sold and we're going to see one of the most powerful tricks in excel that's not taught in the textbook but that you will inevitably use in your job all the time we're going to take a bunch of dates and the pivot table will automatically group them into months and years and once we have months and years we just drag and drop a number field sales and cost of goods sold and the totals for each month in each year are automatically created so back on monthly report we're going to click in a single cell i'll use the keyboard alt n v t now this is an xl table with the name f sales so that shows up in table range i want to put this in f7 click ok now we see all the individual dates but as soon as we drag the date field down to rows it's going to automatically group into years mislabeled as dates but those are months and quarters now we don't want quarters and notice the field list shows quarters years and dates down in the pivot table cache those groupings are memorized now let's uncheck quarters now if your column of dates looks like this instead of automatic grouping that's because up in options data tab this option is checked but no worries to manually group right-click group and you get to select whatever time variable you want now back to our data set and if we look at the pivot table years are showing and they're collapsed to not show the month so i want to right click expand and collapse expand entire field and that is amazing we now have our categories so when we drag this number field sales down to values we know it's a number so it defaults to sum now let's add some number formatting remember you don't have to highlight all the cells this is inside the values area of the pivot table so you right click number format and i'm going to choose number with a comma separator and will show zero decimal so it's less cluttered notice i didn't choose currency or accounting because then it would show a dollar sign in every cell by choosing number click ok it's not as cluttered but we do need to at the top f2 need to indicate the unit so i'm going to indicate the unit in the column header drag cog cogs cost of goods sold those are the costs right click number format and yes you do have to do this to each column click ok up at the top f2 and when i hit enter that's one of the fastest monthly sales and cost reports you will ever create now with a tabular report like this we have all the detail every single number but if we want to see trends and patterns that's where visualizing the data comes in so we'll click in a single cell insert charts and we don't want to use the xy scatter chart that's for xy data we have sales and cost data that we need to plot against an equidistance time variable on the horizontal axis we need to take these numbers show the up and down trend and for that we use a line chart i'm going to use this first one here line drag it make it bigger resize right click hide all field buttons select control 1 we'll put it at the top i want to add a chart title so i'll check that i see the solid line i type in equal sign shoots me up to the formula bar and i already put a label in cell d5 and enter and we have some of sales cost of goods sold with the dollar sign so we don't need to indicate here what these numbers are and these dates are obvious from up here in our chart title so there we are done and we're looking for patterns and trends and sure enough there's a seasonal trend boomerang sales just skyrocket during the christmas time you'd think that the biggest sales are during the summer when all the boomerang tournaments are but that's not the case this company sells a lot of boomerangs at christmas time now although the tabular summary has all of the detail all of the numbers it's the visualization which allows us to see the pattern or trend immediately this line chart immediately shows us the seasonal pattern all right now let's go over to side by side and here we have quarters and projected amounts and actual amounts and this data is already summarized so i click in a single cell and you're not going to believe how easy this is insert we want a clustered column the textbook calls it a side by side and let's do clustered column and then select i see the solid line equal sign click on b1 and enter and that's amazing it came out perfect except for that title at the top and that'll be true with our cross tabulation reports too when we use the column chart and we have these clustered columns basically one click and we're done and now we can clearly see for quarter two the actual is much bigger than the projected about the same in quarter one another great actual outperforming the projected here but then here quarter four not so good all right so we saw treating a time variable as a category here and over here whatever you call it category or continuous when you're showing a time series this is an appropriate chart to show changes over time all right now we want to go talk about frequency distributions and histogram for continuous quantitative data so we'll click on this sheet now here's our column of data and this data was collected last month from a small online retail store website our goal is to take the retail sales data create a frequency distribution with these counting categories for the sales amounts then take the frequencies and create a histogram visualization now histograms are great because they help reveal the pattern or the shape in the continuous data and for retail sales this is a typical pattern most of the sales are for small dollar amounts and there's only a few large dollar amount transactions then we'll create our second frequency distribution with percent cumulative frequency create our histogram but add the percent cumulative frequency line which is great for analysis because we can say things like 76 percent of the transactions were for less than the upper limit of a hundred bucks here we could say 93 of the transactions were for less than 150 bucks now with a single cell selected let's use our amazing pivot table tool alt nvt i'm going to put this in the existing sheet cell c7 click ok now when we drag this number field down to rows we instantly get a unique list but we can use the grouping feature right click group now this grouping dialog box gives us starting at which is the min value it found in this column the max value it found in the column and a suggested increment for each counting category now this is the art side of statistics because we have to pick a lower limit for the first class an upper limit for the last class and an increment these numbers are pretty easy so we can eye it we'll move this one down to zero so the lower limit for the first class will be zero this one we can move up to 500 once we have lower limit of the first category and the upper limit of the last category we can calculate the range which is 500 500 divided by an increment of 100 would give us five counting categories let's try 50. that'll give us 10 counting categories when i click ok bam we have created counting categories that'll allow us to count the sales amounts now remember the upper limit is never included the lower limit is included so if there's a 50 over here it's definitely not counted here it's counted here now two videos ago we talked about three ways to change these ambiguous labels i'm going to choose to indicate in this label at the top so f2 and in parentheses i'm going to say upper limit not included and we'll put this as a label in the histogram horizontal axis so when they see these labels they'll know upper limit is not included now we can drag this number field and normally when we drag a number field down to values it defaults to sum but this is a counting category so it defaults to count up at the top we'll type frequency and enter so there's the retail sales pattern most of the transactions are for small dollar amounts now we're going to use a column chart and change the gap widths what you don't want to use is we don't want to create our histogram by going to insert charts drop down histogram if we click this it won't even let us it's like what watch this this is the intention of microsoft and this is what we are never going to do insert and histogram what in the world is this those are totally messy categories so there's three problems with this type of chart we have these messy categories we're not allowed to set the lower limit of the first class and it assumes that the data has a bell-shaped normal distribution which is not a safe assumption for all data sets so big x here we're not going to use this delete we already created our pivot table so now we can come up to insert clustered column right click hide all field buttons we'll delete this and we'll cut to the chase we have other things to do but let's select the column control 1 to open up the task pane and we're immediately going to change gap width to 0. so 0 and tab i'm also going to go over to the paint bucket and i'll say border solid and black now you can change these to whatever colors you'd like i'm going to leave it like that now we'll select the title i've already created one so equal sign and i'll link it to cell d4 and enter home font size let's try 12 and enter 11 and enter now over to the green x i definitely want axis titles i'll link this vertical one equals click on the cell with frequency and enter select the axis title equal sign click on the label we created and enter and now we have a histogram to show us the shape of this data and throughout the rest of the class we'll have lots of probability distributions and one way to decide which distribution to use is plot a histogram look at the shape of the data and pick the distribution now let's create our second frequency distribution and i'm going to highlight and copy we'll come over to cell l7 control v scroll over click inside the pivot table we'll drag the field down to values and our calculation in this column percent cumulative frequency and enter and then we right click make sure summarize values by is count then come down to show values as and running total in click that calculation based on the row area condition click ok that gives us a running total the sum of these is 358 the sum of these three 440 and so on but we want percent cumulative frequency which means take all of these values and divide each of them by the total count right click show values as percent running total in click ok and looking at the tabular display it's not as immediate as the histogram will create but when we see 93 cumulative we look at the upper limit and we can say that 93 of the transactions were for less than 150 bucks now let's click in a single cell and we have two calculations in the pivot table which is what we want we'll use the combo chart insert combo and we'll select the second one remove field buttons control 1 we'll put this at the top select the columns gap with zero you can change the color or the border we'll add a chart title equal sign i created one in m4 and enter control shift f is the keyboard hopefully 11 will work click ok select the cumulative line we'll add data labels escape to close that select the data labels control 1 we'll say below and guess what last video we added custom number formatting with three semicolons to hide these but below in the comments wayne said there's a much easier way and i didn't know about this for 20 years i've been using that crazy difficult custom number formatting but i want the secondary axis i just don't want to see anything so select it control 1 labels we want to say the position none and bam look at that now we definitely need a label down in the horizontal axis so we'll click axis title primary horizontal equal sign l7 and enter now let's select the line and we'd like to add markers to the line go over to paint buckets there's markers marker options let's say built-in i use a circle and then the fill notice there's a fill and a border for the marker all i want is to fill it with red and now if i click the outside edge and drag down we have our histogram for this retail sales data with a percent cumulative line 93 percent of transactions are for less than 150 bucks and of course here when you get to 100 that means all transactions are less than 500 bucks now if we scroll over now here's another example a recent online job site published this data about entry-level accounting job salaries i already took the data made a frequency distribution and then plotted it and from this histogram we can see that the shape is a bell shape or normal shape distribution in chapter 6 when we see data with this shape that means we can use the normal distribution to make probability calculations if we scroll over here's another example where we have customer service phone wait time in minutes this data set frequency distribution and histogram and this histogram reveals what's called a uniform distribution that means we can use uniform distribution formulas to make probability calculations now i want you to go back over to the continuous quantitative data sheet and in the s column in aeg column you can try to create the frequency distributions and histograms for both of these data sets and as you're creating your histograms remember the order of the categories in the horizontal axis sorted smallest to biggest helps to reveal the shape of the data that was different than our categorical data where the labels on the axis conveyed no information it's the order of these categories and the height of the columns that help reveal the shape of the data now we do have one last topic we want to talk about skew now on this sheet i have a number of histograms pulling from separate data sets and what i want you to do is hit the f9 key i'm hitting f9 f9 and you can see each time i hit f9 it's pulling from a random set of data and what we're doing is we're simulating taking samples from the same type of data so for example this upper left one these are test scores and every time we have a test in the class the shape tends to be the same the higher frequencies are for higher scores but there tends to be a few really low scores when you see this pattern in your data you say the skew is to the left or negative now next chapter we'll learn about calculating averages and in particular the mean that's adding them up and dividing by the count well imagine if you have a bunch of really big values and there's a few really small values on the left what does that do to the mean that pulls it down but in this chapter we're just talking about what the skew looks like visually in a histogram now if we jump down to this bottom histogram f9 f9 this is retail or department store data and the higher frequencies tend to be for the smaller dollar amount but there's a few big transactions on the right when you see a histogram skew like this you say it's skewed to the right or positive and if we calculate the mean with a bunch of small values but there's one or two really big values that will tend to pull the mean up now up here this histogram is based on human height data and we tend to always get a bell shaped or normal distribution now if you get a perfectly symmetrical histogram then there is no skew now another type of pattern which has nothing to do with skew is bimodal that means we have some big columns on the right and some big columns on the left and f9 f9 sometimes this happens with test scores here's one right here house prices tend to be a bunch of small ones but there's a couple million dollar houses that create a skew to the right or positive and would pull the average mean calculation up all right that was a lot of fun with quantitative variables discrete and continuous we talked about skew and histograms we talked about how to create histograms we looked at a date variable where we used a side by side chart we saw the amazing pivot table grouping feature where we totaled sales and cost of goods sold and used the appropriate chart to show changes over time the line chart and we started it off with a discrete quantitative variable and we built a frequency distribution and a column or bar chart with no gaps all right we'll see you next video you
Info
Channel: ExcelIsFun
Views: 4,638
Rating: undefined out of 5
Keywords: Statistical Analysis, Excel Statistics, Highline College, Business Statistical Analysis, Busn 210, Mike Girvin, excelisfun, Mike excelisfun Girvin, Data Analysis, Microsoft 365 Excel, Excel Statistical Analysis, ESA 6, Histograms, Column Charts with No Gap Width, Frequency Distribution, PivotTable, Grouping in PivotTable, Group for Sales Categories, Monthly Sales Report, Line Chart, Clustered Column, Discrete, Continuous, Quantitative, Numbers, Grouping Dates, Microsoft Histogram
Id: L-KNeRTT5N4
Channel Id: undefined
Length: 28min 45sec (1725 seconds)
Published: Mon Dec 06 2021
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.