Basic Excel Business Analytics #17: FREQUENCY Array Function for Frequency Distribution & Histogram

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
Welcome to Highline BI 348 class video number 17. Hey, if you want to download this workbook BI 348 Chapter 2 start, or the finished file, click on the link below the video. Hey, we have this data set here from 2014 sales for Boomerang Incorporated, and we have date, time, the website that we sold the boomerang through, the product, the type, the quantity, and the discount. Now the goal of this video, from this data set, is to create a frequency distribution, cumulative frequency distribution, and a histogram for revenue. Now noticed, we don't have a revenue column, so we'll have to create that. Now I want to go over to the sheet Goal For Quantitative FD. Here's our end result, because in this video, we're going to learn about the frequency array function. Now when our prerequisite class, when we created frequency distributions for continuous quantitative data, we either used the COUNTIFs function or a pivot table. So this is going to be new, and it's actually a little bit easier than the COUNTIFs function, but we're going to have to deal with a certain type of category that it creates for an upper and lower limit. All right, so let's go over to our sales data sheet, and we have to create a revenue column. Now, here's the deal, we are, from our transactional data set-- and again this transactional data set, Control Down Arrow, us like 26,000 records, Control Up Arrow-- we're given the product that we sold and the quantity. We also have a discount, and that's the discounted as it applies to the price. But over here, we have a completely separate LOOKUP table with our price. So we're going to need to, from our transaction product column, look up the price, get the price, multiply it by quantity, and times 1 minus the discount rate to get our revenue. So let's go ahead and create this. I added some formatting there, equals VLOOKUP. We are looking up product, comma, within this two columned table, and I'm going to hit the F4 key to lock that. Comma, and the column index, one, two column has the item we want to get and return to the cell, so I put a 2, and comma. We're doing the exact match, because we're looking up text here, so I'm not going to put false. I'm going to put it 0. Close parentheses, Control Enter, double click and send it down. All that gives us is the price. We can see [? younaki ?] is 23.95, and sure enough, over here, [? younaki ?] 23.95. Now the whole column is highlighted down 26,000 rows. And there's an active cell at the top, so I'm simply going to hit the F2 key to put that in edit mode. And they need to multiply that times quantity. Then I need to multiply that times-- not the discount rate, because that gives me the discount-- I need to go 1 minus. We gave him 16.5% discount, so whatever the remaining amount after subtracting from 1 is the actual net revenue. Now notice we're going to run into rounding problems here, because some of these decimals exceed the penny position. Now before I fix the rounding problem, I want to enter this formula. And notice active cell, whole column highlighted, so I hold Control and Enter to populate that edited formula all the way down. And we can clearly see there's lots of extraneous decimals, F2. I'm using the ROUND function, R-O-U-N-D, at the beginning. The number is that big huge formula, but I need to very carefully come to the end and with my I-beam click. Comma, and number of digits, you put it 2 when you're rounding to the penny. Close parentheses, enter the edited formula all the way down with Control and Enter. Now we have our revenue. And no matter what method you use to count the frequencies between upper and lower limits-- whether it's frequency array function, pivot tables, COUNTIFs-- we still have to figure out what the lower limit and upper limit and the increment between the two are for our particular data set. Now in the textbook, they just say, hey create classes, between 5 and 20 of them. In our prerequisite class, we had a particular method, but we can sort of use common sense. We definitely are going to have to look at the max and the min, and then figure out the range for data set. So I'm going to highlight this whole column, Control Shift Down Arrow, Control Backspace, and then Enter. So looks like the max is near $3,000, the min-- equals min, Control Shift Down Arrow, Control Backspace, and Enter-- oh, and so the min is like $4.09. That's probably for a sale of one boomerang with a slight discount. So it looks like we could go from 0 to 3,000 and still our min and max and have nice neat numbers to deal with. Another thing we're going to need to know is that count. So I'm going to use the count, equals COUNT, and we're counting the numbers so I'll use COUNT, Control Shift Down Arrow, Control Backspace, and Enter, so about 26,524. Now another thing we're going to need to know is number of classes, and I'll come back to that one in just a moment. Our rule that we used back in our previous class, Business 210, and we're going to call number of classes equals k. Our rule was any time we get 2 to the k greater than n-- and our count was equal to n, that was our sample size. Here, let's just make a formula, 2 caret, and I don't have anything in the k so far, but it doesn't matter. Any number raised to 0 is 1. And I'm going to try 6. Well, is 64 bigger than our account? No way. I'm going to try 10, not that either. How about 15? OK, so that number is bigger there, how about 14? Now, so it looks like 15, but remember this is all an art, right? We're just kind of trying to get a feel for the number. So 15 classes, we could use that, or 10, or whatever. We're going to try and use 15. And it clearly looks like we could start at 0 and go to 3,000, but I'm going to be explicit and say range. And the range is simple, we say equals, hey, whatever the max is minus the min. And that should be approximately 3,000. Now with a range and number of classes, it's easy to figure out the minimum interval for each one of our classes. And that would simply be equals, hey, the range divided by our 15. Hey look at that, I betcha we can guess that easily we could round this up to 200. So it looks like class-- and I tend to see interval but class size is what they call it in the book. Interval or class size, either one is fine. I'm just going to say 200. And our lower limit for our first class, how about we start at 0 since that's pretty darn close there. Now, with these two numbers, we can use any one of the methods, frequency array function, pivot table, COUNTIFs, or even our data analysis at n. Now I went ahead and added some formatting there just to be consistent. Now if we're going to start at 0 and go to 200, then our category is going to be like 0 to 200, 400 to 600, 600 800, all the way up to 3,000. Now we're going to scroll over here and our first example is going to be how to create a frequency distribution with the frequency array function. Now I created a bunch of labels here, category, revenue upper limit-- because our frequency function is going to be awesome and it only needs the upper limit for counting between a lower and upper limit-- then we're going to calculate a frequency with a frequency array function, our percent frequency, cumulative frequency, and percent cumulative frequency. Now I want to come over to the revenue upper limit column. And remember our categories are 0 to 200, 200 to 400, et cetera, all the way to 3,000, but the array function doesn't need both the lower and the upper limit. It only needs the upper limit. So I'm going to give it the upper limit of the first category, the upper limit of the second category. I'm not going to link these to an assumption area, this is historical data and we've already established the category limits. So now watch this. I'm simply going to highlight these to, point to my fill handle, and when I see my angry rabbit I'm going to click and drag. And because there's a difference of 200 between those numbers, look at the screen tip. It's giving us exactly increments of 200. Now I'm going to stop at 3,000 because that's bigger than any number we have in our data set. Now these are the upper limits and there's 15 of them. The frequency function is amazing. We're only going to give it those upper limits and the data set, and it will calculate all the frequencies. But this is a special type of function called an array function, because it will deliver all the answers simultaneously, which means you have to highlight all of the cells in advance. Now another amazing thing about the frequency function is if you give it 15 upper limits, it'll actually create 16 categories just in case there's some number in our data set and we weren't paying attention when we gave it the last upper limit. It will create one extra category that says, hey are there any numbers bigger than the last upper limit. So you have to highlight in advance a number of cells that equals exactly one more than the number of our upper limits. Now in the active cell, I'm going to type frequency, and all we have to do is give it the actual data numbers, revenue numbers, and our upper limit. So I'm going to scroll over, click in the top cell, Control Shift Down Arrow, and Control Backspace, scroll all the way over, data array, comma, and the bins array. Those are the upper limits. Now we do not have to lock these ranges because this is a special function called an array function. It will be entered simultaneously into all the cells. And when Excel copies it down, it will not move. Now actually we saw this array function in Business 218 when we discussed array formulas. Close parentheses, and because it's an array function, we have to enter it with the special keystroke Control Shift and Enter. Now we used Control Shift Enter to enter it into the cells, that was us telling Excel that this is an array function. But look up in the formula bar. Those curly brackets cannot be typed in. Excel put those in automatically, and that's Excel telling you that it understood that this is an array function. Hey, there's our count. That is absolutely amazing. In each one of these, count. That 2,507 means it counted between 200 and 400. Now I actually explicitly want to show you the categories here. Here's an explicit listing of what the frequency function will do every single time. It will always take the first upper limit and count everything less than or equal to it. Notice that that's sort of polite for them to program the frequency function to do this. That means it will catch everything below this limit, and then all of the ones in between the first and last category, include the upper limit but not the lower limit. And notice that's the opposite of what a pivot table does. And finally, it's polite because it will create this last category just in case we weren't paying attention when we created the upper limit of the last category. It'll don't catch any values greater than that last category. We were paying attention when we created that so of course it caught 0. Another thing about this array function, if you click in a cell and hit the Delete key, you cannot delete it. That whole range of values is considered one object. It's an array of answers. Notice it says you cannot change part of an array. Not only that, but if you come over here and try to insert a row, it's the same thing. You cannot change that. Those values are one object stored together as an array. Now I'm going to add some formatting, and I'm going to copy that formatting over to these cells right here. Now I want to calculate a total, Control Enter, and B Tab, Alt equals, and there's our total, Control B. Now we can calculate our percent frequency, that means each individual part. And we're going to take the total, F4 to lock it, so each individual part divided by the total, Control Enter, and copy it down. Now that's called a relative frequency, or we're going to add a percent number formatting so we'll call it percent frequency. And there we go. Now cumulative frequency, that's something we haven't done in this class, but we did it in a previous class. It just means, hey, I'm going to take this one, and then for this one I'm going to add both. And then for this one a to add all three of them. Now there's a few ways we can do this, but let's just do this this way. We're going to put the first one-- because there's only one category-- and then for the second we're going to say, equals hey, take whatever is in the previous class plus whatever is new in this class. And that formula, when we copy down, will calculate the cumulative frequency. And when we get to the last one, it's exactly equal to our total. In this case, since it's 0, this one is also. Now we can calculate percent cumulative frequency. That means whatever the cumulative total is so far divided by our total, and I'm going to click on this one here. We can click on this one here because it's the same number, F4 to lock it, Control Enter, and copy it down. Now let's add some formatting to this, Home and Percentage, two decimals. Now the cumulative frequency and cumulative, you can pick any one of these categories. Let's select this one. There's the upper limit, 1,400. That means 25,597 of the transactions were for $1,400 or less. As a percentage it means 96.51% of our transactions were for $1,400 or less. Right here, about 77% of our transactions were for $600 or less. Now we're going to create a chart from this, and the cumulative frequency line that we get is even easier to understand. Now what I would like to do is use this as our label on our horizontal axis. This'll be the column height, and then we're actually going to change this to a line, so we'll have a combo chart. Now we're going to highlight, and I'm not going to highlight the last category since we've got 0 there. And then I'm going to hold the Control key and highlight that label all the way down to 100%. Now I can go up to Insert. And we could use Column, but let's try this new thing in 2013. There's a combo chart. You can see the preview, that's not looking good. Wow, that one's looking good. So we're going to choose this one. Even though there's a mistake in it, it wasn't smart enough to pick it up. Now notice what it did down here, it thought that our revenue upper limit actually was a column, so we're going to fix that. We could actually fix that, not going to Select Data, but to simply click on the columns and Delete. Now that means we have to right click, go to Select Data because we want to add it back in, but as our horizontal category. You can see the dancing ants dancing in there for both of our series. Those are the numbers. Click on Edit, and now I'm going to select just our labels for all horizontal axis. And click OK. Click OK. Now I don't think we're going to need a chart title, so I'm going to delete that. I'm going to click on the legend, Control 1, that opens up-- oh yeah, there it is from earlier. I want this at the top. That will kind of serve as our chart title. We definitely need labels. Axis Titles, click on the Vertical Axis Title, equal sign to jump up to the formula bar, and I'm going to type frequency and Enter. Click on the Horizontal Axis, equal, and I'm going to say the revenue upper limit to clearly describe what that is. Now we can come over and select our columns, Control 1. Notice it's right here, gap with 0. And you've seen me do this many time. I go to the Fill and do vary color by point. But in a pivot table, that's just not there. I'm going to do a solid fill, something like a dark blue, and I want a border, solid line. And I'm going to select black just to give it some emphasis there. That's chart junk right there. You could stop right there. That's beautiful. Revenue upper limit, that describes that. Frequency, that describes that. The percent cumulative frequency, right so this label up here is describing what that is. Now that actually is quite a beautiful chart. I'm going to dragged this up here, and then I'm going to Control C, click down here, and Control V to copy it. Now let's change it up, because this is the art side of statistics. I'm going to come over here and say, let's say, Data Label. So now it's going to get messy. I have numbers there, so I'm going to click on the frequency and delete. But that deletes the actual data there, so I'm going to Control Z. I want to learn a trick here, because I want to keep this here so that the data is still in the chart. Control 1 with the axis selected, and we're going to go down to Number Formatting, and this is a trick from Business 218. You can actually-- there's an actual number formatting that says show nothing. So what we want to do is still have this vertical axis here, but show nothing, and it's semicolon, semicolon, semicolon. From 218, custom number formatting, there's positive, negative, 0, and text sections to custom number formatting. So when you list all four sections, three semicolons with nothing in there, it just means no matter what you have there show nothing. So I'm going to click Add and there it is. I want to do the same thing over here because if I delete it, it collapses the chart, Control Z. So I'm going to Control 1 with that axis highlighted. Scroll over here, Number, here's our trick. Number Formatting, semicolon, semicolon, semicolon, that's a great trick. Any time you want to go to your friend's spreadsheet and highlight all the cells and add this number formatting, no matter what they put in the cells, it won't show up. No no, that would be terrible. You're going to know how to fix it when someone else tricks them. All right, click Add. Now this is looking pretty messy, right? I want to try and select these labels. And over here, let's try above. Well that still pretty messy. Go down to Number, Let's see if we can add a percentage. So the 0%, no decimals, and Add. That's looking sort of OK. I'm thinking I'm going to try and select one of those 100s, Delete, maybe another one. Notice when you click and select something like data labels, it selects them all. But when you select a second time, it allows you to select just a single one. We could actually probably remove those. This is rounded, right? There are so 99.6s and 99.7s, however you want to do that. Maybe we'll just delete all those. And this last one right here, I'm going to try and get to font. And then we saw this in an earlier video, the task pane over here doesn't have font. You either have to click Font, Control Shift F, or go up to the Home ribbon, right? And then there's a font color. I'm going to change it to white. So we have two different charts that are articulating the same information. And what this means is you can pick any point. So 94%, you come down here and there's the upper limit. 94% of our transactions were for $1,200 or less. 62% of our transactions were for $200 or less. That's the percent cumulative frequency. Our frequencies, this is actually an extreme, but typical pattern for retail data. Most of the transactions are for a very small amount. And as you get further into the upper transaction amounts, there's fewer and fewer actual transactions. All right, so in this video we saw how to create a bunch of things-- frequency, percent frequency, cumulative frequency, and percent frequency-- all based off of this amazing frequency functions, only using the upper bins and the special categories. And then we saw how to create two charts to visually articulate the frequency and cumulative frequency. All right, when we come back, I'm actually going to show you two other ways to accomplish the same task. We'll use the data analysis add-in feature and a pivot table. All right, we'll see you next video.
Info
Channel: ExcelIsFun
Views: 14,692
Rating: undefined out of 5
Keywords: Excel, Microsoft Excel, Highline College, Mike Girvin, excelisfun, Michael Girvin, Mike excelisfun Girvin, Slaying Excel Dragons, Excel Magic Tricks, Ctrl Shift Enter Mastering Excel Array Formulas, Array Formulas, Business Analytics, BI 348, Data Analysis, FREQUENCY Array Function, Frequency Distribution, Cumulative Frequency Distribution, Histogram, continuous quantitative variable Revenue, VLOOKUP, VLOOKUP to lookup price, Column Chart, special key stroke, Ctrl Shift Enter
Id: ZvkriI_y_c8
Channel Id: undefined
Length: 23min 2sec (1382 seconds)
Published: Sat Oct 10 2015
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.