Excel 2013 Statistical Analysis #23: Five Number Summary, Outliers and Box Plots

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
Welcome to Excel 2013 Statistical Analysis video number 23. Hey, if you want to download this workbook and follow along, click on the link below the video. Hey, we're on the sheet five number summary. That's cause we want to talk about the five number summary. Now, it's straightforward to calculate. You've got to calculate the min, quartile 1 to 3, and the max. Those are the five numbers. Hey, the five numbers summary provides information like min, max, the median. You can figure out the range quickly. It gives you the quartiles. You can figure out the quartile range quickly. And it's the basis to build a box plot, which we'll see in just a second. Now, our example here is a sample of salaries for accountants in Oakland, California. Here's our column here. And we want to calculate our five number summary. Now, we're using .exc. Quartile exclusive, it excludes 0 and 1. We saw back in the video on quartile that you could use .inc and use 0, 1, 2, 3 4. And it would automatically calculate the five number summary. We're going to use the min, the max, and the quartile function. So in cell C13 equals min. Click in the top cell, Control+Shift down arrow to highlight the whole range. And I'm going to Shift+Enter to put that in the cell and push the cell up. Now, I'm going to use quartile equals quartile.exc. The array, click on the top cell, Control+Shift down arrow. F4, because we want to lock it with our dollar signs. And jump the screen back up, and then comma. And I'm going to use my arrow keys to get to my 1, 2, 3. Control+Enter, and now I can copy this down. And there's quartile 1, 2, 3. Click on the last cell, and I'm just checking to see if it's locked and if it's got the right cell reference. Yes, it does. Enter equals MAX tab. Click in the top cell, Control+Shift down arrow, Shift+Enter to put that in the cell and push the cell up. So there it is, five number summary. So the min salary for an accounting position in our sample for a month was $2,510. The max was $6,987. Quartile 1 to 3, there they are. If we subtracted $4,003 from $4,664-- that's quartile 1 from quartile 3-- that would be the interquartile range. And we can see that about $7,000 to $2,500, the difference between those two, is the range. Now, let's go over to the sheet five number summary and box plot. Now, in this class, we're not going to build this chart in Excel. Because although it can be done, it's quite tedious. I do have other videos on how to build box plots. Now, here's the deal. Here's some quizzes. And this articulates the five number summary. So the rectangle in the middle represents quartile 1, which is 17.5 points, all the way up to quartile 3, which is about 30. We can clearly see the middle 50% of the values go from about 30 to about 17. The line down the middle, that's quartile 2 or the median. It looks like 27. We also put the mean up here, right? Remember, if the mean is below the median, then there's probably some small values pulling the mean down. We can also see the little whiskers here that extends to the biggest value inside the lower limit, which happens to be our min of 10. This whisker up here extends and shows the max, which also is inside our upper limit. Now, these limits are created. We had 0 and 47.5. If we had values, they would be marked as an x or a square or something. That would represent an outlier, an unusually small or unusually large value. Now, in this class, we're going to see how to calculate all of these pieces from this dataset. You can sketch out a box plot like this. I'm going to scoot this off to the side. All right, so here's our dataset. We will calculate equals min, highlight the whole range, Enter, equals q. And the very first function is quartile.exc. So Tab, highlight the range, and I'm immediately going to hit F4 to lock that range, comma and quartile 1, arrow, arrow, over to the right, Control+Enter, and copy it just down to the third quartile. The max equals max. And there's our range, Enter. Our interquartile range, that is quartile 3 to quartile 1, and Enter. Now, to calculate the lines up at the top, our textbook uses the multiplier 1.5 to determine an unusually small or large value. There are other multipliers like 2.2. There's some notes down here if you want to look at that. We're going to use 1.5. And here's the calculation for the lower limit. Equals, we're going to actually start with quartile 1. And we're going to subtract our multiplier times the interquartile range. Boom, now, this is negative, right? A quiz can't go below 0. So we would convert that to 0. Notice the rule right there. Our upper limit, hey, we're going to extend past quartile 3 plus our multiplier times the interquartile range, and Enter. Now, those are two assumed marking positions. Any values that we find past these two would be considered outliers. If you wanted to change it to 2.2, boom, there would be different lower and upper limits, Control+Z. Now, I'd like to actually count to see if there's any values below or above our limits. Now, this one we don't have to worry about. Because the quizzes can't possibly get below 0. Not only that, but on a small data set, we could just eye it. But of course, we have to COUNTIFS functions. And that will allow us to count-- bloop, there's all the values, comma, and the criteria. Well, there's the hurdle. Anything above would be considered an outlier. So I need to combine comparative operator and the number. I put the comparative operator in double quotes. The greater than symbol is pointing over there. So if it finds any of these that are bigger than 47.5, then it will be counted. We have to join it, Shift+7, the ampersand to our criteria sitting in the cell, Control+Enter. So there are 0 if I were to-- if I had one in here like 50, instantly I would get a count, Control+Z. Now, we need to calculate the mean average. Highlight the range and Enter. So there we have all of the inputs that will allow us to draw our box plot, a visual summary of our five number summary. Now, we have one more topic. Here, we considered outliers in this 1.5 rule. Let's scroll over here and look at the sheet, Outliers. Now, we talked about z and our normal curve with plus or minus 3 standard deviations as being really rare. So another way to calculate an outlier is simply to check the z. So I'm going to say the particular value minus the mean-- F4 key to lock that-- close parentheses divided by standard deviation, F4, Control+Enter, double click and send it down. Now certainly, we could go through and eye this. I can already see it looks like there's one there and there. So we could actually add an extra column and use-- since we're checking for positive 3 or negative 3, we could use the ABS, the absolute value. So that will return whatever the number is distance from 0. And I'm going to say anytime you are greater than 3. Now, because we're using a comparative operator directly in our formula, this is a logical formula that will return just a true or false. Control+Enter, double click, and send it down. Now, notice even here, we can see the truths now and pick those out. We could come over to this side. We could do a formula like equals COUNTIFS, highlight the whole range, Control+Shift down arrow, Control+Backspace to jump the screen back in view, comma. And then because it's a logical, we don't have to put it in double quotes. It is its own data type, neither text nor number, so Control+Enter. Actually, if you hit F2, you see it pop back up to true. At the beginning of class, we talked about how that's a characteristic of true/false values in Excel. So that is another way we could look through the data set and count. Right click, paintbrush to copy just the formatting, and click. Still another way, watch this. Control+Shift down arrow, because we want to highlight the whole column. Now, I'm going to scroll back up so you can see that I've highlighted the whole column. We can go to Home, Conditional Formatting, Highlight Cell Rules, equal to-- and I'm going to type true here. And then instead of accepting the default, I'm going to click the dropdown and go to a Custom Format. And now, I can do whatever I want. This is our Format Cells dialog box. I'm going to select Red Fill, Font, I want White. Click OK, click OK. I kind of like that way. Because now, we can go through and pick it out. Because an outlier could be an actual value that should be in the data set, a value that was mistakenly in the data set, or even typing error, all right? So there you go. Outliers with z, logical formula COUNTIFS, and even a little conditional formatting. All right, so we talked a little bit about outliers with the z rule. We talked about box plot. And we talked about our five numbers summary. All right, next video, we're going to talk about our last numerical measure, covariance and coefficient of variation for two variables. All right, see you next video.
Info
Channel: ExcelIsFun
Views: 17,794
Rating: 5 out of 5
Keywords: Excel 2013, Statistical Analysis, Excel Statistics, Highline College, Business and Economic Statistical Analysis, Busn 210, Mike Girvin, excelisfun, Mike excelisfun Girvin, Data Analysis, Five Number Summary, Min, Quartile 1, Quartile 2, Quartile 3, Max, Box Plots, Visual Summary of the Five Number Summary, How to calculate and draw on paper, 3 Standard Deviation Outliers, ABS function, COUNTIFS to count TRUE values, Conditional Formatting to highlight Outliers
Id: 91Yam83AOhk
Channel Id: undefined
Length: 11min 44sec (704 seconds)
Published: Fri Jan 23 2015
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.