Welcome to Excel 2013
Statistical Analysis video number 23. Hey, if you want to download
this workbook and follow along, click on the link
below the video. Hey, we're on the sheet
five number summary. That's cause we want to talk
about the five number summary. Now, it's straightforward
to calculate. You've got to calculate the min,
quartile 1 to 3, and the max. Those are the five numbers. Hey, the five numbers
summary provides information like min, max, the median. You can figure out
the range quickly. It gives you the quartiles. You can figure out the
quartile range quickly. And it's the basis to
build a box plot, which we'll see in just a second. Now, our example
here is a sample of salaries for accountants
in Oakland, California. Here's our column here. And we want to calculate
our five number summary. Now, we're using .exc. Quartile exclusive,
it excludes 0 and 1. We saw back in the video on
quartile that you could use .inc and use 0, 1, 2, 3 4. And it would automatically
calculate the five number summary. We're going to use the min, the
max, and the quartile function. So in cell C13 equals min. Click in the top cell,
Control+Shift down arrow to highlight the whole range. And I'm going to Shift+Enter to
put that in the cell and push the cell up. Now, I'm going to use
quartile equals quartile.exc. The array, click on the top
cell, Control+Shift down arrow. F4, because we want to lock
it with our dollar signs. And jump the screen
back up, and then comma. And I'm going to use my arrow
keys to get to my 1, 2, 3. Control+Enter, and now
I can copy this down. And there's quartile 1, 2, 3. Click on the last
cell, and I'm just checking to see if it's locked
and if it's got the right cell reference. Yes, it does. Enter equals MAX tab. Click in the top cell,
Control+Shift down arrow, Shift+Enter to put that in
the cell and push the cell up. So there it is,
five number summary. So the min salary for
an accounting position in our sample for
a month was $2,510. The max was $6,987. Quartile 1 to 3, there they are. If we subtracted
$4,003 from $4,664-- that's quartile 1
from quartile 3-- that would be the
interquartile range. And we can see that
about $7,000 to $2,500, the difference between
those two, is the range. Now, let's go over to
the sheet five number summary and box plot. Now, in this class,
we're not going to build this chart in Excel. Because although it can be
done, it's quite tedious. I do have other videos on
how to build box plots. Now, here's the deal. Here's some quizzes. And this articulates
the five number summary. So the rectangle in the
middle represents quartile 1, which is 17.5 points, all
the way up to quartile 3, which is about 30. We can clearly see the
middle 50% of the values go from about 30 to about 17. The line down the middle,
that's quartile 2 or the median. It looks like 27. We also put the
mean up here, right? Remember, if the mean
is below the median, then there's probably some small
values pulling the mean down. We can also see the
little whiskers here that extends to the biggest
value inside the lower limit, which happens
to be our min of 10. This whisker up here
extends and shows the max, which also is
inside our upper limit. Now, these limits are created. We had 0 and 47.5. If we had values, they
would be marked as an x or a square or something. That would represent an outlier,
an unusually small or unusually large value. Now, in this class,
we're going to see how to calculate all of these
pieces from this dataset. You can sketch out a
box plot like this. I'm going to scoot
this off to the side. All right, so
here's our dataset. We will calculate equals min,
highlight the whole range, Enter, equals q. And the very first
function is quartile.exc. So Tab, highlight the range, and
I'm immediately going to hit F4 to lock that range, comma
and quartile 1, arrow, arrow, over to the right,
Control+Enter, and copy it just down
to the third quartile. The max equals max. And there's our range, Enter. Our interquartile range,
that is quartile 3 to quartile 1, and Enter. Now, to calculate the
lines up at the top, our textbook uses
the multiplier 1.5 to determine an unusually
small or large value. There are other
multipliers like 2.2. There's some notes down here
if you want to look at that. We're going to use 1.5. And here's the calculation
for the lower limit. Equals, we're going to
actually start with quartile 1. And we're going to
subtract our multiplier times the interquartile range. Boom, now, this is
negative, right? A quiz can't go below 0. So we would convert that to 0. Notice the rule right there. Our upper limit,
hey, we're going to extend past quartile
3 plus our multiplier times the interquartile
range, and Enter. Now, those are two
assumed marking positions. Any values that we
find past these two would be considered outliers. If you wanted to
change it to 2.2, boom, there would be different lower
and upper limits, Control+Z. Now, I'd like to actually count
to see if there's any values below or above our limits. Now, this one we don't
have to worry about. Because the quizzes can't
possibly get below 0. Not only that, but on a small
data set, we could just eye it. But of course, we have
to COUNTIFS functions. And that will
allow us to count-- bloop, there's all the values,
comma, and the criteria. Well, there's the hurdle. Anything above would be
considered an outlier. So I need to combine comparative
operator and the number. I put the comparative
operator in double quotes. The greater than symbol
is pointing over there. So if it finds any of these
that are bigger than 47.5, then it will be counted. We have to join it, Shift+7,
the ampersand to our criteria sitting in the
cell, Control+Enter. So there are 0 if I were to-- if I had one in here like 50,
instantly I would get a count, Control+Z. Now, we need to
calculate the mean average. Highlight the range and Enter. So there we have all
of the inputs that will allow us to
draw our box plot, a visual summary of our
five number summary. Now, we have one more topic. Here, we considered
outliers in this 1.5 rule. Let's scroll over here and
look at the sheet, Outliers. Now, we talked about
z and our normal curve with plus or minus 3
standard deviations as being really rare. So another way to calculate
an outlier is simply to check the z. So I'm going to say
the particular value minus the mean-- F4 key to lock that-- close parentheses divided
by standard deviation, F4, Control+Enter, double
click and send it down. Now certainly, we could
go through and eye this. I can already see it looks like
there's one there and there. So we could actually add
an extra column and use-- since we're checking for
positive 3 or negative 3, we could use the ABS,
the absolute value. So that will return whatever
the number is distance from 0. And I'm going to say anytime
you are greater than 3. Now, because we're using a
comparative operator directly in our formula, this
is a logical formula that will return
just a true or false. Control+Enter, double
click, and send it down. Now, notice even here,
we can see the truths now and pick those out. We could come over to this side. We could do a formula
like equals COUNTIFS, highlight the whole range,
Control+Shift down arrow, Control+Backspace to jump the
screen back in view, comma. And then because
it's a logical, we don't have to put
it in double quotes. It is its own data type,
neither text nor number, so Control+Enter. Actually, if you hit F2, you
see it pop back up to true. At the beginning
of class, we talked about how that's
a characteristic of true/false values in Excel. So that is another way we
could look through the data set and count. Right click, paintbrush to copy
just the formatting, and click. Still another way, watch this. Control+Shift down arrow,
because we want to highlight the whole column. Now, I'm going to scroll back
up so you can see that I've highlighted the whole column. We can go to Home, Conditional
Formatting, Highlight Cell Rules, equal to-- and I'm going to type true here. And then instead of
accepting the default, I'm going to click the dropdown
and go to a Custom Format. And now, I can do
whatever I want. This is our Format
Cells dialog box. I'm going to select Red
Fill, Font, I want White. Click OK, click OK. I kind of like that way. Because now, we can go
through and pick it out. Because an outlier could
be an actual value that should be in the data set,
a value that was mistakenly in the data set, or even
typing error, all right? So there you go. Outliers with z, logical
formula COUNTIFS, and even a little conditional formatting. All right, so we talked a
little bit about outliers with the z rule. We talked about box plot. And we talked about our
five numbers summary. All right, next
video, we're going to talk about our last
numerical measure, covariance and coefficient of
variation for two variables. All right, see you next video.