Welcome to Highline BI
348 class video number 17. Hey, if you want to download
this workbook BI 348 Chapter 2 start, or the finished
file, click on the link below the video. Hey, we have this data set here
from 2014 sales for Boomerang Incorporated, and
we have date, time, the website that we sold
the boomerang through, the product, the type, the
quantity, and the discount. Now the goal of this
video, from this data set, is to create a
frequency distribution, cumulative frequency
distribution, and a histogram for revenue. Now noticed, we don't
have a revenue column, so we'll have to create that. Now I want to go over to the
sheet Goal For Quantitative FD. Here's our end result,
because in this video, we're going to learn about
the frequency array function. Now when our prerequisite
class, when we created frequency distributions for continuous
quantitative data, we either used the COUNTIFs
function or a pivot table. So this is going to be
new, and it's actually a little bit easier than
the COUNTIFs function, but we're going to have to deal
with a certain type of category that it creates for an
upper and lower limit. All right, so let's go over
to our sales data sheet, and we have to create
a revenue column. Now, here's the deal, we are,
from our transactional data set-- and again this
transactional data set, Control Down Arrow,
us like 26,000 records, Control Up Arrow-- we're
given the product that we sold and the quantity. We also have a discount,
and that's the discounted as it applies to the price. But over here, we have a
completely separate LOOKUP table with our price. So we're going to need to, from
our transaction product column, look up the price,
get the price, multiply it by quantity, and
times 1 minus the discount rate to get our revenue. So let's go ahead
and create this. I added some formatting
there, equals VLOOKUP. We are looking up product,
comma, within this two columned table, and I'm going
to hit the F4 key to lock that. Comma, and the column
index, one, two column has the item we want to
get and return to the cell, so I put a 2, and comma. We're doing the exact
match, because we're looking up text here, so
I'm not going to put false. I'm going to put it 0. Close parentheses,
Control Enter, double click and send it down. All that gives us is the price. We can see
[? younaki ?] is 23.95, and sure enough, over
here, [? younaki ?] 23.95. Now the whole column is
highlighted down 26,000 rows. And there's an active
cell at the top, so I'm simply
going to hit the F2 key to put that in edit mode. And they need to multiply
that times quantity. Then I need to multiply that
times-- not the discount rate, because that gives
me the discount-- I need to go 1 minus. We gave him 16.5%
discount, so whatever the remaining amount
after subtracting from 1 is the actual net revenue. Now notice we're going to run
into rounding problems here, because some of these decimals
exceed the penny position. Now before I fix the
rounding problem, I want to enter this formula. And notice active cell,
whole column highlighted, so I hold Control and Enter to
populate that edited formula all the way down. And we can clearly
see there's lots of extraneous decimals, F2. I'm using the ROUND function,
R-O-U-N-D, at the beginning. The number is that
big huge formula, but I need to very
carefully come to the end and with my I-beam click. Comma, and number of
digits, you put it 2 when you're
rounding to the penny. Close parentheses, enter the
edited formula all the way down with Control and Enter. Now we have our revenue. And no matter what
method you use to count the frequencies
between upper and lower limits-- whether it's frequency array
function, pivot tables, COUNTIFs-- we still
have to figure out what the lower limit
and upper limit and the increment between the
two are for our particular data set. Now in the textbook,
they just say, hey create classes,
between 5 and 20 of them. In our prerequisite class,
we had a particular method, but we can sort of
use common sense. We definitely are going to have
to look at the max and the min, and then figure out
the range for data set. So I'm going to highlight this
whole column, Control Shift Down Arrow, Control
Backspace, and then Enter. So looks like the
max is near $3,000, the min-- equals min, Control
Shift Down Arrow, Control Backspace, and Enter-- oh,
and so the min is like $4.09. That's probably
for a sale of one boomerang with a
slight discount. So it looks like we
could go from 0 to 3,000 and still our min
and max and have nice neat numbers to deal with. Another thing we're going to
need to know is that count. So I'm going to use the
count, equals COUNT, and we're counting
the numbers so I'll use COUNT, Control Shift Down
Arrow, Control Backspace, and Enter, so about 26,524. Now another thing we're
going to need to know is number of classes,
and I'll come back to that one in just a moment. Our rule that we used back in
our previous class, Business 210, and we're going to call
number of classes equals k. Our rule was any time we get
2 to the k greater than n-- and our count was equal to
n, that was our sample size. Here, let's just make
a formula, 2 caret, and I don't have
anything in the k so far, but it doesn't matter. Any number raised to 0 is 1. And I'm going to try 6. Well, is 64 bigger
than our account? No way. I'm going to try
10, not that either. How about 15? OK, so that number is
bigger there, how about 14? Now, so it looks like
15, but remember this is all an art, right? We're just kind of trying to
get a feel for the number. So 15 classes, we could use
that, or 10, or whatever. We're going to try and use 15. And it clearly looks
like we could start at 0 and go to 3,000, but I'm going
to be explicit and say range. And the range is simple, we
say equals, hey, whatever the max is minus the min. And that should be
approximately 3,000. Now with a range and
number of classes, it's easy to figure out
the minimum interval for each one of our classes. And that would simply be
equals, hey, the range divided by our 15. Hey look at that, I betcha
we can guess that easily we could round this up to 200. So it looks like class--
and I tend to see interval but class size is what
they call it in the book. Interval or class size,
either one is fine. I'm just going to say 200. And our lower limit
for our first class, how about we start at 0 since
that's pretty darn close there. Now, with these
two numbers, we can use any one of the
methods, frequency array function, pivot table, COUNTIFs,
or even our data analysis at n. Now I went ahead and added
some formatting there just to be consistent. Now if we're going to
start at 0 and go to 200, then our category is going to be
like 0 to 200, 400 to 600, 600 800, all the way up to 3,000. Now we're going to
scroll over here and our first example
is going to be how to create a frequency
distribution with the frequency array function. Now I created a
bunch of labels here, category, revenue upper limit--
because our frequency function is going to be awesome and
it only needs the upper limit for counting between a
lower and upper limit-- then we're going to calculate
a frequency with a frequency array function, our
percent frequency, cumulative frequency, and
percent cumulative frequency. Now I want to come over to the
revenue upper limit column. And remember our categories
are 0 to 200, 200 to 400, et cetera, all the way to
3,000, but the array function doesn't need both the
lower and the upper limit. It only needs the upper limit. So I'm going to give
it the upper limit of the first category, the upper
limit of the second category. I'm not going to link these
to an assumption area, this is historical data and
we've already established the category limits. So now watch this. I'm simply going to highlight
these to, point to my fill handle, and when I
see my angry rabbit I'm going to click and drag. And because there's a difference
of 200 between those numbers, look at the screen tip. It's giving us exactly
increments of 200. Now I'm going to stop
at 3,000 because that's bigger than any number
we have in our data set. Now these are the upper
limits and there's 15 of them. The frequency
function is amazing. We're only going to give it
those upper limits and the data set, and it will calculate
all the frequencies. But this is a special type
of function called an array function, because it will
deliver all the answers simultaneously, which
means you have to highlight all of the cells in advance. Now another amazing thing
about the frequency function is if you give it
15 upper limits, it'll actually create
16 categories just in case there's some
number in our data set and we weren't paying
attention when we gave it the last upper limit. It will create one extra
category that says, hey are there any numbers bigger
than the last upper limit. So you have to
highlight in advance a number of cells that
equals exactly one more than the number
of our upper limits. Now in the active cell, I'm
going to type frequency, and all we have to do is
give it the actual data numbers, revenue numbers,
and our upper limit. So I'm going to scroll over,
click in the top cell, Control Shift Down Arrow, and
Control Backspace, scroll all the way
over, data array, comma, and the bins array. Those are the upper limits. Now we do not have
to lock these ranges because this is a special
function called an array function. It will be entered
simultaneously into all the cells. And when Excel copies it
down, it will not move. Now actually we saw this
array function in Business 218 when we discussed
array formulas. Close parentheses, and because
it's an array function, we have to enter it with the
special keystroke Control Shift and Enter. Now we used Control
Shift Enter to enter it into the cells, that
was us telling Excel that this is an array function. But look up in the formula bar. Those curly brackets
cannot be typed in. Excel put those
in automatically, and that's Excel telling
you that it understood that this is an array function. Hey, there's our count. That is absolutely amazing. In each one of these, count. That 2,507 means it counted
between 200 and 400. Now I actually explicitly want
to show you the categories here. Here's an explicit listing of
what the frequency function will do every single time. It will always take the
first upper limit and count everything less
than or equal to it. Notice that that's
sort of polite for them to program the frequency
function to do this. That means it will catch
everything below this limit, and then all of the ones in
between the first and last category, include the upper
limit but not the lower limit. And notice that's the opposite
of what a pivot table does. And finally, it's
polite because it will create this
last category just in case we weren't
paying attention when we created the upper
limit of the last category. It'll don't catch
any values greater than that last category. We were paying attention when
we created that so of course it caught 0. Another thing about
this array function, if you click in a cell
and hit the Delete key, you cannot delete it. That whole range of values
is considered one object. It's an array of answers. Notice it says you cannot
change part of an array. Not only that, but if you come
over here and try to insert a row, it's the same thing. You cannot change that. Those values are one object
stored together as an array. Now I'm going to
add some formatting, and I'm going to copy that
formatting over to these cells right here. Now I want to calculate a
total, Control Enter, and B Tab, Alt equals, and there's
our total, Control B. Now we can calculate our
percent frequency, that means each individual part. And we're going to take
the total, F4 to lock it, so each individual part divided
by the total, Control Enter, and copy it down. Now that's called a
relative frequency, or we're going to add a percent
number formatting so we'll call it percent frequency. And there we go. Now cumulative frequency,
that's something we haven't done in this
class, but we did it in a previous class. It just means, hey, I'm
going to take this one, and then for this one
I'm going to add both. And then for this one a
to add all three of them. Now there's a few
ways we can do this, but let's just do this this way. We're going to put the first
one-- because there's only one category-- and then for the
second we're going to say, equals hey, take whatever is
in the previous class plus whatever is new in this class. And that formula,
when we copy down, will calculate the
cumulative frequency. And when we get to
the last one, it's exactly equal to our total. In this case, since it's
0, this one is also. Now we can calculate percent
cumulative frequency. That means whatever the
cumulative total is so far divided by our total, and I'm
going to click on this one here. We can click on this
one here because it's the same number, F4 to
lock it, Control Enter, and copy it down. Now let's add some
formatting to this, Home and Percentage, two decimals. Now the cumulative
frequency and cumulative, you can pick any one
of these categories. Let's select this one. There's the upper limit, 1,400. That means 25,597
of the transactions were for $1,400 or less. As a percentage it means
96.51% of our transactions were for $1,400 or less. Right here, about 77%
of our transactions were for $600 or less. Now we're going to
create a chart from this, and the cumulative
frequency line that we get is even easier to understand. Now what I would like to
do is use this as our label on our horizontal axis. This'll be the column
height, and then we're actually going to
change this to a line, so we'll have a combo chart. Now we're going to
highlight, and I'm not going to highlight
the last category since we've got 0 there. And then I'm going to
hold the Control key and highlight that label
all the way down to 100%. Now I can go up to Insert. And we could use
Column, but let's try this new thing in 2013. There's a combo chart. You can see the preview,
that's not looking good. Wow, that one's looking good. So we're going to
choose this one. Even though there's
a mistake in it, it wasn't smart
enough to pick it up. Now notice what
it did down here, it thought that our revenue
upper limit actually was a column, so we're
going to fix that. We could actually fix that,
not going to Select Data, but to simply click on
the columns and Delete. Now that means we have to
right click, go to Select Data because we want
to add it back in, but as our horizontal category. You can see the dancing
ants dancing in there for both of our series. Those are the numbers. Click on Edit, and now I'm
going to select just our labels for all horizontal axis. And click OK. Click OK. Now I don't think we're
going to need a chart title, so I'm going to delete that. I'm going to click on the
legend, Control 1, that opens up-- oh yeah,
there it is from earlier. I want this at the top. That will kind of serve
as our chart title. We definitely need labels. Axis Titles, click on
the Vertical Axis Title, equal sign to jump up
to the formula bar, and I'm going to type
frequency and Enter. Click on the
Horizontal Axis, equal, and I'm going to say the revenue
upper limit to clearly describe what that is. Now we can come over and
select our columns, Control 1. Notice it's right
here, gap with 0. And you've seen me
do this many time. I go to the Fill and
do vary color by point. But in a pivot table,
that's just not there. I'm going to do a solid fill,
something like a dark blue, and I want a border, solid line. And I'm going to select black
just to give it some emphasis there. That's chart junk right there. You could stop right there. That's beautiful. Revenue upper limit,
that describes that. Frequency, that describes that. The percent cumulative
frequency, right so this label up here is describing
what that is. Now that actually is
quite a beautiful chart. I'm going to dragged
this up here, and then I'm going to
Control C, click down here, and Control V to copy it. Now let's change
it up, because this is the art side of statistics. I'm going to come over here
and say, let's say, Data Label. So now it's going to get messy. I have numbers there, so I'm
going to click on the frequency and delete. But that deletes the
actual data there, so I'm going to Control Z. I
want to learn a trick here, because I want to keep this
here so that the data is still in the chart. Control 1 with
the axis selected, and we're going to go
down to Number Formatting, and this is a trick
from Business 218. You can actually-- there's
an actual number formatting that says show nothing. So what we want to do is still
have this vertical axis here, but show nothing,
and it's semicolon, semicolon, semicolon. From 218, custom
number formatting, there's positive, negative,
0, and text sections to custom number formatting. So when you list all four
sections, three semicolons with nothing in there,
it just means no matter what you have
there show nothing. So I'm going to click
Add and there it is. I want to do the same thing over
here because if I delete it, it collapses the chart, Control
Z. So I'm going to Control 1 with that axis highlighted. Scroll over here,
Number, here's our trick. Number Formatting, semicolon,
semicolon, semicolon, that's a great trick. Any time you want to go to
your friend's spreadsheet and highlight all the cells
and add this number formatting, no matter what they put in
the cells, it won't show up. No no, that would be terrible. You're going to
know how to fix it when someone else tricks them. All right, click Add. Now this is looking
pretty messy, right? I want to try and
select these labels. And over here, let's try above. Well that still pretty messy. Go down to Number, Let's see
if we can add a percentage. So the 0%, no decimals, and Add. That's looking sort of OK. I'm thinking I'm going
to try and select one of those 100s,
Delete, maybe another one. Notice when you click and select
something like data labels, it selects them all. But when you select
a second time, it allows you to select
just a single one. We could actually
probably remove those. This is rounded, right? There are so 99.6s and 99.7s,
however you want to do that. Maybe we'll just
delete all those. And this last one
right here, I'm going to try and get to font. And then we saw this in an
earlier video, the task pane over here doesn't have font. You either have to click
Font, Control Shift F, or go up to the
Home ribbon, right? And then there's a font color. I'm going to change it to white. So we have two different
charts that are articulating the same information. And what this means is
you can pick any point. So 94%, you come down here
and there's the upper limit. 94% of our transactions
were for $1,200 or less. 62% of our transactions
were for $200 or less. That's the percent
cumulative frequency. Our frequencies,
this is actually an extreme, but typical
pattern for retail data. Most of the transactions
are for a very small amount. And as you get further into
the upper transaction amounts, there's fewer and fewer
actual transactions. All right, so in
this video we saw how to create a bunch of
things-- frequency, percent frequency, cumulative frequency,
and percent frequency-- all based off of this
amazing frequency functions, only using the upper bins
and the special categories. And then we saw how
to create two charts to visually articulate
the frequency and cumulative frequency. All right, when
we come back, I'm actually going to show
you two other ways to accomplish the same task. We'll use the data
analysis add-in feature and a pivot table. All right, we'll
see you next video.