Welcome to Highline BI
348, class video number 48. Hey, if you want to
download this workbook, BI Chapter 04 and follow
along, click on the link below the video. Hey, we've done a
lot in this chapter. We've been doing linear
regression, scatter plots, creating estimated equations,
R squared, SST, SSC, all sorts of calculations. And we did it all longhand. And we did it with Excel
formulas and functions. But we got to look
at a great feature. Here's our x, that's
weekly ad expense. And here's our y, weekly sales. And we're hoping to predict
weekly sales based on weekly ad expense. We're going to do
all the calculations we did so far by simply taking
our data set, going up to Data. And if you have Data
Analysis feature added in, you can use Data Analysis. If you don't have it
added in, its easy. Go to File, Options,
down here in Add-ins, you click down here, Go. And make sure that
checkbox is checked. Escape. All right, so you're ready? I'm going to click
Data Analysis. And down here is the
Regression feature, click OK. Now we're going to make
sure we got our y values. And I definitely want a
field name at the top. Control Shift Down Arrow to
highlight out all the way down. Tab, the x input. Let me use Control Home to
jump up back up to the cell A1. Click in B1, Control
Shift Down Arrow. So we have y and x. Now we only have one x here. Next couple of videos, we'll
do multiple regression with multiple x's. And it's just as easy as
putting multiple columns in. And it will do simple
linear regression with a single
independent variable x or multiple regression. Now we definitely always what
field names, because the output is hard to read unless you
have field names at the top. We definitely want
to click this when we have field names at the tops. Confidence Interval, we'll
talk about that later. But I'm going to do all the
features that we should use, even though we're not going
to look at all of them now. So I'm going to put 99 here. By default, Confidence
Interval comes up as 95. So if you put 99 and then
it does one for 95 and 99. Now I want residuals--
we talked about that. That's the difference
between the actual y value and your predicted value--
and residual plots. We're not going to talk
about them in this video, but a couple videos
ahead we will. But that's how we want
our dialog box to set up, except for one thing. You've got to tell
it where to put this. I'm going to say, hey, go to E1. Now one thing about
this dialogue box, if you run different
regressions over and over in the same workbook,
it'll always remember whatever you did last. So you've got to make sure and
visually look through and make sure you've got all
right ranges and cells. And then click OK. Now I'm going to mess this
up, because that is not the way it comes
when you spit it out. It comes like that with all
the columns squished up. So you actually always
want to highlight the columns, from
whichever the start is, all the way to the end. And then double click
between any two columns. Double click on
that Best fits it. Now I'm going to drag the
residual plot down here. We'll talk about that awesome
automatic charting later. But let's just go through and
see the things we already know. I'm going to
highlight these three. And I"m actually going
to add some yellow. And I'm going to put
some notes off the side. Now multiple r, that really
is r, or correlation. R squared, we know
what r squared is. Standard error, we just talked
about that in our last video. That is standard
error of the estimate. Observation, that's
our sample size. Some of the things we
talked about in last video-- and I'm going to
expand the size. I'm going to change the
number formatting here. So Control 1. Come over to Number. Use a comma. And I'm only going
to show 0 decimals. And click OK. Now this SS means
sum of the squares. So when you see total
here and SS up here, that means that
intersecting cell is sum of the squares total. When you see Residual,
that's error or residuals. Those are synonyms. So when you see SS and
residual, you know that is SSE. When you see regression
and SS at the top, you know that's sum of the
squares of regression, or SSR. Here's our estimate of variance. That's where we
take the actual sum of the squares of
whichever one it is divided by its degrees of freedom. That's our estimate
for variance. So really all of these things
so far we have discussed. And then when we come
down here, we only have a single x variable. So we have two
coefficients, they call it, or estimates of
population parameters. So there's the intercept,
and there is the slope. Down here, we have
our predicted values. Control Shift Down Arrow. And right here we
have our residuals all automatically calculated. All right, so that's
pretty fast and easy. If you're doing a lot
of linear regression, then this is a great feature. The only drawback is,
of course, it is static. If you want to change your
data and have everything update automatically, that's
where our formulas that we learned earlier in
the class come in handy. Now we will come back
and talk about what f is and the p-value for
the f and the p-value for our individual estimates
of our population parameters and the residual plots
in upcoming videos. But so far in the
class, there's a lot of calculations that come
out automatically when you use the Data Analysis
Regression feature. Now if you come
down to DA Reg 3, I have like a cheat
sheet here that tells you all the individual
cells and what they are. All right, next
video, we'll come back and we'll talk about the
amazing Line Est function that will actually spit out a
bunch of these statistics all in one swoop. All right, we'll
see you next video.