Welcome to Highline BI
348 class video number 44. Hey, if you want to download
this workbook BI 348 chapter 4, or you want to download the
PowerPoint slides, click on the link below the video. Hey, in this chapter--
and I'm going to hit the F5 key to start
that PowerPoint-- we're going to talk about linear regression. Now regression just
means we're going to have a method
to take sample data and create a model that
will help us predict, and the model will be
represented by a straight line. Next slide, wow, we have a lot
of awesome topics associated with linear regression. There they all are. Now, next slide, we want to
talk about making decisions based on a relationship
between two or more variables. So managerial
decisions are often based not just on
a single variable, but on a relationship between
two or more variables. Hey, we might want to be able
to predict or estimate sales based on advertising expense. Now notice we have a y
there and an x there. So the y, that's the predicted
variable, will be the sales. And the x, the
independent variable that will help us predict, will
be advertising expenditures. Another example,
what if we wanted to make a prediction about the
annual amount spent on a credit card? Why? Based on, whoa,
one, two variables, household annual
income and education. How about making a prediction
about the price of a bike based on the bike weight? Another example, how
about predicting a stroke based on age, blood
pressure, and smoking. Next slide, hey x-y
data, that's what we're going to be doing
with this whole chapter. x is the independent variable.
y is the dependent variable. Hey, x is called a
predictor variable because we'll have an equation,
and we'll throw an x-value in and it will calculate
the y-value. So that makes x the
predictor variable. The dependent variable,
y, then is the variable that is predicted or
estimated, sometimes called the response variable. Now here's two data
sets that we'll be dealing with a lot
throughout this chapter. Hey, here we have weekly
ad expense, an x-variable. That's the one that will
try and get to predict the y-variable, weekly sales. Now this is the raw data. We collected this
sample data and we'll use this to create a model. Another example, here's the y,
annual credit card charges y. And then we have an x, annual
income, and number of years post high school education. So hopefully we can build a
model based on these two x's. That means we plug-in
an annual income amount and number of
years, and the model will predict annual
credit card charges. Now again, this is
the sample data. From this sample data
we'll create the model. Next slide, regression
analysis, hey it's just a statistical
procedure used to develop an equation showing
how two more variables are related. It will allow us to build a
model slash equation to help us estimate and predict. Now the entire
process will start with taking an initial
look at the data-- that will be like a scatter
plot-- and then creating the equation, which
has a bunch of steps, and then assessing whether the
equation fits the sample data. We'll use a measure
called r square using statistical
inference to see if there's a significant relationship. We'll use hypothesis
testing, and then we can predict with the equation. Now one that you want to
be careful with regression analysis, it will not prove a
cause and effect relationship. All it will do will
help us to create a model equation that can
help us estimate or make predictions. Now, simple regression,
when you hear simple in front of regression
it just means we only have one independent variable. Linear regression,
when you see linear that just means
the model's going to be represented
by a straight line. So when we study simple
linear regression, that means one independent
variable with a straight line. When you hear multiple
linear regression, multiple just means two or
more independent variables. Linear of course
means straight line. There are definitely
curvilinear relationships. We will not cover
those in this class. Next slide, this
is where we start. Anytime we get a data
set, we'll plot it, and look or see if it looks
like they're a relationship. Now this is the end result of
the charting engine in Excel. The actual first step is just
getting the blue markers. All of the process that gets
us our model or equation will create that line
and that equation. Now in the charting
engine, we'll see that there's a fast and
easy way to visually show the line and the equation. So now we want to
jump over to Excel and do step number one, which
is to plot and look or see if it looks like there
is a relationship. And then we can proceed with
the rest of the process. Now we have our x and our y. And if we're going to use the
charting engine to plot it, you want to have the x first and
then the y, Control Shift Down Arrow to highlight all that. Now, one thing is,
if I make a chart, it will show up down here. Now you can either use your
scroll bar to scroll up, or you can use a little tick
Control Backspace, to jump back to the active cell. And now we go to Insert. And there it is,
the scatterplot. We click on that, and for
sample data you use the markers. If you have a model like
we did earlier in the class when we did our fixed
cost variable cost, then you use the line. You can see that is
ridiculous for sample data. So now we click on
that, and there it is. Now this is a
completely bad chart, because there's some numbers,
there's some more numbers, but I have no idea
what they are. So we immediately--
the first step is you click on the green
plus and say, please show me the axis titles. There it is. It's highlighted. So I type in equal sign. That shoots me up to the
formula bar, and this is y. The y is always plotted
on the vertical axis here. I click on that
cell and hit Enter. There it is. Now I click down here. When I see that
solid line I can type an equal sign that shoots
me up to the formula bar. I click on my x and Enter. Man, that's already
looking much better. Now I want to give this a good
chart title, so solid line all the way around. Equal sign, and I
happened to already have my chart title in cell E1. So I click on the E1 and Enter. And there it is, step one
of linear regression when you have two variables
is to plot and see if there's a relationship. Now clearly, it looks as
x increases, y increases. That means as we increase
weekly ad expense, it looks like weekly
sales are increasing. As we go out the x,
it tends to go up. Now with the charting
engine, there's a fast and easy way to actually
show the line, the equation, and a measure called r square. Simply right click
any one of the dots and point to Add Trendline. And instantly. there's kind of a slight
line in the background, but this pane opens up. There's linear. I come down here and
display equation on chart. And I want to show r square. Now I'm going to click and
drag this over to the side. I want to click on the
horizontal grid lines. I see those translucent
circles, and I hit Delete. Now I'm going to click
on the vertical lines-- I see those translucent
circles-- I hit Delete. I'm going to try
and select the line. Now if you accidentally
click like right there and you can't quite get to the
line, all of the chart elements are easily accessed
by going up to Format. And over to the
current selection you click the Dropdown. And notice we always
name everything smartly like our field names at the top. We know there's
the weekly sales, there's the weekly
sales trendline, there's the weekly
trendline equation. I'm going to select
the trendline one, and you can see, sure
enough, it's selected. Otherwise, if you
very carefully click, you can kind of
see a little marker right there and right there. Now we can go over to
our bucket of paint, and I want to say
the line is solid. And I'm going to change the
color, because right now it's the same color. I'm going to use red. That's looking much better. Now I want to select the box
with the equation and r square. Now that's font,
so I can come up. There's a few ways, go
up to the home ribbon, change the font color. Now we'll talk about
what the slope means, what the y-intercept
means, and what the r-square, the goodness of
fit, means later in the class, but that's a quick
and easy way to do it. Now if you're going to use
these numbers in formulas and equations and
things like that, you want to do it in the
cells not from the chart. Now what kind of
relationship is this? Well we kind of already
hinted what the relationship-- as x increases, y increases. That means it's direct. Up, up, any time the x
goes up, the y goes up. Now let's go over and
look at scatter two. We want to create a
second scatter chart. Here it is. We have from the
internet, I went out and searched for BMX bikes. And here's a bunch of entry
level BMX racing bikes. Here's the weight of each in
pounds, and here's the price. And we want to see, is there a
relationship between the weight of the bike and the price? Now in racing, the lighter
the bike, in general the better the
bike is, so I would think there would be
some relationship. I'm going to highlight, and
notice we have our field names at the top. Those will show up as names
for the series of numbers. So you always want
to have field names. Insert, Scatter, and
this is sample data so we click on the markers. Now look at that. That's got a lot
area with no markers. We'll deal with that
in just a second. But always, the first thing,
click on the green plus, Axis Titles. There is the y. I type in equal sign
price, and enter. Click on the x-axis,
equal sign, I'm going to click on bike weight, enter. Chart Title, I'm selected, I see
that square line, equal sign, and I'm going to
click in A1 and enter. Now here, in this
case, I would like to change the minimum value
on this horizontal axis. So I'm going to select
the access, Control 1 and change the task pane,
and there it is, min and max. I'm going to change the
min to say 15 pounds. Now I think I'm going to
change min into 19, and enter. Now one thing that's very
important we'll learn later, we see that there's
about sum min and about sum max for our sample data,
that'll be very important. That'll be call the
experimental region. And when we make
predictions, we'll always want to
limit our x-values that we put into our equation
to the min and the max. Now I'm going to right click
and point to Add Trendline. There it is, linear, display
equation, and r square. Drag this off to
the side, and you have to see that solid line
with your move cursor, click and drag. I don't think I want either one
of the vertical or horizontal grid lines. Notice when I click there,
that's not what I want. If I delete, it will
delete everything. So I'm going to try and
click the lines again when I see those translucent
circles, delete. Now this is a completely
different relationship than the one we saw just a
moment ago with our ad expense and sales data. Here it looks as bike
weight increases, the price of the bike decreases. And that makes sense, right? The heavier the bike,
the lower the price. Or if we do it backwards,
as we decrease the price and it becomes a better
and better race spike, the price goes up. Now this one is called
a negative relationship, or indirect, or inverse. If we go back over
to scatter one, this is called a direct
or positive relationship. Positive, negative, and we'll
have measures to measure those relationships later. Now I want to go over to
the sheet scatter three. Now here, we have a
couple of examples. This is a team
baseball data set. Things like team name, the
season batting average-- that's where the whole team--
runs scored for the whole team, and there's one, two,
three, four plots. We're wondering if there's
a relationship between team batting average and runs score. Well it sort of looks like
direct or positive as batting average increases,
runs scored increased. Over here, how about team
at bats and team home runs, it looks like more and more
less of a relationship, right? When the dots start to
be all over the place, doesn't look like there's
much of a relationship. Here's walks as the x and
strike out says the y. Again the dots seem to
be all over the place. How about team batting average
and team on base percentage? Looks like there's somewhat
of a direct relationship. Now let's scroll
over to the side. And one thing you
want to be careful of, here it's a data set,
and it doesn't really look like there is
much of any direct or indirect linear relationship. But it looks like there
may be some other type of relationship. Now again, we're not going
to cover a curvilinear relationship, but for
our linear example it doesn't look like there's
a linear relationship, but there might be some
other relationship. Now I want to go to one summary
slide over in our PowerPoints, types of relationship, looks
like direct or positive, inverse indirect or negative,
and no relationship. So here we have hours
studied as our x and test score for a particular
class as our y. So as hour studies
increased, test score looks like it's increased. That's direct or positive. How about number of absence in
the class in the final grade? It looks like as number of
absences increases, final grade decreases, inverse
indirect or negative. And then here, we
plot a particular data set that had customer
age and amount spent, and it doesn't look like
there's much of a relationship. Now when we come back
in our next video, we'll talk about
not a graphical way to determine if
there's a relationship, but a numerical measure. And we'll talk about
covariance and correlation. All right we'll
see you next video.