Basic Excel Business Analytics #44: Intro To Linear Regression & Scatter Chart

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
Welcome to Highline BI 348 class video number 44. Hey, if you want to download this workbook BI 348 chapter 4, or you want to download the PowerPoint slides, click on the link below the video. Hey, in this chapter-- and I'm going to hit the F5 key to start that PowerPoint-- we're going to talk about linear regression. Now regression just means we're going to have a method to take sample data and create a model that will help us predict, and the model will be represented by a straight line. Next slide, wow, we have a lot of awesome topics associated with linear regression. There they all are. Now, next slide, we want to talk about making decisions based on a relationship between two or more variables. So managerial decisions are often based not just on a single variable, but on a relationship between two or more variables. Hey, we might want to be able to predict or estimate sales based on advertising expense. Now notice we have a y there and an x there. So the y, that's the predicted variable, will be the sales. And the x, the independent variable that will help us predict, will be advertising expenditures. Another example, what if we wanted to make a prediction about the annual amount spent on a credit card? Why? Based on, whoa, one, two variables, household annual income and education. How about making a prediction about the price of a bike based on the bike weight? Another example, how about predicting a stroke based on age, blood pressure, and smoking. Next slide, hey x-y data, that's what we're going to be doing with this whole chapter. x is the independent variable. y is the dependent variable. Hey, x is called a predictor variable because we'll have an equation, and we'll throw an x-value in and it will calculate the y-value. So that makes x the predictor variable. The dependent variable, y, then is the variable that is predicted or estimated, sometimes called the response variable. Now here's two data sets that we'll be dealing with a lot throughout this chapter. Hey, here we have weekly ad expense, an x-variable. That's the one that will try and get to predict the y-variable, weekly sales. Now this is the raw data. We collected this sample data and we'll use this to create a model. Another example, here's the y, annual credit card charges y. And then we have an x, annual income, and number of years post high school education. So hopefully we can build a model based on these two x's. That means we plug-in an annual income amount and number of years, and the model will predict annual credit card charges. Now again, this is the sample data. From this sample data we'll create the model. Next slide, regression analysis, hey it's just a statistical procedure used to develop an equation showing how two more variables are related. It will allow us to build a model slash equation to help us estimate and predict. Now the entire process will start with taking an initial look at the data-- that will be like a scatter plot-- and then creating the equation, which has a bunch of steps, and then assessing whether the equation fits the sample data. We'll use a measure called r square using statistical inference to see if there's a significant relationship. We'll use hypothesis testing, and then we can predict with the equation. Now one that you want to be careful with regression analysis, it will not prove a cause and effect relationship. All it will do will help us to create a model equation that can help us estimate or make predictions. Now, simple regression, when you hear simple in front of regression it just means we only have one independent variable. Linear regression, when you see linear that just means the model's going to be represented by a straight line. So when we study simple linear regression, that means one independent variable with a straight line. When you hear multiple linear regression, multiple just means two or more independent variables. Linear of course means straight line. There are definitely curvilinear relationships. We will not cover those in this class. Next slide, this is where we start. Anytime we get a data set, we'll plot it, and look or see if it looks like they're a relationship. Now this is the end result of the charting engine in Excel. The actual first step is just getting the blue markers. All of the process that gets us our model or equation will create that line and that equation. Now in the charting engine, we'll see that there's a fast and easy way to visually show the line and the equation. So now we want to jump over to Excel and do step number one, which is to plot and look or see if it looks like there is a relationship. And then we can proceed with the rest of the process. Now we have our x and our y. And if we're going to use the charting engine to plot it, you want to have the x first and then the y, Control Shift Down Arrow to highlight all that. Now, one thing is, if I make a chart, it will show up down here. Now you can either use your scroll bar to scroll up, or you can use a little tick Control Backspace, to jump back to the active cell. And now we go to Insert. And there it is, the scatterplot. We click on that, and for sample data you use the markers. If you have a model like we did earlier in the class when we did our fixed cost variable cost, then you use the line. You can see that is ridiculous for sample data. So now we click on that, and there it is. Now this is a completely bad chart, because there's some numbers, there's some more numbers, but I have no idea what they are. So we immediately-- the first step is you click on the green plus and say, please show me the axis titles. There it is. It's highlighted. So I type in equal sign. That shoots me up to the formula bar, and this is y. The y is always plotted on the vertical axis here. I click on that cell and hit Enter. There it is. Now I click down here. When I see that solid line I can type an equal sign that shoots me up to the formula bar. I click on my x and Enter. Man, that's already looking much better. Now I want to give this a good chart title, so solid line all the way around. Equal sign, and I happened to already have my chart title in cell E1. So I click on the E1 and Enter. And there it is, step one of linear regression when you have two variables is to plot and see if there's a relationship. Now clearly, it looks as x increases, y increases. That means as we increase weekly ad expense, it looks like weekly sales are increasing. As we go out the x, it tends to go up. Now with the charting engine, there's a fast and easy way to actually show the line, the equation, and a measure called r square. Simply right click any one of the dots and point to Add Trendline. And instantly. there's kind of a slight line in the background, but this pane opens up. There's linear. I come down here and display equation on chart. And I want to show r square. Now I'm going to click and drag this over to the side. I want to click on the horizontal grid lines. I see those translucent circles, and I hit Delete. Now I'm going to click on the vertical lines-- I see those translucent circles-- I hit Delete. I'm going to try and select the line. Now if you accidentally click like right there and you can't quite get to the line, all of the chart elements are easily accessed by going up to Format. And over to the current selection you click the Dropdown. And notice we always name everything smartly like our field names at the top. We know there's the weekly sales, there's the weekly sales trendline, there's the weekly trendline equation. I'm going to select the trendline one, and you can see, sure enough, it's selected. Otherwise, if you very carefully click, you can kind of see a little marker right there and right there. Now we can go over to our bucket of paint, and I want to say the line is solid. And I'm going to change the color, because right now it's the same color. I'm going to use red. That's looking much better. Now I want to select the box with the equation and r square. Now that's font, so I can come up. There's a few ways, go up to the home ribbon, change the font color. Now we'll talk about what the slope means, what the y-intercept means, and what the r-square, the goodness of fit, means later in the class, but that's a quick and easy way to do it. Now if you're going to use these numbers in formulas and equations and things like that, you want to do it in the cells not from the chart. Now what kind of relationship is this? Well we kind of already hinted what the relationship-- as x increases, y increases. That means it's direct. Up, up, any time the x goes up, the y goes up. Now let's go over and look at scatter two. We want to create a second scatter chart. Here it is. We have from the internet, I went out and searched for BMX bikes. And here's a bunch of entry level BMX racing bikes. Here's the weight of each in pounds, and here's the price. And we want to see, is there a relationship between the weight of the bike and the price? Now in racing, the lighter the bike, in general the better the bike is, so I would think there would be some relationship. I'm going to highlight, and notice we have our field names at the top. Those will show up as names for the series of numbers. So you always want to have field names. Insert, Scatter, and this is sample data so we click on the markers. Now look at that. That's got a lot area with no markers. We'll deal with that in just a second. But always, the first thing, click on the green plus, Axis Titles. There is the y. I type in equal sign price, and enter. Click on the x-axis, equal sign, I'm going to click on bike weight, enter. Chart Title, I'm selected, I see that square line, equal sign, and I'm going to click in A1 and enter. Now here, in this case, I would like to change the minimum value on this horizontal axis. So I'm going to select the access, Control 1 and change the task pane, and there it is, min and max. I'm going to change the min to say 15 pounds. Now I think I'm going to change min into 19, and enter. Now one thing that's very important we'll learn later, we see that there's about sum min and about sum max for our sample data, that'll be very important. That'll be call the experimental region. And when we make predictions, we'll always want to limit our x-values that we put into our equation to the min and the max. Now I'm going to right click and point to Add Trendline. There it is, linear, display equation, and r square. Drag this off to the side, and you have to see that solid line with your move cursor, click and drag. I don't think I want either one of the vertical or horizontal grid lines. Notice when I click there, that's not what I want. If I delete, it will delete everything. So I'm going to try and click the lines again when I see those translucent circles, delete. Now this is a completely different relationship than the one we saw just a moment ago with our ad expense and sales data. Here it looks as bike weight increases, the price of the bike decreases. And that makes sense, right? The heavier the bike, the lower the price. Or if we do it backwards, as we decrease the price and it becomes a better and better race spike, the price goes up. Now this one is called a negative relationship, or indirect, or inverse. If we go back over to scatter one, this is called a direct or positive relationship. Positive, negative, and we'll have measures to measure those relationships later. Now I want to go over to the sheet scatter three. Now here, we have a couple of examples. This is a team baseball data set. Things like team name, the season batting average-- that's where the whole team-- runs scored for the whole team, and there's one, two, three, four plots. We're wondering if there's a relationship between team batting average and runs score. Well it sort of looks like direct or positive as batting average increases, runs scored increased. Over here, how about team at bats and team home runs, it looks like more and more less of a relationship, right? When the dots start to be all over the place, doesn't look like there's much of a relationship. Here's walks as the x and strike out says the y. Again the dots seem to be all over the place. How about team batting average and team on base percentage? Looks like there's somewhat of a direct relationship. Now let's scroll over to the side. And one thing you want to be careful of, here it's a data set, and it doesn't really look like there is much of any direct or indirect linear relationship. But it looks like there may be some other type of relationship. Now again, we're not going to cover a curvilinear relationship, but for our linear example it doesn't look like there's a linear relationship, but there might be some other relationship. Now I want to go to one summary slide over in our PowerPoints, types of relationship, looks like direct or positive, inverse indirect or negative, and no relationship. So here we have hours studied as our x and test score for a particular class as our y. So as hour studies increased, test score looks like it's increased. That's direct or positive. How about number of absence in the class in the final grade? It looks like as number of absences increases, final grade decreases, inverse indirect or negative. And then here, we plot a particular data set that had customer age and amount spent, and it doesn't look like there's much of a relationship. Now when we come back in our next video, we'll talk about not a graphical way to determine if there's a relationship, but a numerical measure. And we'll talk about covariance and correlation. All right we'll see you next video.
Info
Channel: ExcelIsFun
Views: 10,741
Rating: undefined out of 5
Keywords: Excel, Microsoft Excel, Highline College, Mike Girvin, excelisfun, Michael Girvin, Mike excelisfun Girvin, Excel Magic Tricks, Business Analytics, BI 348, Data Analysis, Scatter Chart, Visually check to see if there is a relationship, Positive Relationship, Negative, Relationship, Trendline
Id: 9OgCPUy7Qok
Channel Id: undefined
Length: 15min 46sec (946 seconds)
Published: Wed Nov 25 2015
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.