The Easiest Introduction to Regression Analysis! - Statistics Help

Captions
Welcome to Quant Concepts Education. This is a short 14-minute lecture introducing regression analysis. It is assumed that the viewer has little background in statistics.

Suppose you've been in the same dead-end job for 10 years and finally decide that you need to skill up to improve your career. You consider enrolling in university in order to gain some valuable skills that will hopefully improve your job prospects. However, your friend Sam has some thoughtful advice for you: "Mate, don't waste your time at uni. My mate Jimmy is a PhD and he's unemployed." Interesting. Sam continues: "Rob never went to school and he's loaded with cash."

Sam's advice sounds quite wise indeed, but upon further reflection you realize that Jimmy is a kleptomaniac and is essentially unemployable, and Rob is rich because he won the lottery. But Sam has got you thinking. How do we know whether more education will improve your career? More importantly, does education increase your salary? How can we formally test whether there is a relationship between education and wages? And how much can we expect our wage to increase for every additional year spent on education? That is, is it even worth it to go and study? Education is expensive nowadays, and reading hurts your brain.

A natural approach would be to survey a sample of individuals, ask each of them how much they earn and how many years they have spent in school, record the answers, and then determine whether we can observe a pattern in their responses. Let's say you decide to survey 100 individuals. In statistics, it is important to make sure that our sample of individuals is representative of the population; that is, we must ensure random sampling. This will allow us to make inferences, or conclusions, about the population at large.

Each individual's response will produce a data point; that is, our random sample of 100 individuals will produce 100 data points for us to estimate our regression. For example, our first data point is Sue. She has 13
years of education and is currently paid $20 an hour. Our second data point is Peter, with 20 years of education and an hourly wage of $30, and so on for the other 98 individuals in our sample.

Now, given such data, what is the best way to present this information? A scatter graph. Let's begin with a simple XY graph. Education is on the x-axis, the horizontal axis: 0 is at the intersection point, and values increase as we move to the right. Wages are on the y-axis, the vertical axis: 0 is at the intersection point, and values increase as we move up. Each point on the graph denotes an individual we have surveyed. For example, the first person in our data set has 13 years of education and earns $20 an hour. Another person in our sample has 20 years of education and earns $30 an hour. The scatter graph allows us to present all individuals in our sample based on their combination of education and wage.

Now, how can we determine the general pattern in the data set? That is, can we observe any relationship between wages and education based on the points on our scatter graph? A natural way of uncovering a possible pattern or relationship is by drawing the line of best fit. The line of best fit is the line that best represents the general pattern in the sample. A regression line is simply the line of best fit for a given sample.

Recall from high school maths that the equation of a line is y equals mx plus b, where m is the gradient, or slope, of the line (that is, rise over run) and b is the intercept of the line (that is, where the line cuts the y-axis). In regression analysis, we represent the line as y equals beta 0 plus beta 1 times x instead. All that has changed is the notation: beta 0 is the intercept term, and beta 1 is the gradient, or slope, of the line. In the context of this example, y is wages and x is education.

Now, is there a relationship between wages and education? We can see this by observing beta 1, the slope of the regression line. If beta 1 is
positive, then there is a positive relationship between wages and education: the more education a person attains, the higher the wage.

Now, what happens if there is a negative relationship between wages and education? The data from our survey may look like this. We then draw the line of best fit, otherwise known as the regression line. As you can see, the regression line is now downward sloping from left to right. In this data set, the general trend is that the more educated an individual is, the less they earn in wages. So when there is a negative relationship between wages and education, the slope of the regression line, beta 1, is negative.

What happens if there is in fact no relationship between wages and education? If that is the case, our data may look like this. If we were to draw the line of best fit, the line would cut through the data as follows. Notice that the line of best fit is now a horizontal line. So when there is no relationship between wages and education, the slope of the regression line, beta 1, is 0. Recall that the gradient, rise over run, of a horizontal line is 0.

Now let's take a look at how we can use a regression line to make predictions. Suppose we analyze our survey results for wages and education and estimate the following regression line. Again, beta 0 is the intercept term and beta 1 is the slope coefficient, or gradient, of the line. Many software packages, such as Excel, EViews, Minitab, etc., can easily estimate a regression line for us. Now suppose Excel estimates a regression line for us and informs us that the regression line is in fact y equals 5 plus 1x, or in this case, wages equals 5 plus 1 times the number of years of education. This tells us that beta 0, the intercept of the line, is 5. This means that the line cuts the y-axis at $5. Beta 1 is equal to 1, which means that the slope of the line is 1.

Now let's make some predictions. Suppose we meet an individual who has just finished high school and has 12 years of education. She would like a prediction of her hourly wage.
In order to estimate her hourly wage, we simply set education equal to 12 in the regression equation. Wages, or y, is then equal to 5 plus 1 times 12. This gives us a wage of $17 an hour. Suppose there is another individual who has 22 years of education, and he would like a prediction of his hourly wage if he were to enter the workforce today. Again, we simply set education equal to 22 in the regression equation. His expected wage is 5 plus 1 times 22, which gives him a predicted hourly wage of $27.

So what do we say about the relationship between wages and education? For every one additional year of education, wages are expected to increase by beta 1 dollars per hour. In this case beta 1 is equal to 1, which means that for every one additional year of education, wages are expected to increase by $1 an hour.

How do we interpret beta 0, the intercept term? When education is equal to 0, wages are expected to be beta 0 dollars per hour. In our case beta 0 is equal to 5, so when education is equal to 0, wages are expected to be $5 per hour. Beta 0 in the wages and education context may be interpreted as a minimum wage: an individual with zero education is expected to get paid at least $5 an hour.

I'd like to now introduce the idea of the residual term. Let's again look at our regression line for wages and education. As in the previous slide, the line has been estimated, with beta 0 equal to 5 and beta 1 equal to 1. Notice that we now have the estimated regression line and the actual data points. Remember, the regression line is used to make predictions of an individual's hourly wage based upon his or her level of education. The red data points represent the actual information gathered from our survey of 100 individuals.

Suppose we meet Jasmine, who has 12 years of education. We then use our regression model to make a prediction of her hourly wage. As we saw in the previous slide, 5 plus 1 times 12 is equal to 17, so our prediction for Jasmine's wage is $17 an hour. However, Jasmine
in fact earns $22 an hour, and her data point sits above the regression line. The residual is the difference between the actual wage and the predicted wage; specifically, it is the actual wage minus the predicted wage. So for Jasmine, her residual is $5.

We saw that our wage prediction for Jasmine underestimated her actual wage by $5. Does this mean our regression model is wrong? The answer is no. Our regression model simply provides us with a prediction based on Jasmine's level of education, so it's our best guess at her hourly wage given her level of education. However, in reality there are many other factors in addition to education that will affect Jasmine's hourly wage. Because these were not accounted for, they are contained in the residual term; that is, they will cause errors in our prediction. So the residual term contains other factors that affect wages but are not contained in the model.

In the context of wages and education, other factors that may influence a person's wage are experience (the more work experience you have, the higher your hourly wage), IQ (the smarter you are, the faster you learn on the job and presumably the faster you get promoted and earn a higher wage), or height. Did you know that studies have found that taller people tend to get promoted faster than shorter people? True story. Google "height and salary" to find out for yourselves.

Okay, so what we've covered so far seems easy enough, yeah? Well, you now have a good foundation in regression analysis, and the other, more advanced concepts will now make sense much more easily.

Let's recap what we've covered so far. The regression line is simply the line of best fit. It is the line that best represents the general trend or relationship in the data. Beta 1 is the slope of the regression line. This is simply the gradient of the line, rise over run. It is interpreted as follows: a one-unit increase in the X variable will lead to a beta 1 unit increase in the Y variable. Beta 0 is the value of the Y
variable when X is equal to zero. In our example today, it is the expected wage when an individual has zero education. If beta 1 is larger than zero, then there is a general positive relationship between X and Y; that is, when X increases, Y tends to also increase. If beta 1 is less than zero, then there is a general negative relationship between X and Y; that is, when X increases, Y tends to decrease. If beta 1 is equal to zero, then there is no relationship between X and Y: if X changes in value, this has no effect on the value of Y.

We also showed how a regression can be used to make predictions of Y based on our information about X. In this lecture we estimated the following regression line. Now, if we meet an individual with 12 years of education, we can make a prediction about this person's wage by simply setting education equal to 12 in the regression. 5 plus 1 times 12 is equal to 17, therefore our predicted hourly wage for this individual is $17.

Of course, predictions aren't perfect, and our prediction may be slightly off. The residual is the actual value of Y minus the predicted value of Y. For example, the predicted wage for someone with 12 years of education is $17. However, the actual wage may in fact be $22 an hour. This means that, using this regression, our individual has a residual equal to $5.

Why do we have errors in our regression predictions? Because the regression model does not account for all factors that affect the Y variable. In reality there are an infinite number of potential factors that impact a person's wage. The residual term represents all factors, other than those included in the regression model, that also impact Y. In the case of wages and education, our regression model does not include the person's experience, IQ, height, ability to network, or whether he's a kleptomaniac. These factors are thus included in the residual and will prevent our predictions from being perfectly accurate.

I hope you've enjoyed today's lecture. Please visit us at
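
The prediction and residual arithmetic from the lecture can be sketched in a few lines of Python. This is only an illustration, not the lecturer's method. The function names (predict_wage, residual, fit_line) are made up here. The four-person sample is hypothetical and chosen to lie exactly on the lecture's line, wages = 5 + 1 x education. The fit uses the standard least-squares formulas for a single explanatory variable, which the lecture itself does not spell out.

```python
from statistics import mean

# Coefficients quoted in the lecture: wages = 5 + 1 * education.
BETA_0 = 5.0  # intercept: expected hourly wage at zero years of education
BETA_1 = 1.0  # slope: expected change in hourly wage per extra year


def predict_wage(education_years, beta_0=BETA_0, beta_1=BETA_1):
    """Predicted hourly wage from the regression line."""
    return beta_0 + beta_1 * education_years


def residual(actual_wage, education_years):
    """Residual = actual wage minus predicted wage."""
    return actual_wage - predict_wage(education_years)


def fit_line(xs, ys):
    """Least-squares (intercept, slope) for one explanatory variable.

    Assumption: ordinary least squares, which is what spreadsheet and
    statistics packages typically report for a simple regression.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical sample, NOT the lecture's survey data: four made-up people
# whose wages fall exactly on the line wages = 5 + 1 * education.
education = [10, 12, 16, 20]
wages = [15, 17, 21, 25]

print(predict_wage(12))             # 12 years of education -> 17.0
print(predict_wage(22))             # 22 years of education -> 27.0
print(residual(22, 12))             # Jasmine: 22 actual - 17 predicted -> 5.0
print(fit_line(education, wages))   # recovers intercept 5 and slope 1
```

With real survey data the points would scatter around the line, so the residuals would not all be zero as they are for this made-up sample.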
Info
Channel: Dave Your Tutor
Views: 450,103
Rating: 4.8989205 out of 5
Keywords: regression, regression analysis, simple regression, OLS regression, simple OLS regression, regression help, quant concepts, regression analysis help, simple regression help, statistics help, econometrics help, statistics, stats help, simple OLS regression help, OLS regression help, quant concepts education, regression tutorial, OLS regression tutorial, regression analysis tutorial, simple regression basics, regression basics, OLS regression basics
Id: k_OB1tWX9PM
Length: 14min 1sec (841 seconds)
Published: Fri Apr 12 2013