House Prices Prediction on Kaggle.

Captions
Hello everyone, my name is [inaudible]. I'm a data science student at the Explore Data Science Academy. Today I'll be taking you through the House Prices prediction competition on Kaggle. The data set is based on a city called Ames in Iowa, in the United States of America.

Before we started predicting house prices, we needed to go through exploratory data analysis. We needed to dig deep into the data, then standardize and prepare the data before prediction. There were crucial steps that we needed to go through: removing outliers, dropping some columns, replacing missing values, creating dummy variables, and standardizing our data.

Now, what does a normal person think about when they want to buy a house? The factors that come to mind are how big the house is and how many bedrooms the house has, right? So we looked at the correlation between the features of a house and the sale price of a house, and we found that the sale price and the overall quality are highly correlated, with a correlation of 0.79, as is the ground living area, with a correlation of 0.71.

We then decided to use the sale price versus the ground living area to remove some outliers in our data. We see here that we have two extreme outliers, with the ground living area between 4,000 and 6,000 square feet and the sale price just under 300,000 US dollars. This could be because they are mansions that are now old and priced low. This is the plot after we removed our outliers.

One of the assumptions is for our data set to be normally distributed, so we looked at the distribution of the sale price. Looking at this plot, we see that it is highly skewed to the right. We then log-transformed our distribution, and it is now somewhat normal.

We then visualized our missing data. We see that the pool, miscellaneous features, alley and fence have a lot of missing values. We needed to deal with this. So the first thing that we looked at was a Google Maps and Google Earth, sorry, picture of Ames. We see that these two markers mark a pool. So according to our data set and this picture, people in Ames don't really have a lot of pools, so we decided to drop this feature. Another feature that we decided to drop is fences. These people are very, very trusting; this looks like a very safe area. So we decided to drop the fence feature because we didn't think it affects the sale price. Another thing that we did to impute our data: we replaced some missing values with the mean, categorical variables with "None", and numerical variables with zero.

The last thing that we needed to do to prepare our data for prediction was to standardize it. After we standardized the data, we used three regression models: ridge, lasso and elastic net. This is a picture of how the elastic net model did on our testing data. This is the predicted data and the training data; you see that it is a good prediction. And lastly, elastic net indeed did really well on the testing data on Kaggle, with a root mean square error of 0.121, better than lasso at 0.123 and ridge at 0.131.

This is a very good technique to use when you want to assist your clients in finding houses that meet their means. Thank you. [Laughter]
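The steps described in the captions (outlier removal, dropping sparse columns, imputation, dummy variables, log-transforming the target, standardizing, and comparing ridge, lasso and elastic net) could be sketched roughly as follows. This is a minimal illustration, not the speakers' actual code: it runs on synthetic data generated by a hypothetical `make_toy_data` helper rather than the real Kaggle file, the column names follow the Kaggle data description, and the regularization strengths are arbitrary assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def make_toy_data(n=300, seed=0):
    """Synthetic stand-in for the training file (hypothetical, NOT the real data)."""
    rng = np.random.default_rng(seed)
    area = rng.uniform(600, 3500, n)
    qual = rng.integers(1, 11, n)
    price = np.exp(10.5 + 0.0004 * area + 0.1 * qual + rng.normal(0, 0.1, n))
    # Two very large but cheap houses, mimicking the outliers in the talk.
    area[:2] = [4700.0, 5600.0]
    price[:2] = [185000.0, 160000.0]
    garage = np.array(["Attchd", "Detchd", None], dtype=object)[rng.integers(0, 3, n)]
    df = pd.DataFrame({
        "GrLivArea": area,
        "OverallQual": qual,
        "GarageType": garage,
        "GarageArea": rng.uniform(0, 900, n),
        "PoolQC": np.nan,   # almost entirely missing in the real data
        "Fence": np.nan,
        "SalePrice": price,
    })
    # Houses without a garage have no garage area either.
    df.loc[df["GarageType"].isna(), "GarageArea"] = np.nan
    return df


def prepare(df):
    """Return (X, y) after the cleaning steps described in the talk."""
    # Remove houses above 4,000 sq ft that sold for under 300,000 USD.
    outlier = (df["GrLivArea"] > 4000) & (df["SalePrice"] < 300000)
    df = df[~outlier]
    # Log-transform the right-skewed target.
    y = np.log1p(df["SalePrice"])
    # Drop the target and the features the speakers chose to discard.
    X = df.drop(columns=["PoolQC", "Fence", "SalePrice"])
    # Impute: "None" for categorical columns, 0 for numerical columns.
    cat_cols = X.select_dtypes(exclude="number").columns
    num_cols = X.select_dtypes(include="number").columns
    X[cat_cols] = X[cat_cols].fillna("None")
    X[num_cols] = X[num_cols].fillna(0)
    # One dummy (indicator) column per category level.
    X = pd.get_dummies(X, dtype=float)
    return X, y


def compare_models(X, y, seed=0):
    """Fit ridge, lasso and elastic net; return hold-out RMSE on the log scale."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=seed)
    # Fit the scaler on the training split only to avoid leakage.
    scaler = StandardScaler().fit(X_tr)
    X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)
    models = {
        "ridge": Ridge(alpha=10.0),                      # alpha values are assumptions
        "lasso": Lasso(alpha=0.001, max_iter=10000),
        "elastic_net": ElasticNet(alpha=0.001, l1_ratio=0.5, max_iter=10000),
    }
    scores = {}
    for name, model in models.items():
        pred = model.fit(X_tr, y_tr).predict(X_te)
        scores[name] = float(np.sqrt(mean_squared_error(y_te, pred)))
    return scores


df = make_toy_data()
X, y = prepare(df)
scores = compare_models(X, y)
print(scores)
```

Because the target is log-transformed, the RMSE here is measured on the log of the sale price, which matches the scale of the scores quoted in the talk. The numbers printed for the synthetic data are not comparable to the speakers' Kaggle results.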
Info
Channel: MD
Views: 205
Rating: 5 out of 5
Id: HzatnVb9fV8
Length: 4min 53sec (293 seconds)
Published: Thu Jun 06 2019