3 Proven Data Science Projects for Beginners (Kaggle)

Captions
Hello everyone, Ken here, back with another video for you. Today I'm giving you three data science projects that you can do as a beginner. These projects can help you learn data science through experiential learning, which I really believe is the best way to pick up this field. When you're starting out in data science, it's difficult to know what to actually work on. These three projects are concrete, they have good documentation, and they go step by step through the tools you need and the implementation you need to create a good project. They are all on Kaggle.com, and Kaggle is a great site because it has a lot of code from a bunch of different individuals who have tackled these problems before. You can follow along with the work of people who are already good data scientists and see how they tackle these very specific challenges.

In a previous video, which I've linked above, I talked about the four projects you should do to get a data science job. The three beginner projects in this video are precursors to those. You should use them to learn Python and to learn the data science concepts, the language, and so on, but I wouldn't recommend putting them on your resume unless you don't have any other project experience. After doing these three basic projects, you can take what you've learned and apply it to new datasets. If you're using a unique dataset or trying to solve a very unique problem, that is a project worthy of your resume.

If you enjoyed this video, please hit that like button. It helps me out with the YouTube algorithm, and as a data scientist, I really care about those algorithms. Also, please subscribe if you want to see more videos at the intersection of data science and sports analytics.

All the projects I'm going to discuss are found on Kaggle.com. Kaggle is probably my favorite resource for learning data science. There are a ton of great datasets there, and there's also code from other people working on these problems that you can read and work through. If I were learning data science from ground zero, I would absolutely be wearing out this website and learning from the tools and datasets there.

In data science, we're usually trying to solve one of three types of problems. The first is a regression problem, where we predict a continuous outcome: for example, predicting the score of a basketball game from the variables we have associated with the game. The second is a classification problem, where we're either trying to predict a binary outcome, for example whether a student is male or female, or trying to fit them into a category: is this student a heavy sleeper, a light sleeper, or do they not get any sleep? The last type is a clustering problem, where we take unorganized data and try to make sense of it, for example by putting people into initial groups. If you're just beginning data science, I'd recommend focusing on the first two types, regression and classification, and that is generally where these projects begin.

The first project I would recommend is focused on the regression problem. This project is on Kaggle (you can see it linked in the description below), and we're trying to estimate the price of houses based on their attributes. If we look at the data in the train set, we can see that there's a ton of different information we can plug into our model: the number of bedrooms and bathrooms, whether the house has central air, whether the bathrooms are half or full, whether it has a fireplace. All of these can be used to estimate the price. We also see in the notebooks that many other people have tackled this estimation problem. There are hundreds of different kernels, and if you go into them, you can see how people went about making their estimates: which packages they used, how they wrote their code, and how they made their visuals. I would recommend going through a couple of the top ones to see what they do, and then experimenting on your own.

The second project I would recommend uses the Titanic dataset, again linked in the description below. This one is a classification problem: you're trying to predict which passengers survived the Titanic's collision with the iceberg based on a few factors. We're looking at their age, their sex, their ticket, their fare, which cabin they were in, and where they embarked from. All of these can be used to predict the outcome. Just like in the regression project, there are a ton of different notebooks out there, and some of them have over 4,000 upvotes. I would recommend going through them and seeing which classification packages and techniques they use, and as you go, try to Google each of the algorithms. This will give you a sense of how these algorithms are used and what their constraints are: for some of them you might have to scale the data and for others you might not, and some of them are linear while others are nonlinear.

The third project I would recommend might be more toward the intermediate level, but it is a great introduction to neural nets and deep learning techniques. It uses the MNIST dataset, which is basically a bunch of digits that people have handwritten, and your job is to classify each one from its image. Your algorithm looks at the image, looks at the pixels, and tries to determine which digit is written there. The data in this case is an image encoded as pixels; I believe each image is 28 by 28, so 784 pixels in total. Usually you want to build a neural net, although I believe you can use some other techniques as well to determine which digit it is. I would lean really heavily on the notebooks here to see how other people have done this. There are a bunch of different types of neural nets, and again, for everything you see that you don't understand, I'd recommend googling it, reading its documentation, and finding some textbook literature about it. There are also some good Medium articles that explain some of these more complex algorithms in simpler terms.

So just keep learning, keep absorbing, and don't get frustrated when you don't know what some of these things are. When it feels a little overwhelming, that's part of the learning process. When I was learning these things, there were many times when I thought, "Oh my god, this is so complex, I don't know if I'll ever understand this." But eventually, after I really took the time and got over that fear, when I came back to it later, all these things seemed relatively easy and intuitive. So don't be alarmed: that kind of fear and confusion is pretty normal, but you really have to push to get through it.

Again, these projects are just the ground floor, the baseline for what you'd expect from someone who's learning data science. I would say most data science courses use one of these projects to teach you the fundamentals, and it's your job to apply those fundamentals to new datasets and to the new problems you're trying to answer.

It's always interesting to hear from my subscribers, so if you don't mind, please comment in the section below with a project you want to do. I'd love to hear about the kinds of questions you guys are trying to answer. I genuinely thank you guys for watching, and good luck on your data science journey.
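For the house-price regression project described above, here is a minimal sketch of fitting a linear regression in plain Python (standard library only, so it runs without installing anything). It assumes a few column names from Kaggle's House Prices train.csv (GrLivArea, FullBath, Fireplaces, CentralAir, SalePrice); the eight rows are made-up toy values, not real data. Many Kaggle notebooks use libraries such as pandas and scikit-learn for this instead; the hand-rolled least-squares solver here only shows what such a model computes.

```python
import csv
import io

# Made-up rows in the style of Kaggle's House Prices train.csv.
# Column names are assumed to match; the values are illustrative only.
SAMPLE_CSV = """GrLivArea,FullBath,Fireplaces,CentralAir,SalePrice
1200,1,0,N,155000
1500,2,1,Y,220000
900,1,0,Y,137000
2000,2,2,Y,278000
1700,2,0,N,220000
1100,1,1,Y,165000
2400,3,1,Y,325000
1300,1,1,N,173000
"""

FEATURES = ["GrLivArea", "FullBath", "Fireplaces", "CentralAir"]


def to_row(record):
    """Turn one CSV record into a numeric feature vector with a leading 1 for the intercept."""
    return [
        1.0,
        float(record["GrLivArea"]),
        float(record["FullBath"]),
        float(record["Fireplaces"]),
        1.0 if record["CentralAir"] == "Y" else 0.0,  # encode the Y/N flag as 1/0
    ]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict_price(weights, row):
    """Dot product of the fitted weights with one feature vector."""
    return sum(w * x for w, x in zip(weights, row))


records = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
X = [to_row(rec) for rec in records]
y = [float(rec["SalePrice"]) for rec in records]
weights = fit_linear_regression(X, y)

for name, w in zip(["intercept"] + FEATURES, weights):
    print(f"{name:>10}: {w:,.2f}")
```

The toy SalePrice values were generated from the made-up formula 20000 + 100·GrLivArea + 15000·FullBath + 8000·Fireplaces + 12000·CentralAir, so the fitted weights should come back essentially equal to those numbers. Real housing data is noisy and has many more columns, which is where the top notebooks' feature engineering comes in.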
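The remark above that some algorithms need scaled data can be made concrete with k-nearest neighbors, a distance-based classifier. This is a minimal plain-Python sketch under stated assumptions: the feature names (Sex, Age, Fare, Survived) mirror Kaggle's Titanic train.csv, but the six rows and the query passenger are invented toy values, and k-NN is just one illustrative classifier, not a specific recommendation from the video.

```python
import math

# Invented toy rows using feature names from Kaggle's Titanic train.csv.
# Sex is encoded as 1 = female, 0 = male.
TRAIN = [
    # (Sex, Age, Fare), Survived
    ((1, 30, 10.0), 1),
    ((1, 25, 12.0), 1),
    ((1, 40, 15.0), 1),
    ((0, 30, 80.0), 0),
    ((0, 35, 75.0), 0),
    ((0, 28, 85.0), 0),
]


def fit_min_max(rows):
    """Learn per-feature (min, max) bounds from the training rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]


def apply_min_max(row, bounds):
    """Rescale each feature to [0, 1] using the training-set bounds."""
    return tuple((x - lo) / (hi - lo) if hi > lo else 0.0 for x, (lo, hi) in zip(row, bounds))


def knn_predict(train, query, k=3):
    """Majority vote among the k training rows closest to the query (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


query = (0, 30, 14.0)  # a hypothetical male passenger, age 30, fare 14

raw_prediction = knn_predict(TRAIN, query)

bounds = fit_min_max([features for features, _ in TRAIN])
scaled_train = [(apply_min_max(f, bounds), label) for f, label in TRAIN]
scaled_prediction = knn_predict(scaled_train, apply_min_max(query, bounds))

print("raw features:   ", raw_prediction)     # → 1 (neighbors picked almost purely by Fare)
print("scaled features:", scaled_prediction)  # → 0 (Sex and Age now carry comparable weight)
```

With raw features, Fare spans 75 units while Sex spans 1, so the nearest neighbors are chosen almost entirely by fare; after min-max scaling every feature spans 0 to 1 and the prediction changes. Tree-based models, by contrast, are generally insensitive to this kind of rescaling, which is one of the constraints worth looking up for each algorithm.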
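For the MNIST digits project, each image arrives as one flat row of pixel values. A minimal sketch, assuming the layout of Kaggle's Digit Recognizer train.csv (a label column followed by pixel0 through pixel783, grayscale values 0 to 255): it reshapes a flat row into a 28 by 28 grid and classifies it with a nearest-centroid rule. That rule is a deliberately simple non-neural baseline swapped in to keep the code short; it is not the neural-net approach the video recommends. The toy_image helper is hypothetical and draws crude bar and box shapes in place of real handwritten digits.

```python
SIDE = 28               # each image is 28 x 28 grayscale pixels
N_PIXELS = SIDE * SIDE  # 784 pixel columns, pixel0 ... pixel783


def parse_row(line):
    """Split a 'label,pixel0,...,pixel783' CSV line into (label, flat pixel list)."""
    values = [int(v) for v in line.split(",")]
    return values[0], values[1:]


def to_grid(flat):
    """Reshape a flat row of 784 pixel values into 28 rows of 28 (row-major order)."""
    if len(flat) != N_PIXELS:
        raise ValueError(f"expected {N_PIXELS} pixels, got {len(flat)}")
    return [flat[r * SIDE:(r + 1) * SIDE] for r in range(SIDE)]


def toy_image(kind, shift=0):
    """Hypothetical stand-in for a handwritten digit: a vertical bar (1) or a box outline (0)."""
    grid = [[0] * SIDE for _ in range(SIDE)]
    for r in range(6, 22):
        if kind == 1:
            grid[r][14 + shift] = 255
        else:
            for c in range(8 + shift, 20 + shift):
                if r in (6, 21) or c in (8 + shift, 19 + shift):
                    grid[r][c] = 255
    return [p for row in grid for p in row]  # flatten back to 784 values


def fit_centroids(samples):
    """Average the pixel vectors for each label (a nearest-centroid baseline, not a neural net)."""
    sums, counts = {}, {}
    for label, pixels in samples:
        acc = sums.setdefault(label, [0.0] * N_PIXELS)
        for i, p in enumerate(pixels):
            acc[i] += p
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_digit(centroids, pixels):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda lbl: sum((p - c) ** 2 for p, c in zip(pixels, centroids[lbl])),
    )


# Build a few CSV lines in the assumed train.csv layout, shifted one pixel each way.
train_lines = [
    ",".join(str(v) for v in [kind] + toy_image(kind, shift))
    for kind in (0, 1)
    for shift in (-1, 1)
]
centroids = fit_centroids(parse_row(line) for line in train_lines)

label, pixels = parse_row(",".join(str(v) for v in [1] + toy_image(1)))
grid = to_grid(pixels)
print(len(grid), len(grid[0]))           # → 28 28
print(predict_digit(centroids, pixels))  # → 1
```

A neural net replaces the fixed "distance to the average image" rule with learned weights over the same 784 inputs, which is why it copes far better with the variety in real handwriting.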
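Clustering, the third problem type mentioned above, has no target column at all: the algorithm has to find the groups itself. A minimal plain-Python sketch of k-means (Lloyd's algorithm), one common clustering technique chosen here for illustration. The six points are invented, and seeding with the first k points is a simplifying assumption; library implementations typically use smarter seeding such as k-means++.

```python
import math


def kmeans(points, k, iterations=100):
    """Plain k-means (Lloyd's algorithm) on 2-D points, seeded with the first k points."""
    centers = [tuple(p) for p in points[:k]]  # simple deterministic seeding (an assumption)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to its cluster's mean (keep it if the cluster is empty).
        new_centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers
    labels = [min(range(k), key=lambda i: math.dist(p, centers[i])) for p in points]
    return centers, labels


# Hypothetical people described by (hours of sleep per night, hours of exercise per week).
people = [(5.0, 1.0), (5.5, 1.5), (4.5, 0.5), (8.0, 6.0), (8.5, 6.5), (7.5, 5.5)]
centers, labels = kmeans(people, k=2)
print(labels)   # → [0, 0, 0, 1, 1, 1]
print(centers)  # → [(5.0, 1.0), (8.0, 6.0)]
```

Even though both seeds start inside the first group, the alternating assign and update steps pull one center over to the second group within a couple of iterations.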
Info
Channel: Ken Jee
Views: 207,310
Rating: 4.9844575 out of 5
Keywords: Data Science, Ken Jee, Machine Learning, data scientist, data science projects, data science project, data science for beginners, data science projects for beginners, deep learning, how to, kaggle, kaggle projects, kaggle data science, kaggle beginners, kaggle projects for beginners, data science python, data science project in python, data science project ideas, data science project portfolio, python, data science portfolio, 3 proven data science projects for beginners
Id: 8igH8qZafpo
Length: 7min 34sec (454 seconds)
Published: Mon Feb 17 2020