Five Data Science Project Ideas

Video Statistics and Information

Captions
What's up everybody, this is Jay from Interview Query and Data Science Jay, and today I would love to go over some personal data science project ideas with you guys. I think projects are super important, and I've definitely highlighted them in other videos, because projects are actually what get you seen if you're a beginning data scientist, they are what keep you up to date if you're an existing data scientist, and they're just generally fun to do, because a lot of the time you're tasked with something that you're doing on your own, creatively, and you're applying data science to it. You can do this for any one of your own interests. If you don't have any interests, then I'm here to give you some project ideas.

Awesome. So generally I want to get more into the background of why it matters to actually think of your own project to do. I've seen a lot of resumes, and a lot of them will have a project section right in the middle, and it'll highlight something like "analyzed the Titanic data set" or "analyzed the movie predictions data set." They're all generally ideas and projects that have already been done before, most of the time off Kaggle, or they've taken a clean data set from some other site, like a data science course or whatever, and just analyzed it: did some exploratory data analysis, built a model, boom, done. And it doesn't look good. For one, if you don't actually link to what you did, then I don't even know if you did it or not. And then two, it shows that it lacks some creativity. I mean, these are great for learning purposes, when you don't know any data science yet and you follow a template and you see how they do the exploratory analysis and how they do the model building. But at the end of the day, what actually matters is overcoming obstacles. A good data science portfolio project, one that we're
gonna go over, generally consists of a couple of different things, and I think the main points are these: you choose a project that no one has done before, number one. You clean the data on your own, and you get it in some sort of fashion such as scraping or piping it from an API, two. And then you build something useful that generally doesn't exist, that allows someone to learn information that they didn't know, or they get to use some sort of application with it, three. I did those in reverse, whatever, doesn't matter.

So how do you make a better personal data science project? Step number one: start with a problem statement that is actually useful, even if you don't know how to solve it. If you do something that is, let's say, an easy, simple, dumb linear regression, but you do it off data that no one else has, or no one else knows about or has ever seen before, like in some sort of news, you are kind of like a journalist. You have an edge: you have data that no one else really has or even thought to analyze, and now you apply a simple linear regression and you show the coefficients and you say, okay, this is why this thing matters, or this is why that thing matters.

Let's take for example Craigslist rental pricing. When I did a project with the Seattle rental analysis, based on the Craigslist housing prices and rents at that time, no one else had ever done it before. The data had already existed, and people already kind of knew what it was about, but at the same time it was so easy anyone could reproduce it. It was just an easy linear regression. I didn't do any fancy neural nets or deep learning. I didn't know any of that stuff; I still don't know any of that stuff. But effectively, when you give people an opportunity to read about something almost like it's an article, and in fact it was an article, but it was done by me, who was just
like a college student, being able to do that effectively showed that I would, one, take the initiative to analyze some data, two, actually scrape it, and then three, actually try to build a model that had some insights off of it.

All right, step two: always do the full stack data analysis. What do I mean by full stack data analysis? I mean going through the project end to end. This shows that you can, one, clean ugly data; two, get the ugly data by scraping it; three, create a data pipeline to a database from your scraper or from an API that you call; four, eventually analyze the data and provide some shiny graphs that look cool, that no one else has ever seen before, and that provide the end user with new information; and then five, possibly, if it actually matters, build a web app with it, or something that you can actually demonstrate to the public. If I take my old example, I didn't actually build a web app out of the Seattle rental analysis, but if I had, all I would have done is have someone enter in their address and then output the estimated rent for the number of bedrooms and bathrooms. Something like that is, one, providing value, and two, you're actually showing the inner workings of it. People love the "how I built this": how is this actually done, and here's the end result, because it's really cool. Projects like that are very cool and very interesting, and people love to engage with them. And eventually, if you can showcase that you've done all this and got some views, then the people viewing the project may also be hiring managers that will hire you as well.

Number three, the third step, is to go over some data science ideas. I wanted to take some time to go over some ideas I have that you guys can start with, analyzing data and providing it as a project, today even, if you wanted to. All right, the first project idea that I have is around live events, and this
one is pretty doable depending upon how good you are at scraping. I think one thing that's been super interesting over the past few years has been how much the live ticket marketing industry has gone up in the secondary market, but also, over the past few months, how much it's crashed, because people are not going to events anymore due to the pandemic. It's super sad, but you can see that there are still events being played right now, and there's all this data from it. Interestingly enough, there's a pretty easy way to get access to this data. So I go to this network tab and I do a refresh, and I can see that... oh well, I can't see it anymore, weird, let's do it again. Refresh, check it out. So now we're looking at this network tab, I'm preserving the log, but look, here's the listings. All we have to do is grab this, copy the link address, put it into a new tab, if my computer won't die on me, and boom, look at this: all this JSON on all the tickets that you can find on SeatGeek for specific events, for specific IDs. Check it out, analyze this data, save it into a database. Check out which concerts are the most interesting, which sports events are the most interesting, and how much each concert and each sports event costs. I find it super fascinating how the concert industry basically functions like a stock market, especially on the secondary market, which SeatGeek is a part of.

Cool, let's check out the next one. This is one of my favorite websites: Twitter. Twitter itself has a very interesting API. Twitter data can be accessed through the Twitter API, which is pretty simple and easy to use. If you look at the documentation for the Python API right here, we can see that you have to create your own OAuth key, but for the
most part you can search a lot of these tweets and get back different actual tweets. So I think if you search for different keywords around things that you're interested in, such as something that's very trending right now, the Black Lives Matter movement, you can see these different things. COVID-19 has been around for a while, too. These kinds of trending topics make for really good data, and I think a lot of the interesting usage comes from how you can visualize this data and how you can use it to understand exactly what is important and what the keywords are, doing natural language processing analysis on it, all of that stuff.

All right, my next favorite website, because I love looking at apartments and houses. I would say that while rental analysis is really cool, and I still do it to this very day, another interesting thing is the pricing of different items on secondary markets again, but just for stuff like bicycles. How do you know how much a good bicycle is going to cost? If you are a bicycle enthusiast, you probably have an idea; you have some features that you can use. If you're not, then you really don't have anything. I think creating some sort of feature list and model can honestly give an uninformed user like me an understanding of why this bike costs three hundred dollars, and whether it's a good deal or a shitty deal. Personally I think this is really interesting for anything. Surfboards are what I'm primarily interested in: why does this board cost 400 and why does this one cost 200? These things are generally variable, and it's really helpful to understand why all these different kinds of items, cars, surfboards, bikes, just anything generally a little bit expensive, are priced differently, and to find out what's actually a good deal at the end of the day.

All right, lastly, one of my favorite websites that I check every single day is Hacker
News. So why do things go on Hacker News? If you aren't familiar with Hacker News, it's basically a forum, a news board that surfaces the most interesting content, almost like Reddit, except specific to tech. I think all their data is very easily scrapable. If you go on GitHub right now you can find tons of Hacker News scrapers that are written in Python, and all you really have to do is clone the scraper and apply it to Hacker News to get the number of points something has, how long ago it was posted, all the comments if you wanted to, as well as the title. I think doing a really interesting natural language processing analysis on this would go to show what actually gets surfaced on Hacker News and gets upvoted by people within the tech industry. This is an interesting problem, because you have all the stuff that doesn't generally get upvoted, all this new stuff that people post that only gets one point, three points, and then you get all the stuff that goes viral on Hacker News. I think this kind of stuff is super interesting, given that there's been a lot of work done with it on Reddit specifically, a lot of research papers, but any kind of amateur analysis is very doable as well. And I think that having all this available data that you can download over a period of time will definitely help, and will probably make the article that you write go viral on Hacker News. At least I'd hope so.

That is generally a wrap-up for the different kinds of data science projects that you can do. I hope that gave you a good amount of ideas on how to check out different interests, and how to apply your own interests to analyzing and creating a
project out of this kind of data. Most of the things I just showed you today were my interests, and I'm sure that you frequent different websites on your own and have different hobbies. I would implore you to really think about what your hobbies are and how you can apply data science to them. Merging two hobbies together is almost always a good idea, because it fuels the interest as well as the creative learning process for yourself. You'll find that working on it will be way more interesting than any other kind of data science project that you'll find on Kaggle that is just templated for you, which won't provide as good an experience as finding your own kind of project to work on.

So definitely think about how you can, one, do a project end to end; two, clean the data, build something, showcase something, and present some sort of analysis at the end, or a web app; and then three, provide some sort of edge in what you're building when you're actually thinking about the idea. And understand that at the end of the day, data science is at its best when it's a passion project, and these kinds of passion projects generally never die out. It just takes some time to ponder it, but don't lose hope if you can't think of anything yet, because I'm sure eventually you'll find yourself questioning something one day and then just thinking, how can I apply data science to that?

All right, thanks for watching. If you guys have a data science idea, please add it in the comments, and then like and subscribe, and I'll talk to you guys later.
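As a rough illustration of the rental example described in the captions (a simple linear regression whose coefficients explain the price, plus the "enter the bedrooms and bathrooms, get an estimated rent" idea), here is a minimal Python sketch. It is not the speaker's code, which the video does not show. The listings are made-up numbers standing in for scraped data, and the names `LISTINGS`, `solve`, `fit`, and `estimate_rent` are hypothetical.

```python
# Made-up listings standing in for scraped data: (bedrooms, bathrooms, monthly rent).
LISTINGS = [
    (1, 1, 1500),
    (2, 1, 1900),
    (2, 2, 2200),
    (3, 2, 2600),
    (3, 3, 2900),
    (4, 3, 3300),
]


def solve(a, b):
    """Solve the square linear system a * x = b by Gauss-Jordan elimination."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the row with the largest entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(listings):
    """Ordinary least squares for rent ~ intercept + bedrooms + bathrooms."""
    rows = [[1.0, beds, baths] for beds, baths, _ in listings]
    rents = [rent for _, _, rent in listings]
    k = 3
    # Normal equations: (X^T X) * coefs = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, rents)) for i in range(k)]
    return solve(xtx, xty)


def estimate_rent(coefs, bedrooms, bathrooms):
    """Predict rent from fitted coefficients (the 'web app' calculation)."""
    intercept, per_bedroom, per_bathroom = coefs
    return intercept + per_bedroom * bedrooms + per_bathroom * bathrooms


coefs = fit(LISTINGS)
print("intercept, per bedroom, per bathroom:", [round(c, 1) for c in coefs])
print("estimated rent for 2 bed / 1 bath:", round(estimate_rent(coefs, 2, 1)))
```

The coefficients are the part you would write about: each one says roughly how much an extra bedroom or bathroom adds to the rent in this sample. With real scraped listings the fit would be noisy, and a library such as scikit-learn would be a more practical choice than hand-rolled normal equations.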
Info
Channel: Data Science Jay
Views: 15,467
Keywords: data science project ideas, data science ideas, data science projects, data science scraping, data science job search, data science career, data science career tips, data science portfolio projects, data science, data science for beginners, data science python tutorial
Id: uzWpKXDZ6ME
Length: 15min 26sec (926 seconds)
Published: Thu Jul 02 2020