The One and Only Data Science Project You Need

Video Statistics and Information

Captions
Hey guys, it's Nate here with some advice if you're trying to figure out your next data science project. Let's talk about the one and only project that you need to build, one that will help you gain full-stack data science experience and impress interviewers, if your goal is to jump-start your career in data science. Let's break down the components of what a good data science project includes, exactly what an interviewer is looking for, and why they're looking for it. I'll also let you in on a secret about this data science project and why I think it's the best one out there and the only one you actually need to do, so watch until the end to hear what it is. If you like content like this, please subscribe to this channel. Now let's get started.

One piece of advice before we start talking about the components of a good data science project: let me tell you about two things to stay away from when you're trying to find a project. Number one, avoid any analysis on the Titanic or Iris data sets. It's been done to death, and I don't care about your survival classifier. Number two, as you gain more experience, you can start to migrate away from Kaggle. So avoid Kaggle. To me it's too commonplace, too ordinary; everybody does it. Unless you can rank in the top 10, I'd just stay away from it.

Great. With that out of the way, let's start talking about the components of a good data science project. Again, I'll break down the components of a good project and tell you what the interviewer is looking for and why they're actually looking for it. As a summary, what an interviewer is looking for, what I'm looking for, is a data scientist with real-world skills: relevant skills in both coding and analytics, but also in using modern technologies and tools. This is going to get you closer to becoming a full-stack, or fully independent, data scientist.

Here's a quick breakdown of the components of a good data science project. Number one, working with real data. Number two, working with modern technologies like APIs and databases in the cloud. Number three, obviously, building models. Number four, making an impact and getting validation, and I'll explain a little bit about application frameworks towards the end of this video.

All right, now let's talk about each component in detail. Component number one is working with real data, specifically data that gets updated in real time: streaming data. Working with real data that users produce, and with data that is produced in real time, helps prove to the interviewer that you know how to work with relevant and timely data. You're not analyzing some data set that was produced in 1912, like the Titanic data set, right? You're working with data that was just produced and that's updated frequently.

Having said that, you're probably asking, "Well, how do I get a data set like this?" That's a perfect segue to component number two, using modern technologies from industry. How are you going to get that real-life data set that is updated in real time? You can use APIs to collect that data. Almost all apps and platforms use APIs to pass information back and forth. Learning how to use, configure, and set up APIs to get the data that you need for your analysis shows the interviewer that you have relevant, keyword-relevant data science skills to do your job effectively. Some popular APIs, for example, are Twitter, Google Analytics, YouTube, Netflix, and Amazon. A good API for data analysis will include real-time updates, and date and time stamps for every record. Geolocations are really nice to have, and obviously numbers or text, so you can actually do an analysis. For other API examples, refer to the links in the description.

The skills you're trying to learn when you're working with APIs are these. Number one, learn how to set up and configure APIs in your code, for example dealing with API tokens. Number two, learn how to use libraries, like the various Python libraries that will help you make API calls. Number three, learn how to work with data structures like JSON and dictionaries to help you collect and save the data from the APIs. All of these are skills that you'd be using on the job from day one as a data scientist. As an interviewer, if I know that you have these skills, I would start seeing you more as an experienced data scientist than as somebody who's just starting off. This is a leg up and a bonus point to have in an interview.

Now let's talk about the second modern technology to work with: databases in the cloud. Once you collect your data from an API, and maybe after you clean the data a bit, you probably want to store it in a database. Why? Number one, because, like I mentioned before, the data that you're grabbing from an API is updated regularly, so if you pull the data again from the API you're going to get new records. Instead of pulling the entire data set again and cleaning the entire data set all over again, it would be nice to pull just the new records, clean those, and then store them in the database. You'll be storing all of your clean data in that database and adding new clean records every time you make an API call. Number two, every company uses databases, and many use cloud services like Amazon Web Services (AWS) and Google Cloud. Knowing how to build a data pipeline with a cloud provider is a great skill set to have, and it will set you apart from other data scientists. Again, if I were interviewing you and you had this experience, I'd be very impressed, because I'd know that you can hit the ground running and make an impact from day one.

All right, component number three. This gets us to the part of a data science project that you probably thought was the most important: building models. It's definitely really important to learn how to build and implement a model, whether it's a regression model or some sort of ML (machine learning) model. That's kind of why I told you to start with Kaggle, because I feel that Kaggle will give you the experience you need in terms of building models. If you don't have a lot of experience building models, Kaggle is a great starting point.

But while gaining experience building models is important, there's another aspect that's even more important: understanding the decisions you make, and why you make them, while building your model. Here are some questions you would need to answer when implementing your model. You'll need to be able to eloquently explain your answers to these questions in an interview; otherwise, no matter how good your model is, nobody's gonna be able to trust it. Number one, why did you pick your model? Why that model? What are you trying to accomplish with this model that you couldn't do with others? Number two, how did you clean your data? Why did you clean it in that way? What type of validation tests did you perform on the data to prepare it for the model? Tell me about the assumptions of your model. How did you validate those assumptions? How did you optimize your model? What were the trade-off decisions that you made? How did you implement your test and control? Tell me about the underlying math in your model and how it works.

What you don't see in this line of questions is how your model performed. I don't really care about that. As an interviewer, I care about your thought process and how you made decisions, and I care about whether you understand the underlying math of your model.

Lastly, how do you know if you've built a great data science project? Your project should make an impact, and you should have some validation from others. I understand that you're doing these projects to gain more experience and improve your skills, but the job of a data scientist is to help others by turning data into insights, into a recommendation that can make an impact on the business. So how do you even know if your insights and recommendations are valuable if you're building in isolation and not showing others? You need to show others your work and build something that they would find valuable. There are three ways to do this.

The first and easiest way is to share your code with others who are part of data science communities. There are various subreddits out there, like data science and machine learning, that would be happy to review and look through your code. You can just put your code in a Git repo and share your project that way. But because you're just sharing code, it might not get the best engagement from the community.

The second way is to output your insights in the form of visuals and graphs. Build nice-looking graphs that people want to take a look at, share your graphs, and write up your insights in some sort of blog article. You can share your articles on various data science publications, like Towards Data Science on Medium, or again through various data science subreddits.

Lastly, the hardest way is to learn an application framework like Django or Flask, deploy your application using a cloud provider like AWS or Google Cloud, and serve your insights that way. Your insights could be an interactive dashboard, built using Plotly, that users can interact with, or a simple API that users can connect to in order to grab your insights and recommendations. This is obviously the hardest, most involved way to share your work, but it's worth it if you want to become a full-stack data scientist and gain some software development experience. Any interviewer, any data scientist, would be super impressed if you have this skill set.

The main point in all this is to show that you built something valuable and that people find it interesting. Show the impact of your work. Your teammates and the interviewer will be really impressed, guaranteed.

All right, so I ran through all of the components. Here are the components of a good data science project again: working with real data; working with modern technologies like APIs and databases in the cloud; building models; and lastly, making an impact and getting validation, possibly by building an application.

You're probably thinking that this is a lot of work, and that it includes so many different skills that it's gonna take you years to master. The answer is yes: it's supposed to take you years to master all of these skills and become a very good data scientist. But the great part about these components is that you can master them independently of each other, meaning that you can learn all about databases and get good at that, then switch over to APIs and master APIs, and so on and so forth. After a while, you'll have mastered them all.

And so now we come full circle from the intro. What is the secret to all of this? The secret is that you don't need to do multiple projects to master these skills. This is basically one big data science project. You're building a data science infrastructure from end to end and learning the entire data science process. Once you build the entire infrastructure end to end (connecting to and grabbing data from an API, cleaning the data, storing it in a database, building a model, and then having a visual as an output), you can use the exact same framework and infrastructure to do other analyses. The only thing you probably need to do is slightly refactor and revise your code. For example, if you want to analyze a new data set using another API, you can use the same code and revise it slightly to connect to the other API and pull new data in. You can use similar code and techniques to clean your data and push it into a new database table, but it's a database that you already have running in the cloud, so there's no more setup or configuration really needed. Once you have that infrastructure set up, you can do various other projects and learn various other models using the exact same framework, with just simple revisions. So my advice is to keep iterating, keep improving, and keep building, to build something that others would find valuable.

That's it from me. I hope this becomes your next data science project. It's going to be the only data science project you're ever really gonna need to build, and it's definitely a project that would impress interviewers in your next data science interview. All right, please leave a comment if you have any questions, and subscribe to this channel if you like content like this. Until next time, see you guys at the next video.
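As a rough illustration of the collect, clean, and store steps described in the captions, here is a minimal Python sketch. It is not the speaker's code. Everything specific in it is an assumption made for illustration: the sample payload and its field names (id, created_at, text, likes) are invented, the table name posts is hypothetical, and a local SQLite database stands in for a database running in the cloud. A real project would fetch the JSON over HTTP with an API token instead of reading a hard-coded string.

```python
import json
import sqlite3

# Hypothetical API payload. A real project would fetch this over HTTP using
# an API token; the field names below are invented for illustration only.
SAMPLE_RESPONSE = """
[
  {"id": 1, "created_at": "2021-02-24T10:00:00Z", "text": " Hello world ", "likes": 3},
  {"id": 2, "created_at": "2021-02-24T10:05:00Z", "text": "Data science!", "likes": null},
  {"id": 3, "created_at": "2021-02-24T10:09:00Z", "text": "", "likes": 7}
]
"""


def parse_records(raw_json):
    """Turn the JSON text returned by an API into a list of dictionaries."""
    return json.loads(raw_json)


def clean_record(record):
    """Return a cleaned copy of one record, or None if it is unusable."""
    text = (record.get("text") or "").strip()
    if not text:
        return None  # drop records that have no text to analyze
    return {
        "id": int(record["id"]),
        "created_at": record["created_at"],
        "text": text,
        "likes": int(record.get("likes") or 0),  # treat a missing count as 0
    }


def init_db(conn):
    """Create the (hypothetical) posts table if it does not exist yet."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS posts (
            id INTEGER PRIMARY KEY,
            created_at TEXT NOT NULL,
            text TEXT NOT NULL,
            likes INTEGER NOT NULL
        )
        """
    )


def store_new_records(conn, records):
    """Clean the records and insert only those not already stored.

    Returns the number of rows actually inserted.
    """
    inserted = 0
    for record in records:
        cleaned = clean_record(record)
        if cleaned is None:
            continue
        # INSERT OR IGNORE skips rows whose primary key is already present.
        cursor = conn.execute(
            "INSERT OR IGNORE INTO posts (id, created_at, text, likes) "
            "VALUES (:id, :created_at, :text, :likes)",
            cleaned,
        )
        inserted += cursor.rowcount
    conn.commit()
    return inserted


def latest_timestamp(conn):
    """Newest stored timestamp, or None if the table is empty.

    Some APIs accept a "since"-style parameter; whether a given API does
    is an assumption that has to be checked against its documentation.
    """
    return conn.execute("SELECT MAX(created_at) FROM posts").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for a cloud database
    init_db(conn)
    print(store_new_records(conn, parse_records(SAMPLE_RESPONSE)))
    print(store_new_records(conn, parse_records(SAMPLE_RESPONSE)))
```

Calling store_new_records a second time with the same payload inserts nothing, which is the point made in the captions: each new API pull only adds clean records that are not yet in the database. Pointing the same code at another API would mostly mean changing clean_record and the table definition.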
Info
Channel: Nate at StrataScratch
Views: 97,007
Keywords: data science projects, data science project, data science projects for beginners, data science project ideas, data science projects for resume, data science projects from scratch, data science, data science interview, data science interview preparation, data analysis, data scientist, data analyst
Id: c4Af2FcgamA
Length: 13min 4sec (784 seconds)
Published: Wed Feb 24 2021