How to Learn DATA SCIENCE Ridiculously FAST

Captions
"Hey man, I heard you broke into data science. Any tips on how to learn it?"
"Hey, yeah, I did. So, in order to break through, what you're going to want to do is rewrite gradient descent from scratch and publish a paper on YOLOv7. Ideally, you want to be able to train GPT-4 in under 20 minutes. Oh, and you should probably attend some meetups. Yes, uh, maybe three meetups a day. And make sure you can code in Swift and Assembly."

It doesn't need to be this hard. Let's dive into it.

What's happening, guys? My name is Nicholas Renotte, and in this video we're going to take a look at the exact process that's going to allow you to learn data science ridiculously fast. The cool thing is that this is the exact same process I used to go from being an accountant to a data scientist at a large tech company. Make sure you stick around to the end of the video, because I'm also going to be giving you a free cheat sheet that summarizes all of the concepts and notes discussed in this video. Ready to do it? Let's get to it.

Now, you're probably wondering what the first thing you should do to kick off this data science journey is. Well, one of the best first steps you can take is to learn Python. There's a whole heap of programming languages out there that allow you to build and work with data science concepts, but Python is probably the easiest one to get up to speed with. The great thing about it is that a whole heap of advancements are being made in Python that let you work on some really cool use cases: for example, TensorFlow, PyTorch, and a lot of the natural language advancements being made by Hugging Face are all available in Python. So it's a great first language to learn, and it's easy to get started with. Make sure you take a look at data types and data structures, how to write functions, and how to build classes, just to name a few.

The other thing you should probably do is familiarize yourself with the development environments that data scientists typically use. My best advice here is to learn Jupyter Notebook, JupyterLab, and potentially one other integrated development environment, such as PyCharm or VS Code, just to get started.

Once you've got a reasonable understanding of Python, the next thing you want to be able to do is identify different types of data science tasks. Think of these tasks as the different workflows, outcomes, or end products that come out of going through the data science steps. Being able to identify the type of task is so important because it influences the data you collect, the processing you do, and the workflow you run for that particular task; in fact, it influences the entire data science process. There are a bunch of different types of tasks in data science, and we'll cover these a little more in the modeling section, but for now, some of the most popular ones you're likely to encounter include churn prediction, sales forecasting, sentiment analysis, image classification, and object detection. Some of these are reasonably advanced, so you might want to leave them until a little later. The best way to find out about the data science tasks specifically relevant to your industry is to jump onto Google and search for "machine learning tasks in [your industry]". This lets you find out what kinds of tasks and problems data scientists in that industry are currently trying to solve.

Just as there are many types of data science tasks and use cases, there are just as many types of data you're likely to encounter when you're learning data science. These can broadly be broken down into two key categories: structured and unstructured. Structured data typically refers to data stored in CSV files, .txt files, Microsoft Excel spreadsheets, and traditional SQL-style databases. When you're working with this type of data, you want to learn how to work with pandas and NumPy. pandas is a Python library that lets you read in tabular, structured data, work with it, and build data science workflows. Using pandas, you can create, read, update, and delete data: the core CRUD lifecycle whenever you're working with data. NumPy gives you the ability to transform your data and apply a whole bunch of mathematical functions to it.

On the other hand, we've got unstructured data. Unstructured data is a whole big wide world of fun, and it specifically revolves around image, audio, and text-based data. There's a whole heap of development happening in the unstructured data space, and this is typically where you're likely to encounter things like image classification, object detection, and pitch classification, as well as a whole bunch of natural language use cases. To get started with unstructured data: for images, you're likely to work with OpenCV or Pillow; for audio, it's best to take a look at the SciPy library and learn how to work with spectrograms; and for text, there's a whole bunch of libraries, but my favorites, which I'd recommend a new data scientist learn, are NLTK (the Natural Language Toolkit for Python) and Hugging Face Transformers. These key libraries are going to give you a solid foundation in how to approach different types of data when learning data science.

Once you've got a good understanding of the types of data you're likely to be working with (remember: structured and unstructured), the next thing you want to do is learn how to analyze and visualize your data. The core question you're trying to answer here is whether you've got enough data, of sufficient quality, for a specific use case. Remember, everything in data science revolves around your specific use case or task, so you want to make sure your data is good enough to go ahead with it. A great way to do this is to first get a grounding in statistical analysis: a good grasp of descriptive statistics, analyzing distributions, and hypothesis testing. When it comes to descriptive statistics, if you're working with structured data, you can do a lot of that using pandas. You can also extend this: if you're trying to analyze distributions, I'd highly recommend getting a good grounding in Matplotlib, which is one of the most popular visualization libraries used for data science in Python today. If you're working with unstructured data, specifically audio, you want to get a good idea of how to transform your audio into a spectrogram. Typically you can do this using SciPy, and then treat it with similar techniques as you would image-based data. When it comes to image-based data, a great idea is to get a good understanding of OpenCV, as it allows you to perform data transformations and augmentations, as well as visualize your data and see whether it's of sufficient quality.

Once you've analyzed and visualized your data, the next thing you want to do is get your data ready for modeling. This is typically referred to as pre-processing. If you're working with structured data, you're going to want to fill in your missing values, set up your independent and dependent variables, and split your data into training and testing partitions. These are all pretty standard steps you'll want to learn as you're learning data science. If you're working with text-based data, you'll want to learn how to remove punctuation, lemmatize your data (that means returning your words to their base form), and perform tokenization. If you're working with image-based data, particularly with TensorFlow, there are pre-processing scripts you can use to get your data into the right format. Whatever it is, you want to learn how to pre-process your data really well, as this is going to improve the quality of your model in the end.

So, you've made it to the good bit: models, algorithms, and evaluation. This is where you learn to take your pre-processed data and apply different models and algorithms to solve your specific data science use case or task. Now, I'm going to go out on a limb here. A whole heap of data scientists and practitioners want you to spend a lot of time learning how each and every algorithm works in great detail, but my personal opinion is that you're better off spending your time learning how to choose and use specific algorithms to produce the best possible outcome for your data. There are probably tens of thousands of algorithms out there that you could potentially use for your use case, but to save you a bunch of time, I've linked the specific ones I use for specific data science use cases and tasks inside the data science cheat sheet that I'm going to link to later on. This is going to save you a whole bunch of time when trying to choose which one to use.

Broadly, machine learning models, and specifically data science models and algorithms, fall into two key categories: supervised and unsupervised. Supervised models tend to have a defined outcome and use something called labeled data. When you're applying supervised machine learning, you use a set of input features to predict an output feature, and for that you need labeled data. Say, for example, you're trying to forecast which customers are likely to leave a particular company, something known as churn analysis. You need a list of historical customers who did and didn't leave the company; in other words, you need labeled data to perform supervised learning. Common supervised learning tasks include churn prediction, as I was just discussing, as well as sales forecasting and sentiment analysis.

On the other hand, we have unsupervised learning, which you'll want to learn how to apply as well. Some of the most common forms of unsupervised learning include clustering, where you try to group similar people, places, or businesses together, and anomaly detection, where you try to find outliers. Getting a good understanding of these different types of algorithms, and how and when to use them, is going to put you in really good stead as a data scientist.

Once you've applied these algorithms, you need to learn how to evaluate them. Knowing which algorithms to use and when, and knowing how to evaluate their performance, are the most important things when you're learning data science. Evaluation might involve a whole bunch of different metrics, but more often than not the libraries and packages you're using will have evaluation metrics and utilities built in.

Now that that's done, the next thing you want to focus on is deployment and integration. This is what's going to separate the good data scientists from the great. The great data scientists focus on how they can use their machine learning models and end products to generate positive outcomes for the business or startup they're working for. In terms of how you might go about learning this, there are two key things I want you to take away. First up, learn how to take your machine learning models and deploy them using existing cloud service providers, such as IBM Watson Machine Learning, Microsoft Azure ML, or AWS SageMaker. Keep in mind that a large majority of startups are going to have their own tech stack that they may want to use, so get familiar with that if there's a particular company you're targeting. The second thing you should learn is how to implement your machine learning models and products using open source tools: for example, building a machine learning application using Django, building an API using Flask, or deploying your model using FastAPI. Being able to take your machine learning models and build end products and outcomes is going to put you head and shoulders above the rest, and this is normally referred to as being a full-stack data scientist.

Last but not least, now that you're armed with your razor-sharp data science skills, it's time to start focusing on some soft skills. By soft skills, what I'm really talking about is domain expertise and presentation skills. First up, domain expertise. Having domain expertise is going to help you an absolute ton when it comes to applying for data science roles. It means you've got some actual contextual knowledge of what's happening in the industry you want to work in. Say, for example, you wanted to be a data scientist at a bank. Then I'd suggest you start reading about the issues facing banks. Are they losing customers? Are they suffering from decreasing margins? What is affecting them that you could potentially use your data science skills to help solve? A great way to get an understanding of this is to read blog posts and news articles about the industry. Once you've done that, take a look at how you might apply data science concepts, and specifically the data science tasks we described earlier, to those particular problems. Then my best advice is to jump over to GitHub and see if you can find practical examples of how you can solve those problems using data science and your newfound Python skills.

The second soft skill is presentation skills. So often I've seen absolutely brilliant technicians who are unable to convey their message to a particular audience. Having great presentation skills is going to put you head and shoulders above the competition, because it means you're able to articulate data science concepts and bring them down to a language that a regular business person, or anyone who isn't a data scientist, can understand. This establishes you as a thought leader in the field and makes sure you're able to demonstrate what you've actually done.

As promised, I said I'd be giving you access to a data science cheat sheet. All you need to do to access it is go to github.com/nicknochnack and open the data science cheat sheet repository, or go to my GitHub account and search for "data science cheat sheet". It's going to be there for you to download for free, so check it out and let me know what you think. Thanks so much for tuning in, guys. Hopefully you enjoyed this video. If you did, be sure to give it a thumbs up, hit subscribe, and tick that bell so you get notified when I release new videos on data science, data science tutorials, machine learning, deep learning, and all that data good stuff. Thanks again for tuning in. Peace.
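
The video's analysis step asks whether you have enough data of sufficient quality, starting with descriptive statistics. The video recommends pandas for this on structured data (`DataFrame.describe()` and `DataFrame.isna().sum()` give similar summaries); the sketch below is a dependency-free stand-in using only Python's standard `statistics` module, so it runs anywhere. The `summarize` helper and the sample values are hypothetical.

```python
import statistics

def summarize(values):
    """Descriptive statistics for one numeric column; None entries count as missing."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "stdev": statistics.stdev(present),  # sample standard deviation (n - 1)
        "min": min(present),
        "max": max(present),
    }

# Hypothetical monthly spend for six customers; one value was never recorded.
monthly_spend = [20.0, 35.0, None, 50.0, 35.0, 60.0]
stats = summarize(monthly_spend)
print(stats["count"], stats["missing"], stats["mean"], stats["median"])  # 5 1 40.0 35.0
```

A high `missing` count or an unexpectedly large spread is the kind of signal that tells you the data may not yet be good enough for the use case.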
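
The structured-data pre-processing the video describes (fill in missing values, set up independent and dependent variables, split into training and testing partitions) can be sketched in plain Python. This assumes each row is a dict and missing values are `None`; the helper names and customer records are hypothetical. In practice, pandas' `fillna` and scikit-learn's `train_test_split` cover the same ground.

```python
import random

def impute_mean(rows, column):
    """Fill missing (None) values in `column` with the mean of the present values, in place."""
    present = [r[column] for r in rows if r[column] is not None]
    mean = sum(present) / len(present)
    for r in rows:
        if r[column] is None:
            r[column] = mean
    return rows

def split_xy(rows, target):
    """Separate independent variables (features) from the dependent variable (target)."""
    X = [{k: v for k, v in r.items() if k != target} for r in rows]
    y = [r[target] for r in rows]
    return X, y

def split_train_test(X, y, test_size=0.25, seed=42):
    """Shuffle row indices reproducibly and hold out a fraction of rows for testing."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(X) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return ([X[i] for i in train_idx], [X[i] for i in test_idx],
            [y[i] for i in train_idx], [y[i] for i in test_idx])

# Hypothetical customer records; "churned" is the dependent variable.
customers = [
    {"tenure": 2, "spend": 70.0, "churned": 1},
    {"tenure": 24, "spend": None, "churned": 0},
    {"tenure": 5, "spend": 80.0, "churned": 1},
    {"tenure": 36, "spend": 40.0, "churned": 0},
    {"tenure": 12, "spend": None, "churned": 0},
    {"tenure": 1, "spend": 90.0, "churned": 1},
    {"tenure": 48, "spend": 30.0, "churned": 0},
    {"tenure": 3, "spend": 85.0, "churned": 1},
]
impute_mean(customers, "spend")
X, y = split_xy(customers, "churned")
X_train, X_test, y_train, y_test = split_train_test(X, y)
print(len(X_train), len(X_test))  # 6 2
```

Fixing the shuffle seed keeps the split reproducible, so results from different runs can be compared fairly.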
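
The text pre-processing steps mentioned in the video (removing punctuation, tokenizing, and lemmatizing words back to their base form) can be sketched as follows. The lemma table is a tiny hand-written assumption for illustration only; NLTK, which the video recommends, provides `WordNetLemmatizer` backed by a real lexicon and tokenizers more careful than whitespace splitting. Tokenization happens before lemmatization here because a lemmatizer works on individual tokens.

```python
import string

# Hypothetical, hand-written lemma table for illustration only.
LEMMAS = {"running": "run", "ran": "run", "cats": "cat", "better": "good",
          "was": "be", "were": "be"}

def preprocess(text):
    """Lowercase, strip punctuation, tokenize on whitespace, then lemmatize each token."""
    no_punct = text.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = no_punct.split()
    return [LEMMAS.get(tok, tok) for tok in tokens]

print(preprocess("The cats were running, and the service was better!"))
# ['the', 'cat', 'be', 'run', 'and', 'the', 'service', 'be', 'good']
```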
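
The churn example in the video is the classic supervised setup: labeled historical customers (churned or not) are used to predict the label for new ones, and the model is then evaluated on held-out data. The video does not name a specific algorithm, so this sketch assumes k-nearest neighbours, one of the simplest supervised classifiers, plus an accuracy metric. The two features (tenure in years and support tickets raised) and all data points are hypothetical; scikit-learn ships equivalents (`KNeighborsClassifier`, `accuracy_score`).

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Predict a label for x by majority vote among its k nearest labelled neighbours."""
    # Pair each training example's Euclidean distance to x with its label, nearest first.
    distances = sorted(
        (math.dist(features, x), label) for features, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical labelled history: (tenure in years, support tickets raised).
train_X = [(0.5, 6), (1.0, 5), (0.2, 7), (4.0, 1), (5.0, 0), (3.0, 2)]
train_y = [1, 1, 1, 0, 0, 0]  # 1 = churned, 0 = stayed

# Held-out customers used only for evaluation.
test_X = [(0.4, 6), (4.5, 1)]
test_y = [1, 0]

predictions = [knn_predict(train_X, train_y, x) for x in test_X]
print(predictions)                    # [1, 0]
print(accuracy(test_y, predictions))  # 1.0
```

Accuracy alone can mislead when churners are rare; in that case precision and recall on the churn class are usually more informative.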
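
For the two unsupervised techniques the video names, clustering and anomaly detection, here is a dependency-free sketch: k-means (Lloyd's algorithm) to group unlabeled points, and a z-score rule to flag outliers. The initialisation (first k points as centroids), the threshold of 2.0, and the sample data are assumptions chosen to keep the example deterministic; real implementations typically use smarter initialisation such as k-means++.

```python
import math
import statistics

def kmeans(points, k, iterations=10):
    """Lloyd's k-means, using the first k points as initial centroids (an assumption)."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
    return centroids, labels

def zscore_outliers(values, threshold=2.0):
    """Flag values more than `threshold` population standard deviations from the mean."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical unlabelled customers: (visits per week, basket size).
points = [(1, 1), (8, 8), (1.5, 2), (9, 8.5), (0.5, 1.5), (8.5, 9)]
centroids, labels = kmeans(points, k=2)
print(labels)     # [0, 1, 0, 1, 0, 1]
print(centroids)  # [(1.0, 1.5), (8.5, 8.5)]

# Hypothetical daily transaction counts with one obvious spike.
print(zscore_outliers([10, 11, 9, 10, 12, 10, 50]))  # [50]
```

No labels are used anywhere here: the groups and the outlier emerge from the structure of the data alone, which is what distinguishes these techniques from the supervised churn example.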
Info
Channel: Nicholas Renotte
Views: 12,002
Keywords: how to learn data science for free, how to learn data science in 2021, how to learn data science smartly, how to learn data science from scratch, how to learn data science on your own, how to learn data science with python, how to learn data science fast
Id: oLpBGtY-_sI
Length: 14min 7sec (847 seconds)
Published: Sun Mar 07 2021