Is Data Science For You? | Guide From a Professor

Captions
Are you wondering whether a career in data science is suitable for you? I have compiled six questions to help you decide. If the answer is yes to all of them, you would likely enjoy a career in data science.

I've been working as a data scientist for almost a decade and teaching this material for four years, yet there are still many things I am not as familiar with as I would like to be. For instance, I do not know much about the practical aspects of database engineering, and I am not as knowledgeable as I should be about deep learning. I'm trying to learn as much as I can through online and in-person courses, reading publications and blogs, and experimenting with new tools. Learning in this field never ends, not even for professors. The truth is that no one knows everything there is to know about data science, because it is too vast and rapidly evolving, so you constantly need to develop yourself. I personally find that very exciting, but that's not necessarily true for everyone. You need a mindset where you are open to learning new things if you want to succeed. Learning new things comes with a certain level of discomfort, because it's not easy to change how you think, and you need to be fine with failing and making mistakes as you learn.

If developing machine learning and deep learning models is what you enjoy the most, do not become a data scientist; consider machine learning engineer positions instead. I can tell from personal experience that data scientists spend as much as 70 to 90% of their time preparing data sets for machine learning. They engineer features, collect external data, and the like. Only a small amount of time is spent on actually developing models. Data sets are messy. The information you need to solve a problem might be scattered over many databases. Sometimes you need a small subset of points or features from a database, sometimes you need to collect external data, and you will almost always need to engineer new features to improve the predictive power of your models. These tasks are difficult and open-ended, which brings us to the next question.

Working on data science projects is not like solving homework assignments. Assignments are usually pretty well defined: you know there is a correct solution the instructor is looking for, and you can compare notes with your fellow students. In contrast, data science projects are almost always ambiguous. The problem might not be well defined at all. Your manager or the company stakeholders might not have a clear idea of what problem they actually want to solve. Often it is your job to determine the problem and phrase the question. You also do not know in advance whether the question can be answered, because you might be the first one trying to answer it with a proprietary data set. One problem can be solved in many different ways, it might not be immediately apparent which approach you should take, and you might not have enough time to try them all. This ambiguity can be pretty frustrating for some people, because if your model doesn't perform as well as you hoped, there are many possible causes. It could be that your data is not good enough to answer the question. Maybe there is a bug in your code. Maybe the tool you use is not appropriate for the problem. Or maybe you just need to engineer more and better features. It can be difficult to figure out what exactly the problem is.

Scrappiness comes from the fact that sometimes you will be given really tight deadlines to answer questions, and you won't have time to be thorough. Usually you will just have to do the best you can in the time you have available and then move on to the next question or the next project they want you to work on. For perfectionists this can be really unsatisfying, and it takes some getting used to. Stay tuned until the end, because the final question will surprise you.

Math is absolutely crucial for data scientists. Statistics, probability, linear algebra, and calculus are all necessary if you want to succeed in this field. You might be asked to perform various statistical analyses, like hypothesis testing. Exploring data sets requires some basic math skills. Machine learning models are based on linear algebra and calculus, and if you hope to apply them successfully you will need to understand how these models work, what the pros and cons of various models are, and which hyperparameters are important. My students often ask me, "Why should we know how machine learning models work? They are all implemented in Python packages like scikit-learn, Keras, and TensorFlow." While that is true, there is a lot more to machine learning than "import scikit-learn". I spend a lot of time in one of my courses illustrating what kinds of errors one can make if they are not careful and don't understand the mathematics and numerical algorithms behind the models. Information leakage, convergence issues, and mistakes in cross-validation are just a few examples. While the packages are nice and allow people to quickly and easily train models without necessarily understanding what's going on behind the scenes, this can also be quite dangerous.

Strong communication skills are another requirement for successful data scientists. Consider this: you might develop the best models and produce the most mind-blowing results, but no one will care if you cannot communicate your results effectively. This is also where the business needs of the company come into play. First, you need to understand the business needs of the company. Then you need to explain to non-data scientists how your results and insights will benefit the company. If you cannot do that, they will not implement and use your models. Unfortunately, the divide between technical and non-technical people can be pretty huge. For example, I was once explaining my data analysis results to business leaders, and they asked me to explain what a logarithm is because I used a log axis on one of my visualizations.

I think this point is often overlooked: it's called data science, not data feelings, not data opinions, and not data beliefs. Your job is to make decisions based on data and experiments, to constantly question your beliefs, and to change them if new evidence presents itself. Always ask yourself the fundamental question of rationality: why do you believe what you believe? What do you think you know, and how do you think you know it? Let me give an example. Students work on a final project in one of my courses. Their task is to find a data set and develop a supervised machine learning pipeline using it. Sometimes they choose a large data set with hundreds of features, and they want to drop some features because they don't think those features are important. Wrong. They have no evidence to support their claim. It is just a feeling or an opinion, usually motivated by trying to take shortcuts. This type of thinking is quite dangerous. If they were to drop features without evidence that those features are indeed unimportant, the performance of the model could suffer as a result, which can have severe consequences. What I tell them is to use all features and train the model. Most machine learning models have tools to measure how important features are in making predictions. Once they train a model, they can use those tools to measure the importance of each feature, then decide if it makes sense to drop some, and finally retrain the model and make sure the performance indeed does not drop.

If you answered yes to all six questions, congratulations: you would likely enjoy a career in data science. Let's continue your journey and learn about three things you need to know to become a data scientist.
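
The workflow from the final question (train on all features, measure importance, drop, retrain, and confirm the score holds) can be sketched roughly as follows. This is only an illustration and not code from the video: the synthetic data, the plain least-squares model, the permutation-based importance score, the 0.05 cutoff, and every function name are assumptions made for the example. It uses only the Python standard library so it runs anywhere; scikit-learn offers a comparable utility in sklearn.inspection.permutation_importance.

```python
import random


def fit_linear(X, y):
    """Least-squares fit with intercept, solving the normal equations by Gauss-Jordan."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [A^T A | A^T y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def mse(w, X, y):
    """Mean squared error of the linear model w on (X, y)."""
    preds = [w[0] + sum(wi * xi for wi, xi in zip(w[1:], r)) for r in X]
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)


def permutation_importance(w, X, y, rng):
    """Increase in validation MSE when a single feature column is shuffled."""
    base = mse(w, X, y)
    scores = []
    for j in range(len(X[0])):
        col = [r[j] for r in X]
        rng.shuffle(col)
        X_perm = [r[:j] + [v] + r[j + 1:] for r, v in zip(X, col)]
        scores.append(mse(w, X_perm, y) - base)
    return scores


def select(X, cols):
    """Keep only the listed feature columns."""
    return [[r[j] for j in cols] for r in X]


# Hypothetical data: feature 0 drives the target, feature 1 is pure noise.
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
y = [3.0 * r[0] + rng.gauss(0, 0.1) for r in X]
X_train, y_train, X_val, y_val = X[:100], y[:100], X[100:], y[100:]

# Step 1: train on ALL features and record the validation error.
w_full = fit_linear(X_train, y_train)
mse_full = mse(w_full, X_val, y_val)

# Step 2: measure importance on held-out data (evidence, not a feeling).
importance = permutation_importance(w_full, X_val, y_val, rng)

# Step 3: drop features whose importance is negligible (cutoff is an assumption).
keep = [j for j, score in enumerate(importance) if score > 0.05]

# Step 4: retrain on the reduced set and check that performance did not drop.
w_reduced = fit_linear(select(X_train, keep), y_train)
mse_reduced = mse(w_reduced, select(X_val, keep), y_val)

print("importance per feature:", importance)
print("kept features:", keep)
print("validation MSE full vs reduced:", mse_full, mse_reduced)
```

The decision to drop a feature is made only after the model has been trained with it, and the final comparison of the two validation errors is what justifies keeping the smaller model.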
Info
Channel: Data Science Cross-Validated
Views: 11,383
Keywords: data science, machine learning
Id: h5e2lSl8cws
Length: 8min 23sec (503 seconds)
Published: Tue Feb 06 2024