Become a Data Scientist in 2024: A Professor's Guide

Captions
You think you have what it takes to become a data scientist in 2024? Think again. I manage the data science master's program at BR University, and I can tell you that there are no easy steps to becoming a data scientist. The journey can be brutal, it requires a lot of hard work, and the competition is fierce.

Data science is applied, so first you will need to decide in which discipline you want to apply your skills. The great thing is that people who can work with, analyze, and make predictions based on data are needed in most, if not all, disciplines, so you have a vast range of options to choose from. You might be interested in health care, public policy, banking, finance, AI ethics and fairness, technology, agriculture, research and development, you name it.

It really helps if you narrow down your focus as early as possible, because there are usually discipline-specific requirements you need to be aware of. It could be some specialized domain knowledge or experience you need to have before applying. For example, if you are interested in data science positions in biomed, they might rightly expect you to have a solid knowledge of biology and medicine before you start the job, and it takes a lot of time to learn those things. Other disciplines might use special tools and programming languages you need to know. For example, most data science jobs require coding experience in Python, R, or SQL, but certain disciplines might use MATLAB, Julia, or some other language instead. It can take quite a bit of time to become familiar with a new language, so do your research early on.

Here is what I recommend you do. Once you decide on your discipline, go on LinkedIn, find at least 10 or 20 relevant job ads, and collect the required skills and experiences. Check which ones you already have and start working on the ones missing from your resume. This exercise will do wonders for your prep work, and you will have a clear picture of what you need to work on before you start applying.

Some requirements will
be clearly stated in job ads, such as specific programming languages, certain tools the company uses, and experiences they want you to have. However, other requirements may not be explicitly mentioned but will come up during the interview process. For example, job ads usually don't specify that you need to know linear algebra or calculus, but they might ask you to explain how linear regression or logistic regression models work and ask you to describe gradient descent during the technical interview. These are pretty basic questions, and if you don't have a sufficient understanding of linear algebra and calculus, you won't be able to answer them successfully.

Here are the six most important knowledge areas. Stick around until the end, because number four is what most people overlook.

You will need a solid understanding of statistics and probability, but the depth of knowledge required can vary significantly from job to job and discipline to discipline. Sometimes a basic understanding of probability, general knowledge of the most important probability density functions, basic statistical measures such as mean, median, percentiles, and standard deviation, and very basic hypothesis testing are sufficient. However, for certain jobs you need to be a statistics expert. Companies often conduct experiments and A/B testing, so you will need to delve much deeper into those areas if that is your niche.

I hinted at this earlier, but to succeed in any data role you will need to have a substantial understanding of mathematics. The level of mathematical knowledge required can vary: database engineers might need the least amount of math, while machine learning engineers typically require the most. There are two main reasons why solid math foundations are crucial. First, the field is rapidly evolving, with new tools and models being published all the time, and you will need to be able to learn on the job. Publications, technical blogs, and documentation often involve complex mathematical concepts, so being well prepared in this area is
essential. Second, you often need to decide when and why to apply certain algorithms, and sometimes models break and require fixing. Without understanding the theory and the math behind machine learning algorithms, you will not be able to tackle such complex problems. Most machine learning models use various matrix operations and nonlinear transformations. If you are aiming to optimize these models, you often use techniques like gradient descent and forward and back propagation, which are based on calculus. Developing a solid understanding of how deep learning models and generative AI work is impossible without a good grasp of linear algebra and calculus.

Strong coding skills are also required in all data roles, and the three main programming languages that appear most often in job interviews are Python, R, and SQL. Each serves quite a different purpose: database engineers use SQL to create, collect, and manage data sets, R's strength lies in solving statistical problems, and Python is frequently used to develop machine learning models. Coding interviews are often a part of the hiring process, and you might be asked to write code to solve various data set and algorithmic problems live, without any tools like Google, Stack Overflow, ChatGPT, or GitHub Copilot to help. Therefore, you need to know the syntax quite well and be able to debug code on the fly.

This is an aspect that's often overlooked: soft skills are absolutely crucial. As a data scientist, you will interact with various groups, each with varying levels of technical and coding proficiency. It's essential that you are able to communicate effectively with all of them. You will need to explain the intricate details of your work to your teammates, but you will also need to convey the importance of the main results of your work to non-technical audiences and high-level managers. Think of it this way: you might develop the most excellent machine learning tool or technique, but it will go unused if you are not able to explain it to others. Your ability to
communicate complex concepts in an accessible and engaging manner is just as important as your technical skills in ensuring the success and adoption of your projects.

There are several other tools you might need in your job, but usually they are not too difficult to learn. Version control with Git and GitHub is one example. Tools for reproducibility include conda or Docker. Some companies might use cloud computing services. Developer environments could be Jupyter or VS Code. Additionally, some companies might use data visualization tools like Tableau or Power BI. The list goes on, but the truth is that learning company-specific tools should be the least of your worries. This is because they are relatively easy to pick up compared to mastering the stats, math, and coding skills essential for your role as a data scientist. These foundational skills form the core of your expertise, while the tools and platforms are often more about adapting to specific workflows or project requirements.

Once you have acquired all these skills and knowledge, it's time to apply them in real-world settings. Gaining practical experience through internships or projects is key. It's beneficial if these experiences are relevant to your specific niche. When you apply for jobs, employers often prioritize relevant experience above most other aspects of your resume. Remember that in the field of data science, your hands-on experience speaks very loudly. I participated in a couple of Kaggle data science competitions when I transitioned to the field, and it was a great way to learn about various machine learning models. But Kaggle usually gives you a clean data set that is ready for machine learning. Try to also gain some experience working with messy, real-life data sets. Most often, you will spend most of your time cleaning data sets and bringing them into a form that's suitable for machine learning.

As a university professor, I am of course biased here, but I believe that the various in-person or online data science master's programs provide
you with a structured way to learn and acquire skills. You will learn a lot from your professors, and you will learn even more from your fellow students. You can of course use the various online learning platforms too, but you need to be extremely disciplined and motivated to go through the materials alone, with no one to hold you accountable. One data-science-specific online learning platform I use a lot is DeepLearning.AI. One of the specializations they offer is mathematics in machine learning, which covers the most important topics in linear algebra, calculus, probability, and statistics that you will need.

If you have all these skills and you are ready to apply for jobs, check out my previous video on how you can improve your resume. Let me know in the comment section down below which steps of this road map you are struggling with the most, and subscribe to my channel, Data Science Cross-Validated, for the latest updates.
Info
Channel: Data Science Cross-Validated
Views: 4,100
Id: ORNcbd6YEOA
Length: 9min 48sec (588 seconds)
Published: Wed Dec 27 2023