Cracking Machine Learning Problems | Data Science Interviews

Captions
Hey, it's Emma, welcome back to my channel. There are tons of machine learning courses and study materials out there, so how do you stay focused and be efficient in your interview preparation? In today's video we will go over the four kinds of machine learning problems in data science interviews, as well as some of the most commonly asked questions.

The four kinds of machine learning problems are machine learning basics, machine learning questions from your resume, machine learning coding, and applied machine learning problems. The first two types can appear in any data science interview, and the last two types are more common for algorithm- or machine-learning-focused data scientist positions. With this knowledge you can stay focused and not let the amount of information overwhelm you. Let's get started.

The first kind of questions are the most straightforward and basic ones. They're really easy to prepare for because they're usually covered in a fundamental machine learning course, and they don't require much hands-on experience to answer well. The questions can be about anything related to data processing, machine learning models, model tuning, and evaluation. For example, one of the most commonly asked questions is: what is overfitting, and how do you deal with it?

To answer the question, first briefly go through the definition in two to three sentences. For example: overfitting happens when the learning power of the model is too high or the data size is too small, so the model ends up fitting the noise rather than the useful information in the data. As a result, the model performs badly on unobserved data sets. To convince the interviewer that you truly understand the technical term, you can give an example: you will face an overfitting problem if you have a regression model and the number of data points is less than the number of features. Afterwards, you can provide some common solutions to the problem, such as reducing the learning power of the model, adding regularization to the model, or increasing
the size of the training data. Whichever solution you provide, it's better to give an example to illustrate it.

Here are some other commonly asked questions in this category. What is an imbalanced data set, and how do you deal with one? Briefly describe the random forest classifier: how does it work, and what are its pros and cons? List three evaluation metrics and describe their advantages and disadvantages. When do we use L1 regularization compared to L2? What are hyperparameters, and how do you tune them?

Nowadays almost every data scientist has at least one machine learning project on their resume, and you might have one as well. If so, you want to be prepared for the second type of questions, which come from your resume. Specifically, you will be asked to walk through your previous machine learning projects. During an interview, the interviewer will have you describe the project at a high level and then dive into the technical details to evaluate your ability to apply machine learning knowledge in practice.

It sounds easy, right? You may think, "I definitely remember the details of a project I have done." But believe it or not, many people fail the interview because they used a library or package for model training and don't know much about the algorithm they were using. Let's see what kind of questions will be asked. For example, if you have trained a house price prediction model using the XGBoost package, the interviewer may first ask: can you explain what XGBoost is? You should be able to give a short and clear summary of the algorithm in two to three sentences. For example: in gradient boosting, you train a series of weak learners, such as tree-based models. In each round of training, you fit the weak learner to the residuals of all previous weak learners. At the end, the prediction of the model is the sum of all the weak learners.

Being able to summarize an algorithm may not be enough; you also want to provide some technical details about it. For
example, training an XGBoost model is usually faster than training other boosting algorithms because of the way it selects the splitting features and the splitting criteria in each of the decision tree learners. Once you demonstrate that you have a good understanding of the algorithm, the interviewer may also ask about the details of the project. For example: why did you choose XGBoost over other regression models? How did you select features? How did you evaluate the models? Basically, anything mentioned on your resume or during the interview could be questioned by the interviewer. The best way to prepare for these questions is to think through all the projects on your resume: not only how you used those machine learning models, but also what they are, what their pros and cons are, and why you used them rather than other methods. If you're able to answer all these questions, it's easier to convince the interviewer that you have a good understanding of those algorithms and that you gained knowledge and experience from doing the projects.

The next two types of questions are a bit more hardcore, and they mainly appear in interviews for data scientist positions that are focused on machine learning. Machine learning coding questions evaluate not only whether you understand the theory of a machine learning algorithm, but also whether you are able to code up the algorithm from scratch in a short amount of time. Typically, the interviewer asks you to implement an algorithm using an online IDE or on a whiteboard. I know it may sound a little daunting because there are so many machine learning algorithms and each has its own implementation, but don't worry: only a limited number of algorithms appear in interviews. Some algorithms are too complicated to implement within one hour, so it does not make much sense to test them during interviews. Here are the most commonly asked algorithms in this type of coding interview. For supervised learning: decision trees, linear and logistic regression,
and k-nearest neighbors. For unsupervised learning, the only algorithm that is asked frequently is k-means clustering. I recommend you try to implement them by yourself first; if you get stuck, you can search for implementations online. Make sure you practice a couple of times before the interview so you can write them quickly and bug-free. You also want to pay attention to the efficiency of your implementations, because you might be asked to give their time and space complexity in big O notation during the interview.

The last type of machine learning questions are the most challenging ones because they require some real experience, meaning that you need to be familiar with the whole workflow: from getting the data, to cleaning the data, to building machine learning models, to evaluating them, and to shipping models to production. Typically, the interviewer gives you an open-ended problem and asks you to come up with a solution. The interviewer will ask follow-up questions about any component of the workflow. One example question is: how do we detect spam emails? First of all, you want to clarify with the interviewer what data is available and what its format is. Then you can talk about a high-level workflow that contains the things that need to be designed. For example, you can say a typical workflow of a machine learning project contains the steps of data collection, data processing, model selection, and model evaluation. Then you can dive deep into each of the components and discuss the design with the interviewer.

If you have extensive machine learning experience, you may have already developed a good sense of how to approach this kind of problem. If you don't, I'd recommend using Kaggle as a reference. There are lots of machine learning problems and solutions posted on Kaggle; I put one example in the description below. Try to work on a project by yourself, and once you are done, compare your solution to other people's work so that you can learn from others. Make sure you
understand the meaning of each step and that you're able to explain it in plain English. After training yourself on a few projects, you can develop a good sense of how to answer this kind of question.

Now you know the four kinds of machine learning problems in data science interviews and how to prepare for each of them. Finally, I want to share with you two tips that can be helpful for your interview. The first tip is to give examples. Providing examples is the best way to demonstrate that you truly understand a technical term. Say the interviewer asks you to explain what precision is. You can start by giving the definition: precision is the number of true positive cases divided by the number of detected positive cases. Then you can add an example: say we have a COVID test that returns 100 positive cases, out of which 99 were true positives, so the precision is 99%.

The other tip is to not mention anything you are not familiar with, because everything you say can lead to a follow-up question. I understand you may want to impress the interviewer by using some advanced models or terminology, such as convolutional neural networks, but you might dig yourself into a hole if that's something you're not familiar with or you're not able to explain it clearly.

So there you have it: those are the four kinds of machine learning problems in data science interviews. Hopefully they are helpful. Let me know in the comments if you have any questions, feedback, or topics you would like to see. Thank you so much for being here, and I will see you in the next video.
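The overfitting example from the video (a regression model with fewer data points than features) can be made concrete with a small pure-Python sketch. It assumes a hypothetical setup: 10 points, 50 random Gaussian features, and a target that is pure noise. It uses ridge regression (L2 regularization) in its dual form; `solve`, `ridge_fit`, `predict`, and `mse` are illustrative helper names, not a library API. With almost no regularization the model reproduces the noisy training targets essentially exactly, which is the "fitting the noise" failure; a larger penalty gives up some of that training fit.

```python
import random

def solve(a, b):
    """Solve the square linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ridge_fit(X, y, lam):
    """Ridge regression via the dual form w = X^T (X X^T + lam*I)^-1 y (cheap when features > points)."""
    n, p = len(X), len(X[0])
    gram = [[sum(a * b for a, b in zip(X[i], X[j])) + (lam if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    alpha = solve(gram, y)
    return [sum(alpha[i] * X[i][k] for i in range(n)) for k in range(p)]

def predict(X, w):
    return [sum(a * b for a, b in zip(row, w)) for row in X]

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical setup: 10 points, 50 random features, and a target that is pure noise.
rng = random.Random(0)
n_points, n_features = 10, 50
X_train = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_points)]
y_train = [rng.gauss(0, 1) for _ in range(n_points)]
X_test = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_points)]
y_test = [rng.gauss(0, 1) for _ in range(n_points)]

for lam in (1e-8, 10.0):  # almost no penalty vs. a strong L2 penalty
    w = ridge_fit(X_train, y_train, lam)
    print(f"lambda={lam}: train MSE={mse(y_train, predict(X_train, w)):.4f}, "
          f"test MSE={mse(y_test, predict(X_test, w)):.4f}")
```

The test MSE depends on the random draw, so no particular value is claimed here; the point is that a near-zero training error on pure noise says nothing about performance on unobserved data.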
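The gradient boosting summary in the video (fit each weak learner to the residuals, then sum the learners) can be sketched in plain Python. This is generic squared-loss gradient boosting with one-feature decision stumps, not XGBoost itself, which adds regularization and other refinements on top of the same idea. The helper names, the learning rate (shrinkage), and the number of rounds are illustrative assumptions.

```python
def fit_stump(x, r):
    """Fit a depth-1 regression tree (a stump) on one feature to the targets r by least squares."""
    best = None  # (sum of squared errors, threshold, left value, right value)
    for t in sorted(set(x))[:-1]:  # every threshold that leaves both sides non-empty
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - lv) ** 2 for ri in left) + sum((ri - rv) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    _, t, lv, rv = best  # assumes x has at least two distinct values
    return lambda xi: lv if xi <= t else rv

def gradient_boost(x, y, n_rounds=50, learning_rate=0.1):
    """Squared-loss gradient boosting: each new stump is fit to the current residuals."""
    base = sum(y) / len(y)  # start from the mean prediction
    learners = []

    def predict(xi):
        # prediction = base value + shrunken sum of all weak learners trained so far
        return base + learning_rate * sum(h(xi) for h in learners)

    for _ in range(n_rounds):
        residuals = [yi - predict(xi) for xi, yi in zip(x, y)]
        learners.append(fit_stump(x, residuals))
    return predict

model = gradient_boost([1, 2, 3, 4, 5, 6], [1.0, 1.0, 1.0, 5.0, 5.0, 5.0])
print(round(model(1), 2), round(model(6), 2))
```

Each round here shrinks the remaining residual by a factor of (1 - learning_rate), which is why a small learning rate needs more rounds.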
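Of the coding-round algorithms listed in the video, the decision tree is the longest to write. The sketch below is one common approach, not the only one: greedy binary splits chosen by Gini impurity, assuming numeric features, a hypothetical `max_depth` stopping rule, and nested dicts for internal nodes. In this naive form each level costs roughly O(d · n²) for n rows and d features, because every candidate threshold rescans the data.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(X, y):
    """Return (weighted Gini, feature index, threshold) of the best binary split, or None."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X})[:-1]:  # keep both sides non-empty
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(X, y, depth=0, max_depth=3):
    """Grow the tree recursively; leaves are majority-class labels, internal nodes are dicts."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or gini(y) == 0.0:
        return majority
    split = best_split(X, y)
    if split is None:  # every feature is constant, so no split is possible
        return majority
    _, f, t = split
    left_idx = [i for i, row in enumerate(X) if row[f] <= t]
    right_idx = [i for i, row in enumerate(X) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([X[i] for i in left_idx], [y[i] for i in left_idx], depth + 1, max_depth),
        "right": build_tree([X[i] for i in right_idx], [y[i] for i in right_idx], depth + 1, max_depth),
    }

def predict_tree(node, row):
    """Walk from the root to a leaf, following the threshold test at each internal node."""
    while isinstance(node, dict):
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node

X = [[2, 3], [1, 1], [3, 1], [6, 5], [7, 7], [8, 6]]
y = [0, 0, 0, 1, 1, 1]
tree = build_tree(X, y)
print(predict_tree(tree, [1.5, 2]), predict_tree(tree, [7.5, 6.5]))
```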
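Linear regression, another algorithm on the video's list, has a short closed-form solution in the single-feature case. The sketch below assumes one feature and ordinary least squares; `fit_simple_linear_regression` is a hypothetical name. It runs in O(n) time with O(1) extra space, the kind of big O answer the video suggests being ready to give.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y = slope * x + intercept with one feature (closed form)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx  # requires at least two distinct x values
    return slope, mean_y - slope * mean_x

slope, intercept = fit_simple_linear_regression([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)  # → 2.0 1.0
```

With several features, interviewers often expect either the normal equation or gradient descent instead; the closed form above only covers the one-feature case.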
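Logistic regression is usually implemented from scratch with gradient descent on the log loss. This sketch assumes batch gradient descent with an arbitrary learning rate and epoch count, binary labels 0/1, and a tiny hypothetical data set; the function names are illustrative. Training costs O(epochs · n · d) time for n rows and d features.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic_regression(X, y, lr=0.1, epochs=1000):
    """Batch gradient descent on the average log loss; returns weights and bias."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi  # (p - y)
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(X, w, b):
    """Probability of the positive class for each row."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
w, b = fit_logistic_regression(X, y)
print(predict_proba([[-2.0], [2.0]], w, b))
```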
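The video lists k-nearest neighbors among the supervised algorithms to practice. A minimal classifier is sketched below, assuming Euclidean distance and a simple majority vote; `knn_predict` and the toy data are hypothetical. Each query costs O(n · d) for the distances plus O(n log k) for `heapq.nsmallest`, and nothing is "trained" up front.

```python
import heapq
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points (Euclidean distance)."""
    nearest = heapq.nsmallest(
        k, zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query)
    )
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_X, train_y, (0.5, 0.5)), knn_predict(train_X, train_y, (5.5, 5.5)))  # → a b
```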
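For k-means, the only unsupervised algorithm the video mentions, the standard approach alternates an assignment step and a centroid-update step until nothing changes. This sketch assumes random initialization from the data points and a fixed iteration cap; `kmeans` and the toy points are hypothetical. One iteration costs O(n · k · d). Because k-means can converge to a poor local optimum depending on the starting centroids, real code often restarts several times or uses a smarter initialization.

```python
import math
import random

def kmeans(points, k, n_iters=100, seed=0):
    """Lloyd's algorithm: returns (centroids, clusters) for a list of coordinate tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k distinct data points
    clusters = []
    for _ in range(n_iters):
        # Assignment step: put every point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:  # converged: assignments will not change
            break
        centroids = new_centroids
    return centroids, clusters

points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))
```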
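The precision example from the first tip works out as a tiny calculation. In this sketch labels are 1 for positive and 0 for negative, `precision` is a hypothetical helper name, and returning 0.0 when nothing is flagged is an arbitrary convention chosen here; the data simply restate the example of 100 positive test results, 99 of them true positives.

```python
def precision(y_true, y_pred, positive=1):
    """Precision = true positives / all cases the model flagged as positive."""
    flagged = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not flagged:
        return 0.0  # convention chosen here when nothing is flagged
    true_positives = sum(1 for t in flagged if t == positive)
    return true_positives / len(flagged)

# The example: the test returns 100 positives, 99 of which are real positives.
y_pred = [1] * 100
y_true = [1] * 99 + [0]
print(precision(y_true, y_pred))  # → 0.99
```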
Info
Channel: Data Interview Pro
Views: 9,733
Keywords: Data Science Interview, Data Science Interview Questions, Machine Learning, Machine Learning Interview, Machine Learning Interview Questions, Data Science
Id: 21E-bUnGQQ4
Length: 10min 47sec (647 seconds)
Published: Thu Dec 31 2020