Acing the Python Data Science Interview Questions

Video Statistics and Information

Captions
I don't actually use NumPy... "num-pee"? "Num-pie"?

Hi everyone, this is Jay from Interview Query and from the Data Science Jay YouTube channel. Today I wanted to go over Python data science interview questions, and the reason is that I feel like there's a lot of confusion about what exactly a Python data science interview question is. How does it differ from the questions that you see on LeetCode or HackerRank, the algorithm-type questions? Is it going to be as simple as just understanding the different data types of Python (what's a list, what's a dictionary), or does it get more complex? So today I wanted to talk about what to expect when you're facing Python data science interview questions in the field.

Also, everyone, please check out Interview Query. It's a website that I made specifically to prepare data scientists for their interviews. You can filter a bunch of data science interview questions by company, run SQL queries, discuss solutions, and learn data science with the in-depth solution guides that we provide. You can sign up at interviewquery.com and get the free weekly interview question in your inbox.

Let's go over what the actual difference is between these kinds of data science interview questions and a software engineering interview question. Both fields obviously use Python. I think data scientists probably prefer it more, and generally they have fewer languages to choose from. But I would say that in general, Python data science interview questions are more specifically focused around the data munging, the scripting, the analyzing, and so much around the data itself. What do I mean by that? Software engineering interview questions are almost always focused around different types of data structures and algorithms, and they're focused on that because software engineers many times actually have to
become real engineers, in the sense that they're actually building software that powers a website. They're building the infrastructure or setting up the data pipelines, and they're making sure that everything is working well and with good uptime.

Data scientists, on the other hand, also contribute to a lot of that, but many times they end up doing a lot more scripting work. If you're a data scientist and you need to pull data from somewhere and you can't do it in SQL, or you just need to take in a dataset, analyze it, and maybe output it somewhere else, many times you're going to be building these scripts with Python, and doing so in a way that's rather quicker: not exactly more efficient, but something that gets the job done.

That's probably the main difference between data scientists and software engineers when it comes to these coding interviews: the bar for these coding interviews is not as high for data scientists. It can be at some companies; it just kind of varies. But on average, I would say that data scientists don't have to know the extreme levels of different operations, DevOps, engineering, and infrastructure work that many software engineers are expected to know.

I think this can be laid out pretty well with an example. If you think about Facebook, that's a website that always has to be up, or else they're losing, let's say, thousands, maybe millions of dollars for every hour that it's down. If you're a data scientist and you're building software, many times they're not going to have you write production code, or code that gets shipped into an environment in which consumers have access to it. Many times data scientists build data pipelines or dashboards or even internal tools, because it's internal facing and there's not as much of a
consequence if it fails.

Think about it as a metaphor for building a bridge. If you're an engineer and you're building a bridge for thousands of cars to cross, you have to make sure that everything is up to spec and that nothing can cause this bridge to fail. But if you're a data scientist, when you're coding you're building a bridge for yourself, when you need to cross a river on a hiking trail. You're taking some logs, you're throwing them down, and it's good enough. Maybe in a few hours it'll wash away. Probably not, or at least in a few months it might wash away when the tide changes or the rainfall gets a little bit stronger. But in general you're not really thinking about longevity.

I think that's the main difference between engineering and data science. While data scientists do have to think about it in parts of their analysis, or if they're trying to make things reproducible, they don't really have to care too much about it in terms of code: how long it takes to run, general documentation if they're working on analysis-type projects, or whether they have to write tests to make sure that different edge cases don't break what they're building. A lot of the time they're building stuff for themselves.

All right, now that we know the difference between Python data science interview questions and Python engineering interview questions, when can we expect to see these Python data science questions in interviews, versus getting algorithm-type questions? I'd say the main difference is between the varying roles within data science. If you're more on the analytics side, or more on the scripting, startup data science side, you might not be faced with these stringent algorithm-type LeetCode questions. Many times you'll be faced with a lot of these just regular
Python data science interview questions, which expect you to be able to at least write scripts, build scrapers, and just hack your way around coding to get what you need from the data.

A lot of the time, companies will ask more of the LeetCode-type questions for more machine learning focused roles, or data science roles where you're expected to be coding in production. That means that if you're going to be the one that ships out your model, putting it into an API, into an app, into some sort of infrastructure, they're expecting you to be able to serve the number of consumers that will be using that app. If I'm putting a model into, let's say, the Facebook newsfeed to change the newsfeed, then I'm expected to be able to serve probably millions of requests per minute, because that's how many requests Facebook will get, and I have to build code that will be that good. Needless to say, Facebook data scientists don't have to do that; they have different kinds of roles, probably more software engineers, that have to do that. But if you're working for a startup where they don't have that many different roles, and the lines are blurred between what the engineers do and what the data scientists do, then they might ask for and expect that from you, if that's what the role entails.

So when do these kinds of Python data science interview questions come up? To figure that out, ask the recruiter, read the job description, and understand exactly what will be required of the data science role that you are applying to. You can generally bet that once you hear back from the recruiter, or understand what goes on in the interview experience through Interview Query or Glassdoor, someone will probably tell you if there are Python interview questions in the interview.

So what are the different types of Python interview questions, then? I'd say that this is definitely a
wide range, because data science roles are so varied. Some companies might just expect you to be able to loop through a for loop and show that you can write FizzBuzz. More likely than not, you're going to have to do more than that. From my expertise, from looking through thousands of these interview questions, I've categorized them into five distinct buckets, and I'll go through them right now.

The first ones that I've seen a lot of are statistical and distribution-based questions. What I mean by this is that a lot of the time companies will want you to apply your statistical knowledge with Python, to show that you can do both of these things combined. It's kind of like killing two birds with one stone. If you can do something like generate a histogram, or sample from a distribution or a binomial distribution, and do it in Python and write a function to do so, that shows that you know the fundamentals of statistics and you know how to apply them when you're coding.

Number two, which is also very likely to come up when you're faced with a statistics question, is a probability simulation type of data science interview question. This one is more about simulating a probability: if they give you a scenario about rolling a die or flipping a coin, can you write Python code that simulates this occurrence X number of times? It's not just constrained to that example. A really good one, I think, is a very complex probability scenario that is difficult to decipher by hand or mathematically. Being able to create a function that simulates the scenario many times is a good demonstration of your understanding of both probability and Python.

Number three: I'd say this one is probably the most common question that comes up, and this is more just
general Python text processing, string parsing, data manipulation type questions. This has to deal with things like: you're given a list, or maybe you're given a bunch of text. Can you go through it and parse it? Can you use regex? Do you know the basic functions of Python? They're testing this without literally just asking you, "Do you know Python?" This is probably the baseline level of knowledge that you need in order to get a data science job, because for these Python interview questions this is the lowest bar that you need to reach: they know that you can take some data in a very basic data structure and just do FizzBuzz, or that you can take a paragraph and count the number of words, or take a paragraph and count the number of "and"s or "is"es or "the"s, or strip all those words out. So basic parsing and string manipulation tasks are really common, and I'd say that they're generally put in technical screens just to make sure that you know Python and that you can work around it in the way that a normal person would.

The fourth category of Python interview question is around NumPy functions and matrices. I have to say this one is probably the one that I hate the most, because I don't actually use NumPy ("num-pee"? "num-pie"? I don't know; anyway, NumPy) that much. I work primarily in pandas, and I'd say that a lot of more academic-type jobs, or just people that have come from academia, like working with NumPy because they can represent matrices and different forms of mathematical models. So many times, if you're interviewing for a job that is very theoretical, someone really wants to know your understanding of it, so they'll ask you to take two matrices in NumPy and multiply them together, or they'll make you add those two matrices, or maybe they'll take the
matrices and make you calculate the Jacobian distance, and I don't even know if I pronounced that right. These types of questions are actually kind of common; they're common enough to have their own grouping. I'd say that to prepare for these you just have to practice them, so definitely sign up for Interview Query and check it out.

The last type of Python interview question comes from pandas, and this is more the data munging type of question. Many times, when you're given a dataset and you have to take it into pandas, you have to read it in as a CSV or read it in from a database, and then analyze it. When you're doing this kind of feature manipulation, maybe you're creating new features, maybe you're one-hot encoding your categories. Those kinds of general tasks are pretty common knowledge amongst data scientists whenever they have to create models. Bigger companies now also have specialized roles for this, where people are tasked with taking unstructured data and making some structure out of it. So these types of interview questions are there to test your knowledge of pandas and how you can manipulate the data.

Expect some general questions to be around being able to create features. If you have, let's say, a column that has a bunch of numbers from 1 to 100, can you bucket that column and create more columns that are 1 through 10, 10 through 20, et cetera? Also subsetting: let's say that you have a category column with different colors, and then you have another column that is different kinds of dogs. They'd ask you something like, "Can you return all the rows that are just dogs that have green fur?" That made no sense, but in general they want you to apply different kinds of conditions, show that you can manipulate data, and do
all those sorts of things, and prove that you generally know how this works.

Lastly, I want to share some notes and tips for how to approach a Python data science interview question when you're actually facing it in an interview.

I would say that the most important piece of advice is to just practice a lot. The easiest way to get better at Python is just practicing. The upside of these types of data science interview questions is that you can practice by doing a lot of cool projects, analyzing data, and doing the normal stuff that you would do in your day-to-day, because these interview questions are very indicative of what you would do on the job: a lot of analysis of data, general string parsing to clean the data, et cetera. More likely than not, though, you won't find yourself generating binomial distributions on the fly, so that might be something you need to practice a little bit more. Check out Interview Query [unclear].

Next, always try to clarify up front the question that's being asked by the interviewer. A lot of the time you won't know if you can use a package or not; you don't know if you have to build an algorithm from scratch. Generally just ask up front: can I use pandas? Can I use NumPy? Do I have to generate this random variable by hand? A lot of that will help, especially if you find out that you're doing it all by hand when you could have just imported it from NumPy.

Third, I'd say solve an easy problem first. You need an easy win, even if it's not the most efficient. A lot of the time data scientists will let it go, because they just want to see that you can code without reading something online. So I think a lot of the time it's all about understanding how to effectively demonstrate that you can code, and that you can do so with generally clean syntax.

Always think out loud and communicate.
I can't stress this enough: the worst candidate is the one that doesn't say anything, just does their coding, and then at the end expects that everything they did was right. If you don't communicate, you don't get your points across, and the interviewer doesn't know what you're thinking.

Definitely always try to slow down, and lastly, admit if you don't know. All these things are about effective communication while you're going through this process. Remember that every single interview is a conversation; it's not just a straight-up test of your skills. So remember to ask questions, admit it if you don't know something, and always slow down. Don't dive in head first without scoping out the problem.

Awesome. If you guys have any more questions, please leave them in the comments or shoot me an email. I would love to hear it and get some feedback on what you guys are seeing in the interviews, and whether these Python interview questions are relatable to your job search. I will talk to you guys later. Bye. [Music]
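The first category in the captions (statistical and distribution-based questions) mentions sampling from a binomial distribution and generating a histogram. A minimal sketch of what such an answer might look like, using only the standard library, follows. The function names `sample_binomial` and `histogram`, the parameters, and the fixed seed are illustrative assumptions, not something taken from the video.

```python
import random
from collections import Counter


def sample_binomial(n, p, rng):
    """Draw one Binomial(n, p) value by counting successes in n Bernoulli trials."""
    return sum(1 for _ in range(n) if rng.random() < p)


def histogram(values):
    """Return a {value: count} mapping, sorted by value."""
    return dict(sorted(Counter(values).items()))


rng = random.Random(0)  # fixed seed (an assumption) so runs are repeatable
draws = [sample_binomial(10, 0.5, rng) for _ in range(10_000)]
hist = histogram(draws)

# Crude text histogram: one '#' per 100 observations.
for value, count in hist.items():
    print(f"{value:2d} {'#' * (count // 100)}")
```

Building the sampler from Bernoulli trials is one way to answer when the interviewer rules out library helpers; if packages are allowed, a library sampler would likely be the shorter route.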
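For the second category (probability simulation), the captions describe simulating a die roll or coin flip many times. The sketch below estimates the chance that two fair dice sum to a target value; the specific scenario, the function name `simulate_two_dice_sum`, and the trial count are assumptions chosen for illustration.

```python
import random


def simulate_two_dice_sum(target, trials, seed=0):
    """Estimate P(sum of two fair dice == target) by repeated simulation."""
    rng = random.Random(seed)  # seeded so the estimate is repeatable
    hits = 0
    for _ in range(trials):
        roll = rng.randint(1, 6) + rng.randint(1, 6)
        if roll == target:
            hits += 1
    return hits / trials


estimate = simulate_two_dice_sum(target=7, trials=100_000)
exact = 6 / 36  # six of the 36 equally likely outcomes sum to 7
print(f"simulated: {estimate:.4f}, exact: {exact:.4f}")
```

The same loop structure carries over to scenarios that are hard to solve by hand: only the body that generates one trial and checks the event needs to change.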
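The third category (text processing and string parsing) mentions counting words in a paragraph, counting words like "and", "is", and "the", and stripping them out. A small sketch of that kind of task is shown here; the sample paragraph, the `STOP_WORDS` set, and the helper names are made up for the example.

```python
import re
from collections import Counter

# Words to count or strip; the set itself is an illustrative assumption.
STOP_WORDS = {"and", "is", "the"}


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(text):
    """Count how often each word appears."""
    return Counter(tokenize(text))


def strip_stop_words(text):
    """Return the words of the text with the stop words removed."""
    return [word for word in tokenize(text) if word not in STOP_WORDS]


paragraph = "The cat is small and the dog is big."
counts = word_counts(paragraph)
print(len(tokenize(paragraph)))                 # total number of words
print({w: counts[w] for w in sorted(STOP_WORDS)})  # counts of the stop words
print(strip_stop_words(paragraph))
```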
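The fourth category (NumPy functions and matrices) mentions multiplying and adding two matrices. Assuming NumPy is installed and allowed in the interview, a sketch might look like this; the matrix values are arbitrary.

```python
import numpy as np

# Two small 2x2 matrices with arbitrary values.
a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

product = a @ b        # matrix multiplication (rows of a times columns of b)
total = a + b          # element-wise addition
elementwise = a * b    # element-wise product, NOT matrix multiplication

print(product)
print(total)
print(elementwise)
```

A common slip in this kind of question is using `*` when matrix multiplication is intended, which is why both are shown side by side.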
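The fifth category (pandas data munging) mentions bucketing a numeric column, one-hot encoding categories, and subsetting rows such as "dogs that have green fur". The sketch below assumes pandas is available; the DataFrame, its column names, and its values are invented for illustration.

```python
import pandas as pd

# Toy data; every column name and value here is a made-up example.
df = pd.DataFrame({
    "score": [5, 10, 11, 37, 100],
    "color": ["green", "brown", "green", "black", "green"],
    "animal": ["dog", "dog", "cat", "dog", "dog"],
})

# Bucket the 1-100 column into ranges of ten: (0, 10], (10, 20], ..., (90, 100].
df["score_bucket"] = pd.cut(df["score"], bins=list(range(0, 101, 10)))

# One-hot encode the color category into one indicator column per color.
color_dummies = pd.get_dummies(df["color"])

# Subset: rows that are dogs AND have green fur.
green_dogs = df[(df["animal"] == "dog") & (df["color"] == "green")]

print(df)
print(color_dummies)
print(green_dogs)
```

Note that `pd.cut` intervals are closed on the right by default, so a value of exactly 10 lands in the first bucket rather than the second. That boundary choice is worth clarifying with the interviewer.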
Info
Channel: Data Science Jay
Views: 29,619
Keywords: python data science, data science, python data science interviews, python data scientist, data science python, data science interview questions, data science interview questions and answers, interview query, data science jay, data science python interview, python interview questions, python data science interview, python data science interview questions and answers, python data science interview questions, data scientist vs software engineer, software engineer python
Id: eNt-IvCi7a0
Length: 15min 53sec (953 seconds)
Published: Sun Jul 12 2020