Data Scientist vs Data Analyst vs Data Engineer: What's the difference?

Welcome to another Recall by Dataiku video. Today we are going to talk about the difference between a data scientist, a data engineer, and a data analyst. [Music]

Many people get confused as to what job they want because it's not always clear what kind of work they'll do. To make things even more confusing, many companies have different definitions of what a data scientist is. The only way to know exactly what you're applying for is to look at the actual job description, so that you know what you're going to be doing day to day. So I'm going to make things a bit clearer today by explaining the differences using this illustration by Monica Rogati.

Before I talk about the different job positions, let's talk about what we actually use data for. What this hierarchy of needs illustration is saying is that if you can't even collect data properly, then there's no point in working on AI or deep learning. AI won't magically solve everything, and your business probably doesn't need AI to improve itself. There is far more low-hanging fruit.

Once you're able to collect data for your business, you then have to store it. For example, software engineers might have logging that looks like this. Not bad, it's data, but to be able to do anything with it you have to move it and store it. That can be in relational databases or in CSV files; it doesn't matter. The fact is you're going to have to write data pipelines to move data from one place to another. Especially once you have a ton of data, this becomes a highly complex distributed systems problem. The people who work on this should be really good at distributed systems, so usually they're called either software engineers or data engineers.

The code you write is not perfect, so you're bound to get some weird results. This is where Dataiku can come in. Dataiku can help you explore your data sets and create nodes to clean them up. For example, we've seen bugs where users on our app were spending 25 hours per day on the app, which is impossible. So data engineers will continuously work on transforming and cleaning up the data so that it's actually usable and queryable. If your business uses Dataiku, you'd be using its data preparation features, where you can visually connect to your data sources, join them, aggregate them, de-duplicate them, and clean them.

Now, hopefully, if we did all the previous steps properly, anyone in your company can query that data. That's why SQL is so useful: it's an easy language, and it's more or less the standard language for querying data from our databases. Big companies like Facebook have their own internal tools to query and visualize the data. I don't work there anymore, but their tool was very similar to what Dataiku built. Basically, you can query data and get quick visualizations, so you can understand your data with just a few clicks.

Now data analysts, business analysts, PMs (product managers), and software engineers can all query that data easily and answer questions like "How many users have used my feature in the last week?", which is very important to know. You can already have a lot of impact on your business just by querying that data, answering these questions, and making product decisions based on that data. At most companies, that's all you need. You don't need to go further than that.

We can do so much with this data now, especially if the data is clean and actually useful. Because we built such a great backbone, we can now build an A/B testing framework on top of it. This framework is an important tool for businesses to know exactly what features to build and what incremental effect each change has on the product. For example, if I have a like button and I want to change the color to blue, and I'm curious to see if people will click it more, well, now you can find out with A/B testing. You can also run simple linear regressions to predict your users' behavior and maybe build features around this. Dataiku has it all integrated, so you can build machine learning models pretty easily and choose what features you want for your model with a few clicks.

Now, if you want to do deep learning or AI, then you need that clean data. If we go back a few steps, we see that it's imperative that we properly selected and labeled the training data. We also have to make sure we identified the features properly. And if the simple ML algorithms like linear regression truly don't cut it, then you could think about AI and deep learning to improve your product.

Okay, now let's look at this from a bird's-eye view. Where do data scientists, data engineers, and data analysts fit in?

Commonly, data engineers work on these areas: explore, transform, move, store, and collect. Software engineers mostly do the collect part, since it's usually implemented on the front-end side and a little bit on the back end, because that's where you collect the user data.

Data analysts most commonly work in this aggregate-level part, where they have the very important job of interpreting the data and aggregating it in a way that lets you make decisions for your business based on the results. A very good analyst will be able to come up with a strategy and a direction for a company, a product, or a feature, depending on how big your company is. They're technical, but they also have product intuition, and they have amazing communication skills, because you have to communicate that insight to the rest of the company.

In my experience, many companies call data analysts data scientists nowadays. I'll be frank: in general, data scientists are paid more because the role usually requires a more technical background. However, I've seen many companies use their data scientists to do data analyst work because it is that vital for the company, so they get their smartest data people to work on it. Data scientists can also work on anything above this, from building ML algorithms up to AI and deep learning models, though most of the time nowadays those people are called research scientists, and they're supported by ML engineers who build out the systems they need. Complex projects usually require PhD candidates because they have specialized knowledge.

In some companies the roles are blurred. I worked at Google, where software engineers would do everything, since some teams aren't that big. The engineers have to do product analysis work, so they would have to query data to understand what products and features would have the most impact. Sometimes they would also have to build the machine learning model. Then the same software engineers would push the model out and also run A/B tests to make sure that the model is performing better: see if it hurts anything, see if anything improves, et cetera. If it's better in every way, then we push it live.

So next time you see a job posting, read the description and see where it falls in the hierarchy of needs for data. That will help you determine if that position is right for you. [Music]
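
As a rough illustration of two steps mentioned in the video, the sketch below first drops impossible records (the "25 hours per day" bug) and then answers "how many users have used my feature in the last week?". Everything in it is an assumption made for illustration: the record layout, the sample values, and the function names `clean` and `weekly_feature_users` are hypothetical and do not come from the video or from Dataiku.

```python
from datetime import date, timedelta

# Hypothetical usage records: (user_id, day, hours_on_app, used_feature).
RECORDS = [
    ("u1", date(2022, 4, 25), 1.5, True),
    ("u2", date(2022, 4, 26), 25.0, True),   # impossible: more than 24 hours in one day
    ("u3", date(2022, 4, 27), 0.5, False),   # valid, but did not use the feature
    ("u1", date(2022, 4, 28), 2.0, True),
    ("u4", date(2022, 4, 10), 3.0, True),    # valid, but older than one week
    ("u5", date(2022, 4, 29), 4.0, True),
]


def clean(records):
    """Keep only rows whose daily usage falls in the possible 0-24 hour range."""
    return [r for r in records if 0 <= r[2] <= 24]


def weekly_feature_users(records, today):
    """Count distinct users who used the feature in the 7 days ending on `today`."""
    start = today - timedelta(days=6)
    return len({
        user
        for user, day, _hours, used in records
        if used and start <= day <= today
    })


cleaned = clean(RECORDS)
count = weekly_feature_users(cleaned, today=date(2022, 4, 29))
print(len(cleaned), count)
```

Running the same count on the uncleaned records would include the user with the impossible 25-hour day, which is one small example of why the cleaning step sits below the aggregate step in the hierarchy.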
Info
Channel: Recall by Dataiku
Views: 416,187
Id: XW0YptcgZSk
Length: 6min 57sec (417 seconds)
Published: Fri Apr 29 2022