Week 3 Data Scientist Versus Statistician

Captions
Hi, my name is Brian Caffo, and welcome to the ask for I am part of our weekly newsletter. Today I'm going to answer what is easily the most frequent question I get whenever I do something like this, which is to somehow define or differentiate a data scientist from a statistician, a computer scientist, a data engineer, and all these other fields.

Being a new field, data scientist is kind of an undefined job description at this point, and data scientists are sort of bridge researchers. I like to use Dungeons & Dragons as an analogy. There were wizards in Dungeons & Dragons, and the wizards cast spells. There were knights, and the knights would fight you with a mace or a sword or whatever. The wizards were great at casting spells, but if they ran across something you couldn't cast a spell on, they were next to useless. The knights were great at fighting, but if they ran across something you couldn't swing a sword or a mace at, like a ghost, they were useless. So there were these things called paladins. Paladins were sort of half wizard, half knight; they could do both. And of course, if you're a Dungeons & Dragons maniac, obviously I'm getting this wrong, so don't email to let me know. I acknowledge that.

Having said that, the paladins, in my mind at least, are what I think of as data scientists. The first thing I would say is that a data scientist tends to be a bridge researcher: they have skills from statistics, skills from computer science, skills from data engineering, and so on, so they tend to sit in the middle of these fields.

A second aspect of being a data scientist is that the content they pull from these fields tends to be the more practical part. When they know some statistics, they tend to know the core parts of applied statistics and methodological statistics, and maybe a little bit of theory, but only the component of theory that's actually useful in day-to-day practice. They tend to know the components of computer science that are relevant for things like data engineering, algorithms, and programming, but they would know less about the big O, little o type of things in computer science. So I think, just by virtue of being general data-oriented problem solvers, what they tend to do is bring in the most practical aspects of these other fields.

In terms of differentiating data scientists from statisticians: I would firmly classify myself as a statistician. My training is in statistics and I'm a professor of biostatistics, so I'm firmly in the statistics camp. One difference in how data scientists and statisticians tend to think about statistics is that data scientists really like to live in the data a little bit more. Statisticians also work in the data, of course, but they also really try to connect the data to a population or something like that with a conceptual model. So statisticians tend to think a lot more in terms of the conceptual model and things like assumptions, and they tend to be more comfortable with that aspect of model building.

I think that's why, in the data science world, very data-oriented tools like bootstrapping and permutation testing are highly preferred. Machine learning, for example, is very data science-y, because the way you tend to evaluate the performance of machine learning is with things like cross-validation, data splitting, and trying it on a new data set. That is a different way to get at statistical concepts like avoiding overfitting and generalizability without having to make a lot of statistical assumptions, like i.i.d. draws from a population and that sort of thing. So what I notice, in terms of differentiating data scientists from statisticians, is that data scientists tend to like to stay inside the data as much as possible. Having said that, data scientists tend to know a lot of statistics, and they tend to understand the sorts of assumptions that go into statistical modeling. But I think if you're a statistician, you're going to be in that modeling world a lot more.

A subpart of this question that I get is: professionally, what's the difference? I think a person who is a data scientist is often going to be hired as a general data-oriented problem solver, and the term can mean many different things depending on who's doing the hiring. On one end, you can have people who want data scientists who are much closer to what I would describe as data engineers. What they really want are people who can do big, complicated data merges, set up databases, and that sort of thing. On the other end, they might want people who are data analysts, and data analysts have a very different set of skills. They don't worry so much about the creation of the data structures and the ways in which the data is collected and organized, but care more about, given those structures, how do we analyze the data and extract useful information. So depending on how the position is advertised and who's doing the hiring, you might be more towards a data analyst, or more towards a data engineer, or they might want you to do both. Some of the smaller places can't afford to have separate people, wizards and knights, wizards for wizard things and knights for knight things. They need some paladins, because those people are going to have to solve many problems.

Having said that, what I've noticed from my friends in industry (and I've never had a real job, I've only been a professor) is that many organizations probably need more data engineers than data analysts. But those couple of data analysts tend to be very important; they tend to have a sort of outsized relevance, even though fewer of them are needed. In some cases I've heard phrases like five to one or ten to one, where they need one data analyst type person but maybe five data engineering type people. I think that makes sense, because when they're talking about data engineers, they're also including some amount of hardware and systems administration that goes into the collection and organization of the data.

So I hope this is useful. Again, I think a lot of this is highly dependent on who's doing the hiring, and as a field, data science hasn't really annealed into a perfectly well-defined discipline. But if you want a little bit more about how to manage data scientists and how to think about these things, we have a whole specialization on Coursera called Executive Data Science, where we talk about these concepts in detail and go over the various subcomponents in much greater detail than I'm going through here.

Okay, well, I look forward to seeing you next week. Remember to submit your questions and subscribe. I'll try to do one of these every week.

[Music]
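The captions mention bootstrapping and permutation testing as tools that "stay inside the data". As a rough illustration only (it is not from the video), the sketch below compares two small groups using just resampling, with no distributional model. The numbers in `group_a` and `group_b` are made up, and the function names `permutation_p_value` and `bootstrap_ci` are hypothetical helpers written for this example.

```python
import random
from statistics import mean


def permutation_p_value(a, b, n_perm=5000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        # Shuffle the group labels by shuffling the pooled values.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


def bootstrap_ci(a, b, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(a) - mean(b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping its size.
        resampled_a = rng.choices(a, k=len(a))
        resampled_b = rng.choices(b, k=len(b))
        diffs.append(mean(resampled_a) - mean(resampled_b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Made-up measurements, purely for illustration.
group_a = [5.1, 4.9, 6.2, 5.8, 6.0, 5.5, 6.1, 5.7]
group_b = [4.2, 4.8, 4.5, 5.0, 4.1, 4.6, 4.9, 4.4]

p = permutation_p_value(group_a, group_b)
lo, hi = bootstrap_ci(group_a, group_b)
print(f"permutation p-value: {p:.4f}")
print(f"95% bootstrap interval for the mean difference: ({lo:.2f}, {hi:.2f})")
```

Both functions only reshuffle or resample the observed values. A model-based approach would instead lean on assumptions about the population the data came from, which is the contrast the speaker draws between the two camps.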
Info
Channel: Brian Caffo
Views: 13,802
Rating: 4.9748425 out of 5
Id: oo4bYB8J5js
Length: 7min 45sec (465 seconds)
Published: Fri Feb 17 2017