What it’s like being a data scientist

Captions
Hey guys, how's it going? As many of you may know, I'm a data scientist, and I've been working as a data scientist in tech for a bit over a year now. I previously made videos about why I think you should be a data scientist, as well as reasons why you should not be a data scientist, and I will link those videos somewhere over here. This video, though, is going to be a little more personal: one part reflection, and another part what I wish I had learned, plus some of the lessons I've learned along the way. And trust me, there have been many, many lessons. Looking back now, it's actually crazy how much I think I've grown over the past year, how many misconceptions I had and how they cleared up, and how much I think I've progressed as a data scientist. Of course, a disclaimer I have to make here is that some of this is not just about being a data scientist; it's also very much a function of the industry I work in, as well as the company I work in. All right, let's go.

Number one: your ability to learn is your greatest asset. After you do your interview and you get hired, it's really not about what you know anymore. It's not about how much coding you know, how much stats, how much machine learning, or whatever. It's about how fast you're able to figure out what it is that you need to learn, learn that thing, and then implement it.

Okay, so let me explain why. You see, data science is a very interdisciplinary field. It's also an extremely new field that's very fast-paced, with new things being discovered and built every single week. Data science is also a very discovery-focused field; it's literally your job to use data to figure things out. What that essentially implies is that whatever you learned in school, and whatever they even asked you in your interview, is not going to directly translate to what you need to do at your job. What the hiring process makes sure of is that
you have the baseline amount of skills, and then they kind of trust that, with that baseline, you'll be able to figure out what else you need to learn, learn those things, and then implement them in your job.

I'm a data scientist who is product focused, and what that essentially means is that my job is to figure out how to make a product better, detect any risks, and mitigate any harms. What that really translates to is essentially: make the product better, don't make it worse. Especially if you account for the fact that part of a data scientist's job is to actually figure out what good or better versus bad even means, I hope you can get a sense of how vague and hazy it really is. So with my skill set in computer science, statistics, product science, business science, data science, and a little bit of machine learning, I use maybe 10% of what I learned in school out of the box, and the rest is really just figuring out whatever I need to figure out, learning it as quickly as possible, and then implementing it in my job.

For example, say I've done analyses that allowed me to form a hypothesis for how we can change our product, but I don't have any data. Well, then part of my job is to figure out how to actually do the data collection, how to conduct proper experiments in a specific domain, and how to perform analyses on the results. When I first started at this job, I honestly knew, I would say, 20% of how any of this works, and the rest I just picked up on the job.

Another big one for me was learning how to present to stakeholders, to product and business leads. For example, I had done some analyses that allowed me to discover some things I thought were important, but at that time I didn't know how to present them to people in such a way that they would listen to my advice. (Okay, um, today we're going to be talking about PowerPoint.) It took me a while, and a lot of mentorship from people that
I'm extremely grateful to for mentoring me, to learn what I have to do: first figure out what these product and business leads actually care about, and then structure the results of my analysis in a way that they understand and can easily digest. I've really come a long way with this. In the beginning, I would just put all this technical stuff on the slide and ramble about statistics, and that was definitely not the right approach.

Another example: I did a little bit of modeling and forecasting as part of my undergraduate and master's degrees, but modeling and forecasting can be very specific to particular domain areas. So when I was asked to forecast something in a specific way, I literally had no idea how to do it. The way I learned was by reading some papers, uh, stealing some code, and of course always defaulting to asking the guy with the math PhD how to do it.

All in all, the lesson I learned here, and what I wish I knew, is to not put so much emphasis on knowledge itself. If I had figured this out earlier, I think I would have become a better data scientist faster, and it would definitely have helped with my imposter syndrome.

Okay, quick little intermission where I want to introduce you guys to the sponsor of this segment of the video, 365 Data Science. 365 Data Science is an online learning platform where you can learn about data, business analytics, and of course data science. Some of you may also know that I partnered with 365 Data Science last year to create my first online course, SQL for tech and data science interviews, where I guide you through 10 full mock interviews. Back in March and April of 2020, 365 Data Science provided free access to their entire content library at 365datascience.com. As a result, more than a hundred thousand people gained full access and watched between one and two million minutes of educational content per day, which is pretty amazing. They were able to help many
people discover the field of data and data science for the very first time, and the amount of engagement was way more than they expected. So, to cut to the chase, they are bringing back the 100% free for one month initiative. Over the past year they've been working hard to rebrand and overhaul their entire platform, so to celebrate, they're going to again give free access to the entire platform: all courses, all exercises and exams, the resume builder, and all downloadable materials as well. The 100% free for one month initiative runs from October 18th to November 18th. All materials can be accessed at 365datascience.com. Everyone can join and enjoy 100% free access for 30 days, no credit card required. I think this is such an amazing free opportunity, and I highly recommend that you check it out. I know they've been working really hard to rebrand and build out the new platform, so I'm also really excited to check it out when their new website goes live. All right, back to the video.

Number two is uncertainty. There's this joke that data scientists are usually the ones with the most filler words and the most qualifying words. For example, we always say stuff like perhaps, maybe, it is a possibility, it is likely, it seems, it appears, I think, and my favorite: yes, but you never know these days.

Okay, so let me give you an example. This is not a real example, but say that your job as a data scientist is to improve an e-commerce store. You're like, okay, I have all this data, so let me try to figure out which variables are correlated with people clicking more things on the e-commerce store and eventually buying stuff. So you do your correlational analysis and you find the variables that are correlated with people purchasing things. But you know the golden rule that correlation is not causation, so you're like, okay, in order to actually establish a causal effect, I have to conduct an experiment. In your analysis, maybe you found that people who
got more notifications are the ones that go on to purchase more things, so you decide to run an experiment with a control group and a test group, where you send people in the test group more notifications. You do that, and you find that there actually is a statistically significant difference between your control group and your test group at a 95% confidence interval.

Wait, pop quiz time. Pause this video and answer in the comments below: what does it actually mean to have a 95% confidence interval, in layman's terms?

So hopefully you know the answer to the question, or you figured it out and left a comment. Then you present to your stakeholders: okay, it seems that sending these notifications is actually quite good, and it seems that it increases the metric we care about, which is dollars spent, for example. But you never know these days, because even when you do an experiment like this, you can never really be 100% sure about anything.

First of all, you need to be okay living with that uncertainty. Make peace with it, because as a data scientist you are just very uncertain about everything. The trick is to really know your statistics, so that you're able to quantify your uncertainty and communicate it to other people, to your stakeholders, in a language they understand. So definitely don't just be like, yep, it's stat sig at a 95% confidence interval, because, like, what does that even mean?

The third thing that I wish I knew about being a data scientist is the amount of ambiguity there is. This is kind of related to the first thing I talked about: they hire you based on your baseline skills, but you're expected to just figure things out, learn the things you need to learn, and then implement them so that you're able to do your job. Some anonymized examples of things that people have asked me: What should we do for holiday season? (How would I know?) What are the sales projections? How do we make the ranking better? How do we decrease harm?
You get the idea. Basically, people often ask you very vague things, or sometimes they don't even know what it is that they want. So the first part of your job is to figure out what they actually want. For example, make the ranking better: like, what ranking? But it's your job to figure it out.

Let's take the first example: what should we do this holiday season? An example approach you might take as a data scientist is to first ask, okay, what did we do last holiday season? Then ask, was that a good idea? Did it actually improve the metrics we care about? Maybe you find that it didn't, that it wasn't very good at all. So you might ask, what are other things that people have done in a similar space, and were those good? That way you gather a bunch of different ideas for things you could potentially do during the holiday season. Then you do an opportunity sizing of how much you think each one can improve your metrics, and you also have to figure out how much it's actually going to cost. Usually the cost is in engineering hours, because the engineers are the ones responsible for building the thing that you want to try out.

So yep, the lesson learned here is that, just like with uncertainty, you have to live with that ambiguity. Approach open-ended questions not by freaking out, but by trying to understand what that person is even asking you, and then trying to find solutions in a very open-ended, research-based kind of way, before narrowing it down, quantifying it, and making a recommendation to help people make a decision.

The fourth thing is that research and planning saves lives. Okay, I'm exaggerating a little bit, but seriously: research and planning up front, please just do it. There's a quote by Brian Tracy, who's a productivity guru, and it goes: every minute you spend in planning saves 10
minutes in execution. I think in the case of data science, especially if you work in a big company, one minute of planning is not 10 minutes saved; it's like an hour or two hours saved. Data science is super open source, and you can look up a lot of source code online. And especially if you work in a big company, chances are that somebody has previously done something very similar to, if not exactly the same as, what you're trying to do.

When I first started off as a data scientist, I made the very big newbie mistake of being told to do a task and immediately starting to work on it, you know, grinding away, doing the math, writing the code, doing all these things. I did that for like two weeks, until I realized that there literally is a module that someone created that exactly solves the problem I was trying to solve. I could have done it in two hours, and the code was better than mine, it covered more edge cases, and it was just generally way better than whatever I wrote.

So, lesson learned: do your research, do your planning. Whenever you are supposed to do something, spend at least 10% of the time you anticipate working on that project thinking, researching, and planning how to do it. Chances are there's a simpler way to do it, or some way you can leverage somebody else's code or somebody else's methods.

The fifth thing I wish I knew about being a data scientist is to actively avoid burnout. The mistake I made is one that countless other people have made as well. You start a new job and you really, really want to do well. You want to impress your manager. Your imposter syndrome is super strong, so you try to keep it at bay by working really, really hard. Or you just think that a problem is super
interesting, and you spend hours and hours grinding to solve it. That is totally what I did, and I totally burnt out. Seriously, don't do it. Especially as a data scientist, where so much of your job is super hazy and not very well defined, and a lot of it is about thinking outside the box, understanding how things are, and piecing things together in a very abstract way. It takes up a lot of brain energy.

I learned the hard way that working as a data scientist really isn't like school. In school, the project you're supposed to do is clearly defined. You know exactly what you have to do and when you have to do it, so you can just go ham, cram really hard, get it done, and then do nothing for two weeks afterwards, because everything is so well planned out. You finish one project, that's done, and you move on to the next one; it's very clearly delineated.

But at work it's really not like that at all. When you finish a project, chances are there are going to be improvements that people want you to make, or that you want to make yourself. You're constantly iterating on and improving your project, and also increasing its scope. The problems you're solving are also likely very big and very complex, so it's not like you can just sit there and solve the problem, cut and dried. Oftentimes you discover better ways of doing things.

So what's really important for avoiding burnout is to treat it as a continuous process. Don't see things as a now-or-never kind of thing. Think about it more in terms of: how much energy can I actually expend, and how much can I recuperate? Think more smartly about how you're able to last longer. Also, take time off when you need it. Trust me, it's so not worth it to burn yourself to the ground and then take time off, because you would have saved more time and more
energy if you had just taken the time off when you needed it and come back refreshed and recuperated. There's a reason why people burn out so much, especially in tech. So to make sure it doesn't happen to you, you really have to be very mindful and consciously avoid burnout. I wish someone had told me that when I first started as a data scientist. But it worked out.

All right, that's it for this video. I hope it was helpful for you guys. For me personally, it was a really good way to reflect on the past year and, like, three months, and really think about how far I've come. So hopefully you guys don't make some of the mistakes that I made, and hopefully you learn things and figure things out faster than I did. Well, have a great day, and I will see you guys in the next livestream or video. [Music]
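
To make the notification experiment from number two a bit more concrete, here is a minimal Python sketch of how the uncertainty in a control-versus-test comparison might be quantified. It is not from the video: the function name `diff_in_means_ci`, the simulated dollars-spent figures, and the use of a normal-approximation interval are all assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist, mean, variance


def diff_in_means_ci(control, test, confidence=0.95):
    """Normal-approximation confidence interval for mean(test) - mean(control).

    Hypothetical helper for illustration; assumes two independent samples
    that are large enough for the normal approximation to be reasonable.
    """
    diff = mean(test) - mean(control)
    # Standard error of the difference between two independent sample means.
    se = math.sqrt(variance(control) / len(control) + variance(test) / len(test))
    # Two-sided critical value, roughly 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff, diff - z * se, diff + z * se


# Simulated (not real) dollars spent per user in each group.
rng = random.Random(0)
control = [rng.gauss(20.0, 5.0) for _ in range(2000)]
test = [rng.gauss(21.0, 5.0) for _ in range(2000)]

diff, low, high = diff_in_means_ci(control, test)
print(f"estimated lift: {diff:.2f} dollars per user, 95% CI [{low:.2f}, {high:.2f}]")

# If the interval excludes 0, the difference is statistically significant
# at the 5% level under these assumptions.
significant = low > 0 or high < 0
```

One way to phrase the result for stakeholders: if the same experiment were repeated many times and an interval were built this way each time, about 95% of those intervals would contain the true difference. Reporting the range of plausible lift, rather than only saying the result is significant, is one way to communicate the uncertainty in plain language.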
Info
Channel: Tina Huang
Views: 84,103
Keywords: data science, data scientist, 365 data science, facebook data scientist, apple data scientist, amazon data scientist, netflix data scientist, google data scientist, what i learned as a data scientist, tech career, tech data scientist, tina huang
Id: uJE_nOIetgE
Length: 14min 14sec (854 seconds)
Published: Mon Oct 18 2021