6 Regrets From My First Year As A Data Scientist

Video Statistics and Information

Captions
So I am rapidly approaching one year as a full-time data scientist. It still feels crazy to say, man. If I made exponential gains going from self-taught to doing my master's, this last year as an actual in-the-field data scientist has been off-the-charts growth. Man, I wish I could go back to 2021 or 2022 Nash and tell him, "Hey, these are the things that are going to be important when you're actually a data scientist. It's not just about uni." I would, but I don't have a time machine. So instead, here I am telling you six things that I wish I had focused on more, things that would have prepared me much better for my first year as an actual data scientist. I tried to make sure these six things are transferable, so it doesn't matter whether you're at a startup or working at a massive FAANG company; these elements can be applied in all of those settings. So here goes: six regrets from my first year as a data scientist.

Okay, so let's start off with something fairly light, because this is something you could learn over the course of one day, or honestly the basics in just a few hours, and that is: learn GitHub. If you're anything like me, when you first begin as a data scientist you feel sort of like a solo adventurer, just hacking your way through the data science forest. In fact, you actually feel overwhelmed at times because of how much stuff you have to learn. When you first start off, you learn how to clean your data, then you watch this YouTube video and they're like, "Oh no, learn predictive modeling." Then you read this article and they're like, "No, actually, all you need to know is clustering." Then you watch this video and they're like, "Actually, NLP is where it's at." So you get stuck in this constant loop of learning everything to a surface level. In those situations, the last thing you care about is GitHub and version control. It's like, "I have enough on my plate, I don't need to learn that right now."

However, I can say that when I was learning, I always knew in the back of my mind that you can't be a data scientist who doesn't know GitHub, and that one day I would have to pick it up. But I kept pushing it off, and even when I started working I could kind of get away with it, because it's a startup. What this means is that whenever there's a data project, the request comes into the data science department, you grab the request, do your whole data science project, and present it back to the stakeholder. Because you're working in isolation, and you're under the stress of trying to make the project as perfect as possible with a quick turnaround, version control still wasn't at the top of my list.

It wasn't until I was working on a coding project that was actually being developed into a tool the whole company would use that the tech developers were like, "Hey, where is the code on GitHub? We need access to it so we can build something on top of it." And I was like, "I could send you my Python file titled 'super important code final final final final final version'." Come on, guys, at that point it's just not a good look. So I was forced to learn GitHub, and honestly, it doesn't take that long. One of the other interns sat down with me for about an hour, and I literally had the basics of how to upload, merge, pull, and that sort of thing. I can't imagine how I ever did without it: making little changes locally and trying to keep track of everything, what worked, what didn't, and what was going to be changed. Just learn GitHub. It's four hours of your time; you can do it.

Okay, that one was a little bit light, so let's move on to something weightier. Of these six, this is probably the thing I did most wrong: focusing too much on data science, or learning data science in a vacuum. This is one of those things where, when you're in the learning stage, it's extremely difficult to communicate its value, because all you ever hear is "learn this, learn that, learn this one super-specific library, otherwise you're a failure," which just leads to you having a really wide tech stack with zero depth. And while, yes, your programming and mathematical skills are obviously super important, probably the most important, almost no matter where you work the job of a data scientist is to help the business make more money, whether by running more efficiently or some other way. That means you're working in the context of a business, which means your domain or business knowledge is just as important, because somebody is paying you to achieve a specific result using data that will help their business.

We're on this journey together, so one thing I'm doing as a data scientist to improve this for myself is picking a domain and beginning to mold my portfolio and CV around it. You could argue that being hyper-focused on a single domain is ideal, and maybe that's correct, but I still feel it's too early for me to narrow it down that far. So I have three specific domains that I know and that I want to work in as a data scientist. Having narrowed down my domains, I can focus on acquiring broader knowledge in these areas, and on how data can be used to deliver optimal results for businesses in them, so that you become a mini expert in those domains.

Perhaps this next point is even more important, and I really regret not having learned it before becoming a data scientist: having a rigorous, disciplined template by which to conduct any data science project. In simple terms, understanding the data science process from end to end, very clearly. See, when I was a student, everything was a lot simpler. Your professor would give you a specific dataset, and even if that dataset wasn't clean, it was already in a tabular format and the cleaning wasn't too tough. There would also be a clear problem statement, and for the most part they'd tell you exactly which technique they wanted you to apply, because they want you to learn the fundamentals. So your approach was basically laid out for you, and if your university used something like CodeGrade, you could even tweak your code to make sure it passed all the different stages before you submitted, to get your score a little higher.

But of course, that's not how the real world works. Once I was working, whenever I got a brief for a data project I would instantly be scrambling: What data do I have available? Where do I source this data? What techniques should I use? Which is the correct technique? This just led to massive inefficiencies. And maybe the most important thing I could have learned is the importance of having clear acceptance criteria: a clear "this is good enough, stop now." Don't spend the next week trying to get your model 0.1 percent better, because that's not how business value is generated, at least at the startup level. It wasn't until I read the book "Think Like a Data Scientist" that the holistic data science process crystallized in my mind, along with how much more efficient you can be by formulating the problem right away, so there's less wasted energy and everything is directed at solving that problem. If you want me to do a little summary of the main points of that book, just drop a comment, and hey, while you're down there, a little like, you know.

Right, number four out of six, and this might be the most important. Yes, all six might be the most important. This is a skill I used to be pretty bad at. As I got into self-development I improved it, but if I had this skill at an elite level, I would already be much further along in my career. That skill is becoming not only an effective communicator but an engaging communicator. This piece of advice gets floated around the data community from time to time, and usually the focus is on avoiding jargon so that what you're doing is understandable to a layman. Of course, that's fundamental. If somebody is not data-oriented, they don't care about your R-squared value or the difference between the accuracy and the precision from your confusion matrix. They care about the insight the data can give them, and probably how it can help them improve their business, which obviously leads to more revenue. No arguments here; that's super important to understand.

But I think there's a level beyond that, and I began to notice it towards the end of my degree. Halfway through the first year, everybody was applying for jobs, and you know what it's like being in a classroom: you have a rough, maybe not that accurate, idea of how good everybody is at coding, maths, and the hard data science skills. Before, I would have thought those data skills and how long it took somebody to get a job were pretty directly correlated. But when job application season came, at least among the few people I roughly knew on my course, I quickly noticed that it wasn't just the people with strong data skills who were getting jobs quickly. It was the people who combined data skills with being really good communicators, or just being interesting to talk to. Nothing fancy: when you have a conversation with them, they're engaged, and you're interested in what they're saying.

Now, don't get me wrong: if you think a good mouthpiece is enough to make you a great data scientist, you're going to have a bad time. You still need your fundamentals. All I'm saying is that being an elite, engaging communicator is like having an ace up your sleeve. You can make people much more interested in your results on the job, and, more importantly, in interviews you can communicate to the interviewers why your data skills will help solve their problems. So take the time to, at the bare minimum, be a competent communicator. This is something I'm still very much working on; I'm not saying I'm great at any of the things I've listed. The way I'm working on it is by making these videos, to see if I can become more engaging, and by taking courses on platforms like Skillshare on how to write and present effectively, like I mentioned in my "Becoming Elite in 2023" video.

Now let's move on to the last two. This one I know we can all relate to, especially us newbie data scientists, even if you've only done one project in your spare time: giving unrealistic deadlines. It's not that I ever did this on purpose, but if you've done even one data science project, you know it's almost impossible to predict how long it's going to take, because a lot of data science is trial and error, trying to figure out the right approach to a specific problem. That's made even worse in the real world, when you might have to source your own data, or you're given data that's very unstructured and have to start interpreting it. When I was a newbie, working in a data environment for the first time, and people came to me with a data project, I was eager to impress and had no clue how long it would take me. But when people asked for a deadline, I'd always say, "Three days," thinking I could just grind it out. How hard could that project be? I could probably figure it out in a few days, right? But now, all of a sudden, I was racing against a deadline for a stakeholder who maybe didn't even want it that soon. I had raised their expectation that this thing would be with them by Friday, when maybe they didn't even need it for two weeks, and if I missed that deadline it came off as a disappointment. Instead, I could have given myself a little more room to maneuver: tell them two weeks and deliver it in one. So, man, it goes back to that old adage: under-promise, over-deliver.

Okay, this one might actually be the most important, and it's a regret you can act on instantly. My mistake was not keeping up with the wider data science industry and really immersing myself in the world of data science outside of my job. When you finally land your first job, it can be very tempting to become hyper-fixated on what you do day to day, but honestly, that's a recipe for becoming quite stagnant. You stop learning how other people in your industry approach problems, and, beyond problem-solving, you stop learning about what's going on in the wider industry.

Think about it this way. This is going to be quite a niche example, so I'll try to leave annotations on screen, because I know it might get a little jumbled. Imagine you wanted to start learning how to play tennis, and all you had was a few balls, a tennis racket, a wall, and, let's say, a book on the fundamentals of stroke technique. In this example, you've never played a real match, you've never watched anybody else play a match, and you've never had a coach. With everything you have, you can learn tennis. The tennis book is the equivalent of tutorials and data science books. The wall, the ball, and the racket are the equivalent of your laptop with Python, SQL, and all that lovely stuff installed. That's all you need to get started, and sure, at the start you're going to make rapid progress drilling your forehand and backhand against the wall.

But when you enter a tournament against other people, you're entering the real world; you're now getting real-world data. Where you were used to just playing against the wall, now you're seeing that players have different styles: some hit with a lot of power, some love to hit with a lot of spin. In these situations, if you've never watched a tennis match, or never watched the pros play, you've never seen how other people have approached the same problem of beating a specific type of player, which is like the different ways of approaching a specific data problem. You wouldn't know that against the guy who hits hard, maybe you should return it a little softer, so he has to use all his own power to generate speed on his shot, which would tire him out. You pick up all of those heuristic, tactical things just by knowing the wider scope of tennis and watching what other people have done, which only comes from learning your domain and the wider context. And that textbook with all the tennis strokes might have been an old book from the 1970s, the equivalent of an old data science book from, I don't know, maybe 2015. That book from 2015 might not account for emerging technologies: there are now different libraries that solve specific problems the book labored over, or AI can help you with a specific element.

This is such a bad example, but I know one person out there loves tennis and is like, "Oh, perfect example." Bottom line, if that example got away from you: just keep up with what's happening in the industry. One simple way I do this is by following a bunch of data people on Twitter. Another is signing up to Medium; you literally get articles emailed to you every day. Most of them aren't applicable to my use case, kind of useless, but at least a couple of articles every week genuinely open my eyes to a new element of data science. And of course, follow data science people on YouTube as well; here are a few of my favorites.

But yeah, those are six of the highest-value regrets I have so far after my first year as a data scientist. If you want to know what I actually learned as a student during my master's, click on this video here, where I covered literally everything we had to learn. And if you're new around here, I'm Data Nash, just a guy documenting his data science journey from being a complete noob, as you can tell, to hopefully one day being pretty elite at data science. If you want to join that journey, grow with me, or let me be your guinea pig to see if it's worth getting into data science, hit subscribe. I'll see you in the next one. Peace.
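The "clear acceptance criteria" point from the third regret (a clear "this is good enough, stop now") translates fairly directly into code. As a minimal Python sketch, not something shown in the video, and assuming a single validation score where higher is better, the stopping rule could be written down before modelling starts. The `AcceptanceCriteria` class, the `should_stop` function, and the three thresholds are hypothetical names and values chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AcceptanceCriteria:
    """Hypothetical 'good enough, stop now' rule, agreed with the stakeholder up front."""

    target_score: float  # validation score (higher is better) that counts as done
    min_gain: float      # smallest round-over-round improvement worth another round
    max_rounds: int      # hard cap on modelling iterations


def should_stop(scores: list[float], criteria: AcceptanceCriteria) -> tuple[bool, str]:
    """Return (stop?, reason) given the validation scores from each round so far."""
    if not scores:
        return False, "no results yet"
    best = max(scores)
    # 1. The agreed target is met: ship it instead of chasing another 0.1%.
    if best >= criteria.target_score:
        return True, f"target met ({best:.3f} >= {criteria.target_score:.3f})"
    # 2. The iteration budget is spent: report what you have.
    if len(scores) >= criteria.max_rounds:
        return True, "iteration budget used up"
    # 3. The latest round barely beat the previous best: diminishing returns.
    if len(scores) >= 2:
        gain = scores[-1] - max(scores[:-1])
        if gain < criteria.min_gain:
            return True, f"last gain {gain:.4f} is below {criteria.min_gain}"
    return False, "keep going"


if __name__ == "__main__":
    criteria = AcceptanceCriteria(target_score=0.85, min_gain=0.005, max_rounds=10)
    print(should_stop([0.78, 0.82, 0.823], criteria))
```

With a target of 0.85 and a minimum gain of 0.005, the score history `[0.78, 0.82, 0.823]` triggers a stop on diminishing returns even though the target was missed; under these assumed thresholds, that is the point to go back to the stakeholder rather than spend another week on the model.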
Info
Channel: Data Nash
Views: 22,886
Keywords: data science, data analytics, data science job, data engineering, tina huang, study md, ali abdaal, ken jee, data science tips, 4 regrets from my 20s, captain sinbad, sundhas khalid data science, everything you should know before studying data science
Id: oGy5eLXO4Hw
Length: 15min 19sec (919 seconds)
Published: Sat May 06 2023