How do I start my career in Data Science?

Captions
Hello everyone. Today's video is different: you know that my usual videos have a lot of coding involved, but in this video there is no coding, just a lot of talking. This is something new that I'm trying, so if you like it, let me know in the comments. This time I'm just going to talk, because I get a lot of questions about how to become a data scientist. I get so many similar questions that it's not possible for me to answer each and every one of them individually, so I asked people on LinkedIn and Twitter to send me their questions, and I will try to answer them in future videos. I was overwhelmed by the response: I got more than 500 questions in less than 24 hours. I cannot answer all of them in one video, but I will try to answer them in a few videos that I will upload over time, starting today.

The most common question is about the learning path: everybody wants to know the learning path for becoming a data scientist, and where to start. In this video I'm going to cover questions like what kind of courses you need to take to become a data scientist, whether you should do an MBA or a master's in data science, whether a PhD even helps with getting into data science, how much statistics you should know, and how to get your first internship in data science.

Basically, when you are learning data science, you are essentially learning machine learning, and there are many different courses available online that you can take. Personally, I found the courses from Andrew Ng to be quite good: he explains the most difficult things in a very simple, basic way, and I haven't been able to find simpler explanations than his. I always say that the certifications for these courses are not important, but you have to
remember that certifications are a way to show your appreciation to the instructors of these courses, and the courses from Andrew Ng are the ones I would really show appreciation for. There are also many other courses that you can take in parallel, for example the courses from Kaggle Learn, which teach you machine learning with coding. I've seen exercises there where you have to read a file. Believe me, a lot of people don't even know how to read a CSV file, but they want to be data scientists; you need to improve on those kinds of skills. The Kaggle Learn courses teach you data science and machine learning with coding, which is quite good: if you're given a task and asked to code it, you will be able to do that. The same is true for most of the courses from Andrew Ng.

And yes, you need to solve the exercises. The solutions to these exercises are already available online, on GitHub and everywhere else, but you have to make sure that you do the exercises honestly, without looking at the solutions. It doesn't matter how many tries it takes or how many times you fail; in the end you will pass. You are only going to learn if you attempt and solve these exercises on your own.

You can also find a lot of free courses on a number of different YouTube channels, and fast.ai is also a free course which is pretty good. Just keep in mind that whatever course you do, don't do it just for the applications; do it to understand how things work. When you understand that, it's beneficial; if you can only do applications, it's not very beneficial. Some courses teach you applications right away, but people drop off during the basics and the under-the-hood parts, so make
sure you don't skip them. If you're doing a course, do all parts of it. Don't be like: "Okay, I found one application, somebody taught me about it, this is very cool, now I'm just going to use it." Try to finish the courses.

There are also many courses which are not free, and many companies have now made it a business to sell you these kinds of courses, so be aware of them. Talk to the people who took these courses and find out their views. Always read the reviews before buying these kinds of courses, and if you take a paid course, always write an honest review at the end to help others. I think that's going to take a lot of these course businesses out of the market.

About certifications, I've said this a lot of times: some certifications matter, but most do not. The best way to show your knowledge in machine learning or data science is by doing projects, and not the Titanic ones. Of course, do the Titanic one, but don't share it, unless you can approach it in a way that has not been done by anyone yet. If you're building a random forest model on the same features as thousands of other people, it's probably best to keep that project to yourself for future reference. There is no value in sharing the same thing over and over again.

You can always open-source any project that you do on GitHub. It gives you a lot of exposure, but if you have a lot of bad code on GitHub, it may work against you, so keep that in mind. If you're sharing something, try to share it in such a way that people can use it. Don't copy from others; if you end up copying chunks of code from other sources, provide references. For example, if you copy a Python function from Stack Overflow, you can provide a link to that Stack Overflow discussion as a comment in the
function. It's as simple as that.

It doesn't matter whether you're doing your bachelor's, master's or PhD: if you're a beginner in machine learning and want to do something in that area, start with a course as soon as possible. If your university or college has courses on machine learning, data science or data mining, go and attend them. Who is stopping you? And even if someone is stopping you, attend and learn until you're actually stopped.

Whether to do an advanced degree (a master's, a PhD or a postdoc) in machine learning, computer science or data science depends on a lot of factors. First of all, you need money to do a master's in the United States, because it's so expensive; otherwise you need to get into other colleges or get a scholarship. It also depends on what you want to do in the future. If you want to be a researcher, work at a research institute, or work in industry doing a lot of research, you probably need a PhD and maybe even a postdoc. If you want to get a job in industry, you can probably get away with a master's degree. I've seen a lot of PhD people struggling when they join industry, because the way PhD researchers work is quite different from working in industry. I'm not generalizing; I'm just saying many of them struggle, and many of them are good too. If you're doing a PhD and want to leave research and move into core data science and machine learning in industry, you should start by improving your coding skills. Your thesis doesn't have to be in line with industry, but if you get the choice, you should take it, because it might end up being useful in the future.

A lot of people also ask how they can learn different algorithms when they don't even know their names. One really simple trick is to use the cheat sheet from scikit-learn. It looks something like
this; you remember it, right? A lot of things have changed in machine learning since this cheat sheet was published, but the concepts still hold. So you follow a path on it. For example, the cheat sheet says that if you are predicting a quantity and have more than a hundred thousand samples, try the SGD regressor. Now, you don't know what the SGD regressor is, so you google it, go back to scikit-learn, and find the documentation for SGDRegressor. The documentation comes with clear examples, so take a look at those examples, find similar datasets by more googling, apply the SGD regressor yourself to those datasets, and see how it performs.

What I just told you is another way of learning, but keep in mind that it can and should only be done alongside a course that helps you learn the theory. If not, learn the theory by doing a lot of Google searches. Before going into the SGD regressor, if you google what "predicting a quantity" means, you will find out that it's about regression. So read up on regression: you will find terms like classification error, squared error, linear regression and so on. Google them, read more about them, see how they are implemented, and implement them on your own wherever possible.

Many people these days also jump into neural networks without even looking at traditional approaches. You have to remember that there are many industries that depend on traditional algorithms and statistical approaches. You can't just ignore them: they form the basis of everything, and you will end up using many of the concepts you learned from these approaches and algorithms.

For learning statistics, my idea is very simple, I apply it to all kinds of cases, and I have been doing it for many years now: learn as you go. Whatever I have learned so far, I've learned using this
trick, and I still keep doing it. When you start with traditional algorithms and model evaluation, you will come across a lot of different statistical terms, and then you should start digging into them. For example, you should know what a p-value is, what correlation is, standard deviation, variance, linear regression; the list is huge. There are a lot of common terms which you will have to use, maybe not day to day, but certainly from time to time, and you should learn them. If you want to go into depth, there is a book called The Elements of Statistical Learning; you can try reading that.

As you keep learning, you should also start applying. It's very important to learn the applications. I've seen a lot of people who are good with theoretical concepts, but when it comes to applied machine learning, they fail badly. This has been one of my motivations for starting this YouTube channel, and I focus only on applied, not theoretical, machine learning. You might find it a bit difficult in the beginning to deal with theory and application at the same time, but as time progresses it will get easier and you will get used to it.

Finding projects to work on is quite easy. You can get any kind of data that you want to work on with a simple Google search. You can also go to machine learning competition websites like Kaggle or DrivenData to find problems related to what you have already learned. Take up these problems and create a baseline solution first; then, as you learn more, take up more and more problems and also try to improve what you have already done. For example, if you have only learned about logistic regression, apply logistic regression to classification problems, and then improve it by tuning the hyperparameters manually and doing some feature engineering. When you learn about random forests at some point, try replacing the logistic regression in your old model with a random forest and see what happens. Does it improve the
results or not? Ask yourself questions before asking others, and if you don't find the answers, then ask others. Depending on what your questions or doubts are, you can ask them in a discussion forum, on Stack Overflow, Reddit, Twitter, LinkedIn, wherever you prefer. But make sure that you have tried your best before asking, and that you have isolated the part of the problem that you don't understand or are having trouble with.

Every day I get a lot of questions from people who are working on different kinds of problems, and I try my best to respond to most of them. But sometimes I cannot, because the person asking the question failed to explain what they did, and this happens a lot. If you fail to explain what you did and do not ask specific questions, you might not get an answer at all; if you ask that way on public forums, people will troll you. But if you are specific and tell people what you did, you will get an answer for sure. For example, I asked a question on Stack Overflow six years ago. At the time I was also learning; I was pretty new, and I didn't know that it could be done so easily. Looking at it now, it was so easy, right? But I knew it must be related to something like groupby in pandas, so I tried to give all the information: I made a simple example of what I wanted and what the data looked like. It wasn't even the original data; I reformulated it. Then I got nice answers, and I learned a lot from them. So don't be afraid to ask questions. No question is stupid if you've done proper research yourself.

When you do a project, share it with others. Try open-sourcing your code on GitHub and write articles about it. Did you know that you can create your own blog for free using GitHub Pages? It's
pretty easy: take a look at GitHub Pages, or google "how to create my own blog". Then you don't have to depend on other platforms. Share your articles on social media, in different machine learning groups, and on Twitter and LinkedIn, but share work which is your own and different. It doesn't have to be a completely new idea; it's also about how you present and sell your work. I don't want to read about yet another random forest applied to the same old Titanic dataset; what's so interesting about that? So think about what would be interesting to the reader. Can you present the application of random forests to the Titanic dataset in a much more interesting way? If you can, go ahead, and people will read about it.

One more thing that I have always seen is people struggling to choose between TensorFlow and PyTorch; people have a lot of questions around that. Do you really need to worry about which one you learn? If you learn it properly, it doesn't matter what you use, as long as you understand it. Personally, given a choice between TensorFlow and PyTorch, I prefer PyTorch, because I find it intuitive and much easier to grasp. TensorFlow is something I wouldn't use if I didn't have to, although TensorFlow 2.0 is probably much closer to PyTorch. But what if you get a job in a company? Let's say you have been learning PyTorch and you're quite good at it. What if you join a company that has been using TensorFlow for years? Then you have to use TensorFlow, so you have to learn it. So it's good to know the basics of different frameworks and be proficient in at least one of them. What you choose is totally up to you.

I've been talking a lot about projects because projects really matter a lot: if you learn something, apply it. When you have done a few projects, you should expose yourself to the world by, let's say, writing
good-quality articles and blogs and open-sourcing your code. You will automatically come onto the radar of recruiters and of people who are reading your posts and want to hire someone. They will start learning about you, about what you do, how you did the project, and a lot of other things. Imagine you do only one good project a month: you take one dataset and work on it for one entire month, that's it. If you do that, you will have 12 projects in one year. If you're still studying, you have a lot of time, and you can definitely do one project per month; it's not a big deal. Even if you're working, you can do one per month. By the time you finish your studies, or after several months, you will have a good, solid resume, and your chances of finding a job or internship are going to be much higher.

There is no recipe for landing an internship or a job in data science, but if you follow the project-first approach, I'm sure you're going to find something that suits you sooner than if you only learn the theory. Theory can take you up to a certain level, but after that you need to learn applications. And yes, you do need to know the theory; you have to learn it. For finding jobs and internships, there are many job websites you can take a look at from time to time; I'm obviously not going to talk about them here. One thing I've also found very useful is attending meetups. You can go to a machine learning or data science meetup in the city where you live; you will learn something new and make a lot of new friends. It's all about being social, growing your network, getting to know people, and letting people know that you're looking for something and that you've been working on different projects. When they know you, your chances are higher when you're looking for a new role. If you
don't have a machine learning or data science meetup in your city, but you know that there are people who would be interested in such a thing, why don't you start one? It's as simple as that.

I think that's all I want to say about how to start learning data science. I also touched on how to find a job, but I will cover that in more detail later. There are still a lot of questions which I have not been able to answer, and I will do that in future videos. If you like this kind of format, let me know by pressing the like button, and subscribe if you haven't yet, so you get notified of new videos as soon as they are uploaded. And no, I'm not going away from coding: I will always be coding, and there will be a lot of new coding videos coming very soon. Okay then, see you, goodbye!
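The video points out that many aspiring data scientists can't read a CSV file. Below is a minimal sketch using pandas, which is an assumption since the video names no library for this step. The CSV text is inline and made up, so the snippet runs without a file. With a real file you would pass a path to `pd.read_csv` instead (the filename in the comment is hypothetical).

```python
import io

import pandas as pd

# Made-up CSV text standing in for a file on disk; with a real file you would
# call pd.read_csv("train.csv") instead (the filename is hypothetical).
csv_text = """id,age,fare,survived
1,22,7.25,0
2,38,71.28,1
3,26,7.92,1
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # column types inferred by pandas
print(df.head())  # first few rows
```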
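The advice about crediting code copied from Stack Overflow can be as simple as a comment that points at the source. In this minimal sketch, the `chunks` helper is a hypothetical example, and `<link to the Stack Overflow discussion>` is a placeholder, not a real URL.

```python
def chunks(items, size):
    """Split a list into consecutive chunks of at most `size` elements.

    Adapted from a Stack Overflow answer:
    <link to the Stack Overflow discussion>  (placeholder: paste the real URL here)
    """
    if size <= 0:
        raise ValueError("size must be positive")
    # Step through the list in strides of `size` and slice out each chunk.
    return [items[i:i + size] for i in range(0, len(items), size)]


print(chunks([1, 2, 3, 4, 5], 2))  # [[1, 2], [3, 4], [5]]
```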
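The cheat-sheet path described in the video (predicting a quantity with more than 100K samples, so try the SGD regressor) might look like the minimal sketch below. It assumes scikit-learn and uses synthetic data from `make_regression` in place of a real dataset. The features are standardized first because SGD-based models are sensitive to feature scale. The sample count, noise level and random seeds are arbitrary choices for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data standing in for a real dataset with >100K samples.
X, y = make_regression(n_samples=120_000, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# SGD is sensitive to feature scaling, so standardize the features inside a pipeline.
model = make_pipeline(
    StandardScaler(),
    SGDRegressor(max_iter=1000, tol=1e-3, random_state=0),
)
model.fit(X_train, y_train)

# For regressors, score() returns the coefficient of determination (R^2).
r2 = model.score(X_test, y_test)
print(f"test R^2: {r2:.3f}")
```

Once this runs, the next "learn as you go" step the video suggests is to swap in a similar real dataset and compare the result against plain linear regression.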
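For the statistical terms listed in the video (p-value, correlation, standard deviation, variance, linear regression), one way to "learn as you go" is to compute them on toy data and check the results against the definitions. This minimal sketch assumes NumPy and SciPy. The data is synthetic, and the slope, noise level and seed are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Toy data: y depends linearly on x, plus Gaussian noise.
x = rng.normal(size=200)
y = 3.0 * x + 1.0 + rng.normal(scale=0.5, size=200)

# Sample standard deviation and variance (ddof=1 divides by n - 1).
std_x = np.std(x, ddof=1)
var_x = np.var(x, ddof=1)

# Simple linear regression: slope, intercept, Pearson correlation (rvalue),
# and the two-sided p-value for the null hypothesis that the slope is zero.
result = stats.linregress(x, y)

print(f"std={std_x:.3f} var={var_x:.3f}")
print(f"slope={result.slope:.3f} intercept={result.intercept:.3f}")
print(f"r={result.rvalue:.3f} p-value={result.pvalue:.2e}")
```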
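The video's "baseline first, then improve" loop can be sketched like this:

1. Start with logistic regression.
2. Try a few values of the regularization strength `C` by hand.
3. Swap in a random forest and compare cross-validated accuracy.

This assumes scikit-learn and uses its bundled breast-cancer dataset as a stand-in for a competition problem. The `C` values and the forest size are arbitrary.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)


def cv_accuracy(model):
    """Mean 5-fold cross-validated accuracy of `model` on (X, y)."""
    return cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()


# Baseline: logistic regression with a few manually chosen regularization strengths.
baseline_scores = {}
for C in (0.01, 0.1, 1.0, 10.0):
    model = make_pipeline(StandardScaler(), LogisticRegression(C=C, max_iter=1000))
    baseline_scores[C] = cv_accuracy(model)
    print(f"LogisticRegression C={C}: {baseline_scores[C]:.3f}")

best_C = max(baseline_scores, key=baseline_scores.get)

# Later: replace the logistic regression with a random forest and compare.
forest_score = cv_accuracy(RandomForestClassifier(n_estimators=200, random_state=0))
print(f"best LogisticRegression (C={best_C}): {baseline_scores[best_C]:.3f}")
print(f"RandomForestClassifier: {forest_score:.3f}")
```

Whether the forest beats the tuned baseline depends on the data. Answering exactly that question is the point of the exercise.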
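The video does not show the actual Stack Overflow question. It only says the question involved something like `groupby` in pandas, and that a small, reformulated example of the data and the desired result got good answers. Below is a hypothetical minimal example of that kind, with made-up data. The question would include the small input frame and the expected output. An answer might use `groupby` with named aggregation.

```python
import pandas as pd

# Small, made-up input: the kind of reformulated data you would paste into a question.
df = pd.DataFrame(
    {
        "user": ["a", "a", "b", "b", "b"],
        "item": ["x", "y", "x", "x", "z"],
        "amount": [10, 5, 3, 7, 1],
    }
)

# The desired result, stated explicitly in the question: for each user,
# the total amount and the number of distinct items.
expected = pd.DataFrame(
    {"total_amount": [15, 11], "n_items": [2, 2]},
    index=pd.Index(["a", "b"], name="user"),
)

# A groupby-based answer using named aggregation.
result = df.groupby("user").agg(
    total_amount=("amount", "sum"),
    n_items=("item", "nunique"),
)

print(result)
pd.testing.assert_frame_equal(result, expected)  # raises if the answer is wrong
```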
Info
Channel: Abhishek Thakur
Views: 24,121
Rating: 4.9515152 out of 5
Keywords: machine learning, deep learning, artificial intelligence, data science, how to learn machine learning, how to learn data science, how to become a data scientist, coursera
Id: BFFM1JRo14E
Length: 22min 33sec (1353 seconds)
Published: Sat May 16 2020