Roadmap: How to Learn Machine Learning in 6 Months

Reddit Comments

One point that stuck out to me was that he actually solved an entire book :') Clearly I need to get more serious

Score: 10 · u/kushalsingh007 · Jul 08 2018

I would be grateful if someone could chalk out a plan like https://github.com/adam-golab/react-developer-roadmap/blob/master/roadmap.png for beginners learning ML. I know there are countless wikis listing various tutorials and courses to take, but they don't necessarily beat a visual mind map of how one should approach this field.

Score: 6 · u/sriharshasm · Jul 08 2018
Captions
And why should you listen to me? I started out getting a PhD in nuclear physics, and I quickly realized that my career trajectory, along with a lot of my friends', was going to be transitioning from postdoc to postdoc to postdoc, fighting for grant money, never getting grant money, and being sad all the time. So I started doing some research and asking, okay, what can I do with my degree that would be useful and not involve begging for money? I found data science, and I started digging into what I actually needed to know to get into it and be an effective data scientist.

Before all that, I worked on this behemoth of a machine called the STAR detector. For those of you who will give me ten seconds to geek out on physics: this is a place where we smash gold atoms together, recreate little mini Big Bangs, and study what happened in the first moments of the Big Bang. That's your physics message for today.

What I want to convince you of is that there is a six-month path to becoming a data scientist if you're willing to put in the time. That six months is a little flexible: some people will need more and some will need less. I'm going to lay out the tools I picked up along the way and why I chose to focus on them. So let's jump right in. I'll get to the schedule at the end, but first I'm going to tell you about each individual piece I tried to put together.

If there's only one thing you remember from this talk (and you can ignore me after the next slide if you really feel like it): just build something with data. With a big caveat: you're going to suck at it for a really long time. What I mean is that data science is a really hands-on thing. You're going to learn more in two hours of messing around with a tutorial you found
online than if I stood up here for two hours and talked to you about theory. The theory is important, and understanding what's happening behind the scenes is really important, but at the end of the day, what it comes down to is: can I start with some weird data I found online (for instance the iris dataset, nice call-out there) and build something that does something useful? That could be classifying flowers if you're really into flowers, or it could be predicting box office for movies. These are all toy examples, but they're really useful, because you're building something out with data.

So, at the risk of embarrassment, let me show you my very first data project, from about eight to ten years ago. This is C++; none of you expected to see C++ today, you're welcome. My goal was to work with a program called Geant. For any of you who know Geant, my deepest sympathies; I'm glad you survived and you're here and thriving. What it involved was building out box after box after box of simulation material: I needed a bunch of water, and I needed it split into separate volumes so I could track things. I knew nothing about code, but I found a tutorial online that said you can make a box by writing this, and I knew I needed 100 boxes. These are the first 39 boxes; I ran out of screen space, or you would see the rest. With some hindsight, this is what the code should look like: there's a for loop for that. I was introduced to for loops very quickly after this project.

But here's the key thing. If you look into my thesis, which I don't recommend, and you find the section on simulations, the thing that actually made those simulations was this nightmare. That comes up over and over with data projects: sometimes your code is not pretty. You should always shoot for pretty code, that's always the goal, but a lot of the time this nightmare works just as well if you're willing to put in
the time. So yes, when you start digging into becoming a data scientist, shoot for great code, but at the end of the day, if you're working on a data project, the key is: does it work?

Now I'm going to launch into a somewhat unfortunate truth about data science, and I'll let the next slide speak for itself: a lot of people don't want to get into the math underneath. When I was trying to make this transition and decide whether data science was something I wanted to do, I found tutorial after tutorial after tutorial, and I figured out what worked: how to copy and paste code, how to call certain things. But I never really put together why it worked, so I later spent a lot of time going back and figuring that out. I think that's a really key thing to do as you approach data science.

So let's take a couple of minutes here. Look at the top left: this is called clustering, a type of unsupervised learning. I won't go into the details, but it's all based on calculating distances in this space, which means you have to understand how to calculate distance: what a Manhattan distance is, what a Euclidean distance is, what different spaces are. It also assumes you understand the top right, which is that vector spaces exist, that matrices exist, that feature spaces are a thing, and all of that comes out of linear algebra. So if you want to truly understand what's happening underneath your algorithms, and not just be a person who pastes in an algorithm, throws it at their data, and sees if it works, linear algebra is key. The bottom left is a wonderful plot that shows gradient descent. For those of you who don't know what a gradient is, that's calculus, basically a bunch of derivatives. And so
what we're trying to do is follow those pinkish dots from the top of this curve and try to find the bottom. That optimization problem is all calculus, which means that if we want to understand what's happening in gradient descent, we have to understand calculus. Finally, in the bottom right we have the area under the curve. This is a metric for determining whether your classification model is any good. I'm not going to go into the details, but it's more calculus and some statistics. Whenever we evaluate models, we pick metrics to figure out whether a model is any good, so we have to understand a little bit of statistics and a little bit of calculus again.

So as far as the math of machine learning goes, when you get ready to make this jump, you're going to have to attack all four of these and come to terms with them. You don't have to be an expert, but you do need an intuition for what's happening underneath. And I have some great news: MIT does it for you. OpenCourseWare is a great thing that's come out over the past ten years. If you're dedicated, have no life, and are willing to spend your evenings and weekends watching math lectures like me, you can dig right in and start reminding yourself: hey, I know I saw this in college, but I'm pretty sure I was intoxicated, so let's try this again. You can pick up the math you've forgotten or never seen before. These are all great video lecture series with homework that you can really work through.

Now I'm going to jump into probably the most divisive topic of the talk, so please throw no punches: Python versus R. People have very strong opinions about this.
These are kind of the two leading machine learning languages right now. I'm waiting for another one to come in and take their place, so you've got to be flexible. But people ask me all the time which one they should choose and learn, and the good news is they're both pretty good: just pick whichever one makes sense to you. I came from a C++ background, and Python is more similar to C++ than R is. Python also lets you play with microphones really easily, and I like microphones, so I picked Python.

The key is to dig into the language you choose and actually learn it as a language before you learn it as a machine learning tool. The temptation is to jump right in and say, okay, I want to do machine learning, come hell or high water, let's do machine learning. That's not very useful, because you don't actually learn the language in a deep way. So before you jump into machine learning, I recommend you take a month or two and really play with the language and learn it. Do interview questions; there are tons of them on the internet. There are also tons of great resources: R has a package called swirl that lets you dig into how R works, and for Python there's Learn Python the Hard Way.

After you've done that and the language makes sense to you, I recommend picking up the one purchase you'd have to make to follow my plan; most of the rest is open source and you can pick it up online. This was the best $30 I ever spent. I went to Barnes & Noble, which is a physical bookstore, and started looking through data science books I could pick up for less than forty dollars. I found Data Science from Scratch and worked through it top to bottom, so I hope to have a beer with Joel Grus if I ever meet him. I've also gone through this other book, which is sort of the equivalent for R. It's very hands-on, it shows you what's happening, and
it shows you how to use the language to really dig into this. Both of these have great solved examples that show you how to get into machine learning. So after you learn the language, dive into machine learning, and that's where a guide is really helpful.

So what does the machine learning pipeline look like? All of the data scientists in the room are going to be familiar with this, but for me it was actually a big change. I came from academia, where our pipeline was: give the problem to a postdoc, let him disappear for 16 days, and maybe he comes back with an answer; then you yell at him and he goes away for 16 more days. A machine learning pipeline is much more elegant, I think.

The first key is to get your data. When you're first learning, there are tons of great datasets out there. The iris dataset comes up a lot, but there are lots of places that curate data. In Python, which is my language of choice, scikit-learn, the machine learning toolkit, has a ton of datasets built in that you can play with. Otherwise you're going to have to learn web scraping, hitting an API, things like that. Those are more advanced; you can pick them up later, when you hit the point in a problem where you're like, crap, I need good data.

Next, you're going to clean your data. This is where we spend 99% of our time, and it's also where we get really sad a lot. You're going to have to remove NaNs, you're going to have to deal with infinities, you're going to find things that should be dates but are strings, you're going to find things that look like strings but aren't really strings, and you're going to find all sorts of other weird things. So you have to understand your dataset: explore it and start making sense of it. It's pretty normal to spend more time in this step than you do with your friends and loved ones. That's fine; it's just part of the deal. After you've cleaned up your data, you get to do some
really cool things, which is playing with algorithms. Once you have your dataset, you can start choosing algorithms, tuning them, and playing with hyperparameters, and those are all great places to go find guides. Another resource I'd point to is Machine Learning Mastery. I actually have no idea who runs the site, but I've gone through pretty much every one of his little tutorials and learned a ton.

An underrated thing to do afterwards is to visualize your results. It's tempting to get a result and walk away, but visualization is super important: if you can't communicate your result, you may as well not have done it.

This is also a place where Google and Stack Overflow are your friends. Any time you hit a bug or something you don't understand, it's not a big deal. Just Google it, then Google it some more, then find a Stack Overflow answer that's not useful, then find all the Stack Overflow answers linked from that one, and you can track things down. Then, once you finish, you realize that whatever you did, you did wrong, and you start over. You iterate on this multiple times, and eventually you refine your algorithms, your data cleaning, and your datasets into something useful that produces really nice results. That's the baseline of how a machine learning problem runs.

As far as choosing an algorithm goes, this is another overwhelming step when you're first getting into things. This is a very rough guide, and I'm by no means saying it's the be-all and end-all, but you can use it as a guide to where to start next. Maybe you want to do a regression; maybe you don't know what a regression is. So you start stepping through: I have 150 samples, great; I'm not predicting a quantity, but I do want a category, so I'm going to follow this
one, and I can go into classification or clustering. It lays out all of these algorithms, and you say, I don't know what that is, but now I have something to Google. You can build up your knowledge by trying all these different things as you step through.

So, back to the timeline. The key is to attack things in bite-sized chunks; trying to jump right into all of this is really, really overwhelming. I know because I tried, I failed, and I had to set out a plan. This is the plan I actually wrote out for myself. The first thing was to learn the math. Because of my background, that wasn't a two-to-three-month project for me, more like a one-month refresher, but for most people it will be two to three months. On the other hand, learning Python coming from C++ took me two months, because I'm not a naturally good programmer. So where I had a head start, I gained time, and I spent that time trying to understand Python. Again, these are goals to shoot for: you're giving yourself little targets.

Then you can build out machine learning tutorials and test projects. This is where you find something somebody else has done, look at it for a little while, and then come back and try to build it yourself. If you don't understand an algorithm, take a shot at programming it yourself; who knows, you might learn something along the way. And finally, you want to get into short-term passion projects. Don't go out and try to solve world hunger on project two; it's not a great place to start. I really like baseball, so I attacked a baseball problem: could I do classification on certain types of baseball stats? That took me about a month to figure out and get working, and that's the type of project you should attack next to build up your skill set. So now, if you'll allow
me a 15-second ad: I work for Metis, which runs data science and machine learning classes, a bootcamp, whatever you want to call it. That six months is something you can do entirely by yourself, but it takes a lot of discipline, a lot of effort, and realistically no time in a bar or with friends. What we do is take students in and build this up with them, working together to get them to this level in 12 weeks. So if you're interested in that, come talk to me afterwards.

A few last notes. You're going to fail a lot; it's just going to happen. You're going to find bugs, and you're going to find algorithms that don't work the way you think. That's normal, and in software, no one really cares, right? You try to run your script, and the worst that happens is it runs forever and you have to figure out a bash command to kill it. That's the worst possible scenario, so just go for it.

It's really sad the first time you get a non-predictive model. If you're following tutorials, you almost always get a really nice result at the end where you have 90% accuracy: I can really nail down that that flower is blue, I got it. But a lot of the time, you're going to find algorithms don't do as well as you hoped, and that's normal, that's fine. If you get a null result, that's just as good as a non-null result, as long as you're sure you did everything correctly, because it can be just as interesting that these things aren't correlated or predictive as it is that they are.

And finally, I highly, highly recommend that you track your projects on GitHub and keep an online presence of the things you're doing. If you're trying to break into the data science space, you can go to an employer and say, check out the thing I did, it actually looks really good. It's the point where you can say, hey, look, I'm doing this on my own, I'm getting results, I'm doing really cool things,
so keeping an online presence and learning GitHub are really strong tools in your toolkit. So thanks for having me. If you have any projects you're considering, or you're working on something cool, let's chat about it; talking about data is sort of what I like to do. Thanks! [Applause]

Thank you. We have about seven minutes for questions. Okay, it looks like no one has questions... going once, going twice... oh.

So you're coming from a science background and learning data science in six months, which is very intensive. How many obstacles did you face during this process?

A lot. There were a lot of, I won't say tears because it makes me look bad, but close to tears, when you're trying to figure things out and you have a small bug and you don't really have the knowledge to figure it out. There's a lot of banging your head against the wall. On top of that, coming from academia, a really major shift was industry itself: how industry works and how job interviews work, when you're used to saying, hey, I know that guy at that university, I'll go work on his grant. It's a really big shift. So the things I found most challenging were, first, not having the knowledge to solve the problems I ran into, and second, trying to adjust to a whole different culture. But there are a lot of challenges along the way.

A follow-up: if you go to an interview with a background like yours, a PhD in nuclear physics, what kind of questions do they ask you?

Oh man. I've been given a large range of questions, but I think the most consistent one is how physics prepared me for a data science position. The answer I kept coming back with is that I have done data analytics, but the key difference is that physicists do data analytics in a really dumb way. In general, we have
really, really complex equations we're trying to solve, so we have to do the analytics in a very, very simple, reproducible way; we can't use the latest technologies. So explaining, yes, I have done this, but I haven't done it in a production environment, was one of those things that came up over and over and that I had to overcome as I was transitioning between the fields.

So you come from physics, right? Great, that's very mathematical, so you're fortunate to have the background. Suppose someone comes from the liberal arts and hates math, and you need to work together with them on a team doing machine learning. Can you find something for him or her to do so you can work together happily? What I mean is a role that doesn't involve a lot of the mathematical stuff.

Yeah. The reason I jumped into the math so much was that I was interested in how machine learning works. But if you get somebody who can pick up a little bit of a coding language and teach them just the basics of exploratory data analysis, they can become really valuable right away, especially if you teach them to make a few plots and understand how to interpret plots, which can be done pretty quickly. Bringing in people with all sorts of strengths helps, especially because creativity in general is just not my strong suit. Somebody who can creatively look at data and say, hey, these things might be interesting to look at, becomes really, really valuable right away. Then giving them the base knowledge to bring in algorithms and start working with the data is only a few weeks of effort to get them up to a useful point, and you can immediately start getting value out of that collaboration.

Thank you very much. [Music]

Hey, thanks for the
talk. I also come from a physics background, at the BS level, and I'm looking at data science, but I'm also curious: once you had that data science perspective, did you go back and solve any physics problems in different ways?

Yes, and I'm really excited about this, actually. That giant behemoth of a machine I showed earlier is made up of millions and millions of detectors, literally millions of inputs and outputs coming out all the time, so we're generating literally terabytes of data every second. One of the things you have to do to work with that is simulate the response of the detector, to understand: this particle hit here and I recorded this, so what should I have recorded if that were actually true? It turns out we're not very good at simulating things. So right now I'm working with neural networks to make classifications: can I tell whether this is simulated or real data? And then can I massage the simulation to look more and more like real data, so I get more and more accurate representations of things? That's something I couldn't have done before: with 60 inputs, I couldn't have looked at the patterns in those inputs and understood the simulated data without more advanced techniques like neural networks. Now I can apply them, because I'm not as limited in the techniques I can use. So bringing a data science perspective back to the hard sciences is a really cool avenue to explore, and I'm really excited to get into it.

Hey, you mentioned the Metis program, and I'm really curious: is there any chance we can get involved remotely?

So we are working on launching online coursework, and that's coming up pretty soon, and we're also getting set up for live-streaming courses. So if you'd like to get more into that, we can talk afterwards about some of the
opportunities
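In the talk, the speaker's first data project defined 100 simulation boxes by copying the same block by hand, and with hindsight he notes that a for loop does the job. The original was C++ for Geant and isn't shown, so the following is only a hypothetical Python illustration of the same idea: the `Box` class, its fields, the 10 cm size, and the 1 cm gap are invented for the example and are not part of Geant's API.

```python
from dataclasses import dataclass

@dataclass
class Box:
    """Hypothetical stand-in for one simulation volume (not a Geant class)."""
    name: str
    x_cm: float      # center position along one axis, in cm
    size_cm: float   # edge length of the cube, in cm
    material: str

def build_boxes(count, size_cm=10.0, gap_cm=1.0):
    """Create `count` water boxes in a row, each with a unique name so
    that what happens in each box can be tracked separately."""
    pitch = size_cm + gap_cm  # distance between neighboring box centers
    return [
        Box(name=f"water_box_{i:03d}", x_cm=i * pitch, size_cm=size_cm, material="water")
        for i in range(count)
    ]

# One loop instead of 100 hand-copied definitions.
boxes = build_boxes(100)
print(len(boxes), boxes[0].name, boxes[-1].name)
```

Changing the box count or spacing is now a one-argument edit rather than 100 manual ones, which is the point the speaker makes about the "nightmare" version.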
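The talk says clustering rests on calculating distances and names Manhattan and Euclidean distance. A minimal sketch of both, assuming each point is an equal-length tuple of numbers (a feature vector, in the linear-algebra sense); the function names are illustrative.

```python
import math

def manhattan(a, b):
    """L1 distance: sum of the absolute differences along each axis."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of features")
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """L2 distance: length of the straight line between the two points."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of features")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

p, q = (0, 0), (3, 4)
print(manhattan(p, q), euclidean(p, q))  # → 7 5.0
```

The same pair of points is 7 apart "along the streets" but 5 apart "as the crow flies", so which metric a clustering algorithm uses can change which points count as neighbors.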
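The talk doesn't say which clustering algorithm the slide shows. k-means is swapped in here as one common example of distance-based unsupervised clustering: a minimal, deterministic sketch that takes the starting centroids as input instead of choosing them randomly. The toy points are made up for illustration.

```python
import math

def assign(points, centroids):
    """Label each point with the index of its nearest centroid (Euclidean distance)."""
    return [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points]

def kmeans(points, centroids, max_iterations=100):
    """Lloyd's algorithm: alternate assignment and mean-update steps until
    the centroids stop moving (or max_iterations is reached)."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iterations):
        labels = assign(points, centroids)
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                # Mean of the members, coordinate by coordinate.
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(old)  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return assign(points, centroids), centroids

# Made-up 2-D points forming two obvious groups.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
labels, centroids = kmeans(points, centroids=[(0, 0), (10, 10)])
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

No labels are given to the algorithm; the grouping comes entirely from distances, which is why the speaker ties clustering to the math of distances and vector spaces.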
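The gradient-descent plot in the talk shows dots walking down a curve toward its minimum. A minimal one-dimensional sketch: the function f(x) = (x - 3)^2 + 1, its hand-computed derivative, the learning rate of 0.1, and the step count are all choices made for this example.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the derivative and return every visited point
    (the 'dots' walking down the curve)."""
    path = [start]
    x = start
    for _ in range(steps):
        x = x - learning_rate * grad(x)
        path.append(x)
    return path

def f(x):
    return (x - 3) ** 2 + 1   # a bowl-shaped curve with its minimum at x = 3

def df(x):
    return 2 * (x - 3)        # its derivative, worked out by hand

path = gradient_descent(df, start=0.0)
print(round(path[-1], 6), round(f(path[-1]), 6))  # → 3.0 1.0
```

Each update moves x against the slope, so the visited points head downhill. For this function the distance to the minimum shrinks by a factor of 0.8 per step at a learning rate of 0.1, while a learning rate above 1.0 would overshoot further on every step and diverge.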
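The bottom-right plot in the talk is an area under the curve, used to judge a classification model. One common version is the area under the ROC curve. A minimal sketch, assuming binary labels (1 = positive, 0 = negative), at least one example of each class, and higher scores meaning "more likely positive"; the labels and scores are made up.

```python
def roc_points(labels, scores):
    """Sweep the decision threshold from high to low and record
    (false positive rate, true positive rate) after each distinct score."""
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume all examples tied at this score together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Trapezoidal rule: numerically integrate TPR over FPR."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(auc(roc_points(labels, scores)), 4))  # → 0.8889
```

The trapezoid sum is the calculus half of the idea. The statistics half is that the same number equals the probability that a randomly chosen positive scores higher than a randomly chosen negative: here 8 of the 9 positive/negative pairs are ordered correctly.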
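The talk lists typical cleaning chores: NaNs, infinities, dates stored as strings, and values that look like one type but are another. A minimal standard-library sketch over a few hypothetical movie records (the field names and values are invented, echoing the talk's box-office example). Dropping bad rows is only one possible policy; imputing or flagging them are common alternatives.

```python
import math
from datetime import datetime

# Hypothetical raw rows, as they might arrive from a CSV: every field is a string.
raw_rows = [
    {"release": "2016-07-08", "box_office": "1250000"},
    {"release": "2016-07-15", "box_office": "nan"},
    {"release": "not a date", "box_office": "980000"},
    {"release": "2016-08-01", "box_office": "inf"},
    {"release": "2016-08-05", "box_office": "2100000.5"},
]

def clean(rows):
    """Convert string fields to real types and drop rows that can't be used."""
    cleaned = []
    for row in rows:
        try:
            released = datetime.strptime(row["release"], "%Y-%m-%d").date()  # string -> date
            revenue = float(row["box_office"])                              # string -> number
        except ValueError:
            continue  # unparseable date or number: drop the row
        if not math.isfinite(revenue):
            continue  # float() accepts "nan" and "inf", so reject them explicitly
        cleaned.append({"release": released, "box_office": revenue})
    return cleaned

cleaned = clean(raw_rows)
print(len(raw_rows), "->", len(cleaned))
```

Note the explicit finiteness check: `float("nan")` and `float("inf")` parse without error, so NaNs and infinities slip through any cleaning step that only catches parse failures.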
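The algorithm-selection slide isn't reproduced in the transcript, but the walk-through (150 samples, not predicting a quantity, wanting a category, then classification or clustering) resembles the top of a decision flowchart such as scikit-learn's "Choosing the right estimator" cheat sheet. A hypothetical sketch of only those top branches: the function name, its parameters, and the 50-sample cut-off are assumptions for illustration, and a real chart continues down into specific estimators.

```python
def suggest_family(n_samples, *, predicting_category, has_labels, predicting_quantity):
    """Walk the top branches of an estimator-selection flowchart.
    The threshold and branch order are assumptions, not an official API."""
    if n_samples <= 50:
        return "get more data"
    if predicting_category:
        # Labeled examples -> supervised classification; no labels -> clustering.
        return "classification" if has_labels else "clustering"
    if predicting_quantity:
        return "regression"
    return "dimensionality reduction"  # just exploring the structure of the data

# The speaker's example: 150 samples, wants a category, not a quantity.
print(suggest_family(150, predicting_category=True, has_labels=True,
                     predicting_quantity=False))
```

As in the talk, the output is a family name to start reading about ("now I have something to Google"), not a finished model choice.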
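The talk contrasts tutorials that end with 90% accuracy against real projects where a model turns out not to be predictive. One way to tell the difference is to compare a model with a trivial baseline. A minimal sketch: a k-nearest-neighbors classifier (swapped in as a simple example; the talk doesn't name an algorithm) scored against an always-guess-the-majority-class baseline, on made-up flower measurements that are not the real iris data.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Majority vote among the k training points closest to `query` (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(predictions, truth):
    return sum(p == t for p, t in zip(predictions, truth)) / len(truth)

# Made-up (petal length, petal width) pairs in cm; illustrative, not real iris data.
train = [((1.4, 0.2), "setosa"), ((1.3, 0.2), "setosa"), ((1.5, 0.3), "setosa"),
         ((1.4, 0.3), "setosa"), ((4.5, 1.5), "versicolor"), ((4.1, 1.3), "versicolor"),
         ((4.7, 1.4), "versicolor")]
test = [((1.6, 0.2), "setosa"), ((4.4, 1.4), "versicolor"), ((1.2, 0.3), "setosa")]

model_preds = [knn_predict(train, x) for x, _ in test]
truth = [y for _, y in test]

# Baseline: always guess the most common training label.
majority = Counter(y for _, y in train).most_common(1)[0][0]
baseline_preds = [majority] * len(test)

print(accuracy(model_preds, truth), accuracy(baseline_preds, truth))
```

If a model's accuracy doesn't clearly beat the baseline, that's the null result the talk describes, and it's still worth reporting. With only three test points, as here, neither number means much on its own; the comparison is the point.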
Info
Channel: IDEAS
Views: 496,290
Rating: 4.9411569 out of 5
Id: MOdlp1d0PNA
Length: 23min 42sec (1422 seconds)
Published: Sat May 27 2017