Episode 1.1: Intro and building a machine learning framework

Video Statistics and Information

Captions
Hi everyone, and thank you very much for joining me. In the last three or four years I got around 1,500 subscribers, and this is my very first video, so I'm thankful to everyone for all the love even before the start. The idea of this channel is to learn machine learning and data science together.

For those of you who don't know me, I'm Abhishek. I'm a data scientist and a Kaggle fan, and recently, I think four or five months ago, I became a Grandmaster in all three of the levels they used to have. This is what my profile looked like at that time. Now they have announced a new level, datasets, where you share datasets and get points for that, and this is how my profile looks now. I hope it will get better soon.

A lot of people ask me: there are so many channels on YouTube about machine learning, data science is the buzzword, AI is everywhere, so why this channel? Over the years I've shared a lot of useful tips and tricks on Kaggle and on other websites like LinkedIn and sometimes Twitter, and now I want to share the same with a much bigger audience. Another motivation for me to start this channel is to help beginners and amateur data scientists. If you're a professional, you might know everything that I will talk about in my videos, but I will make sure that there is something interesting for everyone who is watching. The real motivation for me to start this initiative is to learn myself: if I have to teach something to someone, I need to understand it fully beforehand. I think this is the best way for me to learn, revise, and keep up to date with everything that's going on in the machine learning world.

What I've seen is that beginners tend to give up quite easily. Some get scared of the humongous datasets they come across and have no clue where to start. You might have 20 different CSV files floating around, and you have to merge features, you have to create features, you have to create a single dataset
from all these different datasets, and people have no clue where to start. Some people just don't have the time to go through different books, like 300-page books. There's a very interesting thing that all of these people have in common: theoretical knowledge. They are all well versed and know how things work in theory, but when it comes to applying it, people fail and struggle a lot. They don't know where to start. Given that, I want to keep my videos and tutorials specific to applied machine learning and data science, and I will also share a lot of tips and tricks on how to get stuff done fast and how to make the best use of Python.

When I started as a data scientist, back at the very beginning of 2014, it was very difficult for me to find my first job. First of all, machine learning was something I had never studied at university, so I had to read about it and learn everything on my own. I spent ten months learning machine learning and then I got my first job, but when it came to applications I was really bad at the time. So the first thing I did was to learn by solving problems. I found this website called Kaggle and took a problem from there. I had been working on image processing, and the problem was to identify emotions in images. I applied the very simple things I had learnt in image processing and failed miserably, but I didn't give up. I studied what others did and what I had not thought about.

But I had nothing to show on my resume, so I started building a portfolio of data science and machine learning projects and presenting them in a good way, because everyone has done the same projects. You talk about the Titanic problem on Kaggle; thousands of people have done it. So what is the different thing that you have done? That is what people are looking for when you're giving interviews and applying for jobs. So in this video I'm going to show you how to make
best use of GitHub and Kaggle and how to organize your projects. You keep a machine learning project organized so well that you can reuse most of the code you have written. Then, when someone gives you a CSV file and says this is a classification problem, you don't have to spend more than 30 minutes on it, and within 30 minutes you can build a very good model, present graphs and everything, basically build the whole presentation. That's what I'm going to show you in this video. This is also helpful when you're working in industry and porting models to production. So let's start.

The first thing: where do you code? Let's say you have a server, probably on AWS or GCP or whatever you're using, and you have to code something there, and it becomes very tedious to code there. So what I do is use code-server, even locally sometimes. We search for code-server, and this is the link. You scroll down until you reach the binary downloads, and there is this file for Linux that I'm going to use. I copy the link location, go back to my terminal, and wget that file. I hope that doesn't take a lot of time.

What code-server gives you is an IDE. A lot of people call VS Code a very intelligent text editor, but I think it's basically an IDE, and it runs right inside your browser. How cool is that? We've got the file here now, so let me extract it, and there we have it. I'll go to this folder, and you can see there's a file called code-server [unclear]. I'm just going to run it with ./code-server, and it has started. So I have code-server running [unclear], and it gives you a password. If you want your own password, you can have your own password. What I usually do is set the password and run code-server with a host and port 10000. Okay. Okay, oh,
it's already in use. Okay, I'm going to [unclear]. So now I have code-server running on all IPs, and I can access it from anywhere I want, as long as I have access to the machine: if it's on the same network or on the same VPN, I can access it. Let's go back to Firefox and see what's happening. I go to 127.0.0.1:10000, and it asks for the password, so I put that in. Okay, there we go: fully fledged VS Code. Well, not fully fledged, but you have everything you need; you don't need anything more than that, I think.

I go here and say open the folder that we created, which was in workspace: the machine learning project template. Oh no, doesn't work. Okay, there we go. So we don't have any files, we don't have anything, and you have this welcome page. You can start a new file, so I will call it train.py. I'm just going to remove this one. What else do we need? We need an __init__.py. What else? We need some [unclear]. So we have already started building our very first machine learning project template, which you can use anywhere you want, for all the generic problems.

I need something like create_folds.py, because you need to create the folds somewhere. Then you have predict.py, and you need some kind of [unclear]; I like to keep them separate. You need a dataset.py, and then a utils.py for some kind of utils that don't go anywhere else. You probably need a feature generator, feature_generator.py, and you also need some kind of dispatcher; I will explain why I use this. And you need, well, everyone calls it differently, so I'm just going to call it the engine, which contains your training and evaluation functions. What else do you need? I cannot think of anything at the moment. So you have these files, looking good, and [unclear] start a new terminal. Okay, what do we have here? The workspace
template. You see, it tells me which branch I'm on and everything. I can probably show you in another video how to do that; it's really cool stuff, and you can use a lot of different things here, but I think it would be a lot of information. So I have all these files here, but I don't want them like that. I'm going to create another directory called src and move all the .py files to src, so I don't have any left here. Now I need more stuff here. I'm going to create a folder called input, I'm going to create models, where I'm going to save the models, and I'm going to create a .gitignore. If you just touch the file, this creates an empty file. So this is what I have now; you can see everything that I have.

Now let's go back to our IDE in Firefox, and you see everything is arranged properly: input, src, models. models is empty, input is empty, there are a lot of files in src, and we have the .gitignore. The first thing I do is fill out the .gitignore. GitHub provides some sample gitignore files, including one for Python, so I'm just going to grab everything. You can see it's from GitHub, and if you go here it has a CC license, so you can do whatever you want with it. I'm going to copy this and paste it here.

So what do we have here? If you're building a distribution or some kind of package, you don't need to distribute these files, so you don't want them in git. Then you have IPython notebook checkpoints; you don't want them either [unclear]. Let's add some stuff here. I don't want anything from input, because the data is going to be huge, so input and models. I'm also going to say I don't want any data files, so *.csv or, I don't know, *.h5, and I also don't want any kind of [unclear]. Okay, I think that should be good
enough for the .gitignore, I think. Let me just commit this first. I'm going to check git status, and it tells me you have a lot of files in the src folder and the .gitignore. So I'm just going to add everything with git add . and then git commit -m with the message "first commit". Since I've never used git on this machine, it asks who I am, so I'm going to set an email address and a name; I'll use a fake email and my user name. Then I go back to the commit, and it shows you all the files [unclear]. Now, going back to Firefox, you can see that I have everything in here. That's a good first step, but all of these files are empty, so we have to fill them up. Let's do that.

Now we need some data to start with. Let's go to Kaggle and sign in. I'm signed in to Kaggle now, and I need to find a competition where I can download data from. Let's try an easy one: the Categorical Feature Encoding Challenge, where all data is categorical. [unclear] So what do I have? Three files. Let me move them to the input folder. Back to the terminal: if we list input, we see the three files we just downloaded, including the sample submission. I don't need the zip files anymore, so I'm just going to remove them, and we have only three files.

Okay, I have an id column and a target column, and I have a bunch of categorical columns. If you want to see it, you can also see it in Firefox here: click on train and you have everything in here. I don't need anything more than that, so let's start coding. We go back to Firefox and refresh it. Input has train.csv [unclear], and you can look at a CSV file here, but we don't need all that. The first thing we need to do: you
probably want to look at the distribution of the target. I've already done that, so I'll just go ahead and start coding. I import pandas, and from sklearn I import the model_selection module. Then I need the main block, so when I run the script this will be executed. I have a DataFrame from read_csv; I'm working relative to the base folder, so I will just say input/train.csv. I make a column called kfold and assign it a value of -1. Then I shuffle the data and reset all the indices.

What I want next is the folds, so kf is a StratifiedKFold. If you look at the StratifiedKFold documentation in scikit-learn, you have n_splits, shuffle and random_state, so I copy them and set n_splits to 5. Now what I want is a loop over the fold, the training index and the validation index from kf.split, where you have X and y; it's pretty easy: X is the DataFrame and y is the target values. Then, just as a sanity check, I print the lengths, and I assign the fold here: for the validation indices, the kfold column is set to this fold. And that's it, you're done. Now it's time to save the file to train_folds.csv, with index set to false, and leave an empty line at the end.

I go back to the terminal and run it as a module with python -m. It gives me a warning that you can probably ignore for now, and you have 240,000 training and 60,000 validation samples. Awesome. Now, if I look at the head of the file, you can see there is a kfold column, and that's what we're going to use when training our models. So now, to start
training, let's start train.py. First we need to read the training data, so you import pandas and read the training data here. Now, you can see that for each fold you have a list of folds to train on and one fold to hold out for evaluation, so you need a mapping and you also need the fold. Let's say we create a fold mapping which goes like this: for every key, the value is every other fold. Okay, that's it for now.

We also need the training data and the validation data [unclear]. Since we already know the target column, and we don't need the id and kfold columns, we just drop them from both. Then I do something like this so that the order of variables is the same in the validation data as in the training data. Now what you need is to encode the variables. This is what I do: I keep the label encoders, and for the training columns I go through each column, initialize the LabelEncoder class from sklearn, fit it on all the training and validation samples, and then transform them. Usually I append the label encoder, but I will change it a bit and also keep the column name.

Okay, now the data is ready to train on. Before that you need to take care of some other things, like the training data path. Where do we get that from? Let's get that from an environment variable. It's the same thing with the fold you want to train on. Now you need to train a model, so we'll start with something basic. Let's try a random forest, very basic. I don't care about the parameters; probably just n_jobs [unclear]. I just want to see what the predictions are. Let's get the probability, since we care about the probability for one of the classes. Okay, what do we have now? Go back. This is not going to work yet [unclear], so I'm going to create another file, run.sh,
[unclear], and I'm going to define some variables, like the training data path and the fold. Then I run the training script as a module with python -m, and you see the environment variables [unclear]. Okay, so we need to fix this: it should be an int, and that is the reason for the problem. So I fix it. This is what I did in the background: I created the file run.sh, put the variables in it, and ran it, and in my training script the fold should be an int. Now everything works.

I think I'm getting some results, but I don't have any metrics yet, and that would be helpful. The data is skewed, so I use AUC, the ROC AUC score. When your data is skewed, you use AUC as the metric. [unclear] Okay, everything is there, so it should be okay. Yeah, so it's training fine, and right now I'm getting something like 0.739. Now what happens if I increase the number of estimators [unclear]? Oh, a typo; I've already fixed it in the background, and now we try to run it again. Okay, so we improved [unclear] to 0.746, almost, I would say.

So we have created kind of our framework. We can use any kind of model that we want; it depends on what you want to do. What I usually do is this: there is the dispatcher. This is where I create the models, so it has a models dictionary with random forest [unclear], and it can also have something like extra trees. Now I'm going to change run.sh a bit, so that the model is an argument now. When you run run.sh, you have this variable where you have to say which model you want to
run. We need to change some things in the training script. We import the dispatcher, from dot import dispatcher, and I just have to say I want to run random forest, extra trees, whatever I want. I can have just the parameters here, so I don't have to mess with the training code anymore. [unclear] Okay, going back to the terminal, we are going to run this first. It says the dispatcher has no attribute model. Okay, there's a reason for that: you can see it's not model, it's models. I made a mistake there, so I'll go back and fix it. Let's run it again. Ah, now it works. It's training random forest, so we should get the same result as before. Yeah, it's similar; there is always some randomness. Let's see extra trees, and that gives a slightly better result.

So now we are done with the training. Almost, but not quite: you need to save stuff. What I do is use joblib, and I'm going to save the things that I've created, like the label encoders. I dump them into the models folder using an f-string for the file name, so that you can say which model and which fold it is. Then you also save the model itself, and the columns [unclear]. So now the training script is saving everything in the models folder that we have. Let's see if that works. I'm just going to install it; oh, I already have it, so I'm just going to run it with random forest. Everything works, it's training the model, and everything is done. Let's go and see how it looks. Everything should be inside this folder, and there it is: the columns, the model, and the label encoder that was used with random forest. So you have everything now. Now we have to build the inference
part, and that's also quite easy. If you do it this way, it's much more fun, and it's arranged properly. So let's see that.
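The fold-creation step described in the captions can be sketched roughly as follows. This is a minimal sketch and not the code typed in the video: the function name create_folds, its parameters, and the choice of shuffle=True with a fixed seed are assumptions made for illustration, and the small synthetic frame stands in for input/train.csv.

```python
import pandas as pd
from sklearn import model_selection


def create_folds(df, target_col="target", n_splits=5, seed=42):
    """Return a shuffled copy of df with a 'kfold' column (hypothetical helper)."""
    df = df.copy()
    df["kfold"] = -1  # placeholder until every row gets a fold

    # Shuffle the rows, then reset the index so labels match row positions.
    df = df.sample(frac=1, random_state=seed).reset_index(drop=True)

    # Stratified splitting keeps the class ratio similar in every fold.
    # shuffle=True with a seed is an assumption of this sketch.
    kf = model_selection.StratifiedKFold(
        n_splits=n_splits, shuffle=True, random_state=seed
    )
    for fold, (_, val_idx) in enumerate(kf.split(X=df, y=df[target_col].values)):
        # Rows in the validation part of this split belong to this fold.
        df.loc[val_idx, "kfold"] = fold
    return df


if __name__ == "__main__":
    # Tiny synthetic frame standing in for the real training data.
    toy = pd.DataFrame({"id": range(100), "target": [1] * 20 + [0] * 80})
    folds = create_folds(toy)
    print(folds.kfold.value_counts().sort_index())
    # In a project this would then be written out, for example:
    # folds.to_csv("input/train_folds.csv", index=False)
```

Storing the fold number as a column means every later script can select its training and validation rows from one file instead of re-splitting the data.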
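The captions mention a fold mapping, settings passed through environment variables from run.sh, and a bug where the fold had to be an int. A minimal sketch of that idea, using only the standard library, might look like this. The variable names TRAINING_DATA, FOLD and MODEL, the helper read_config, and the assumption of five folds are illustrative and are not confirmed by the transcript.

```python
import os

N_FOLDS = 5  # assumed number of folds

# Each key is the validation fold; the value lists the folds used for training.
FOLD_MAPPING = {
    fold: [f for f in range(N_FOLDS) if f != fold]
    for fold in range(N_FOLDS)
}


def read_config(env=None):
    """Read run settings from environment variables (names are assumptions)."""
    env = os.environ if env is None else env
    return {
        "training_data": env["TRAINING_DATA"],
        # Environment values are always strings, so the fold must be cast to
        # int; otherwise FOLD_MAPPING["0"] would raise a KeyError.
        "fold": int(env["FOLD"]),
        "model": env["MODEL"],
    }


if __name__ == "__main__":
    # A plain dict stands in for the variables a shell script would export.
    cfg = read_config(
        {
            "TRAINING_DATA": "input/train_folds.csv",
            "FOLD": "0",
            "MODEL": "randomforest",
        }
    )
    print(cfg["fold"], FOLD_MAPPING[cfg["fold"]])
```

Keeping these settings outside the training script lets the same code run for any fold or model by changing only what the shell script exports.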
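The training step with a model dispatcher, label encoding, AUC, and joblib saving can be sketched as below. This is a hedged illustration under several assumptions, not the code from the video: the function run_fold, the dictionary keys, the hyperparameters, the output file names, and the toy data are all hypothetical; only the scikit-learn and joblib calls themselves are real library APIs.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn import base, ensemble, metrics, preprocessing

# Dispatcher: names map to unfitted estimators, so the training code never
# changes when the model does. Keys and parameters are assumptions.
MODELS = {
    "randomforest": ensemble.RandomForestClassifier(
        n_estimators=100, n_jobs=-1, random_state=0
    ),
    "extratrees": ensemble.ExtraTreesClassifier(
        n_estimators=100, n_jobs=-1, random_state=0
    ),
}


def run_fold(df, fold, model_name, model_dir, target_col="target"):
    """Train on all folds except `fold`, validate on `fold`, save, return AUC."""
    train_df = df[df.kfold != fold].reset_index(drop=True)
    valid_df = df[df.kfold == fold].reset_index(drop=True)
    ytrain = train_df[target_col].values
    yvalid = valid_df[target_col].values

    # Drop non-feature columns and keep the same column order in both frames.
    train_df = train_df.drop(columns=["id", target_col, "kfold"])
    valid_df = valid_df[train_df.columns].copy()

    # One LabelEncoder per column, fitted on training plus validation values.
    label_encoders = {}
    for col in train_df.columns:
        lbl = preprocessing.LabelEncoder()
        lbl.fit(train_df[col].tolist() + valid_df[col].tolist())
        train_df[col] = lbl.transform(train_df[col].tolist())
        valid_df[col] = lbl.transform(valid_df[col].tolist())
        label_encoders[col] = lbl

    clf = base.clone(MODELS[model_name])  # fresh copy, dispatcher stays unfitted
    clf.fit(train_df, ytrain)
    preds = clf.predict_proba(valid_df)[:, 1]  # probability of the positive class
    auc = metrics.roc_auc_score(yvalid, preds)

    # File names below are illustrative, not taken from the video.
    prefix = os.path.join(model_dir, f"{model_name}_{fold}")
    joblib.dump(clf, prefix + ".pkl")
    joblib.dump(label_encoders, prefix + "_label_encoder.pkl")
    joblib.dump(list(train_df.columns), prefix + "_columns.pkl")
    return auc


if __name__ == "__main__":
    # Synthetic categorical data; the target depends only on bin_0.
    toy = pd.DataFrame(
        {
            "id": range(200),
            "bin_0": ["a", "b"] * 100,
            "nom_0": ["x", "y", "z", "x"] * 50,
            "kfold": [i % 5 for i in range(200)],
        }
    )
    toy["target"] = (toy.bin_0 == "a").astype(int)
    with tempfile.TemporaryDirectory() as tmp:
        print(run_fold(toy, fold=0, model_name="randomforest", model_dir=tmp))
```

Saving the encoders and the column order next to the model is what makes a later inference script possible, since test data has to be encoded and ordered the same way as the training data.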
Info
Channel: Abhishek Thakur
Views: 56,791
Rating: 4.9118543 out of 5
Keywords: machine learning, python, vscode, code server, kaggle, framework, training, model training, validation
Id: ArygUBY0QXw
Length: 46min 52sec (2812 seconds)
Published: Wed Dec 25 2019