Machine learning tutorial for beginners: Model automation using R

Video Statistics and Information

Captions
[Music] Hi everyone. Today I am going to talk about the H2O AutoML package in R, which is a very useful package for machine learning using R. Let's say that you are a person who doesn't have very deep knowledge of machine learning; then this package will be a very useful tool for you. It is also a unified interface, which means you can run a variety of machine learning and deep learning models using this one package. The other advantage of using this package is that you can automate the process of training a large number of candidate models. And since this is an interface to an automated process, you don't really have to tune the hyperparameters of your model. For example, if you run a gradient boosting algorithm or a deep learning algorithm, you normally have to specify what your hyperparameters are and what their ranges are. But if you are using the H2O AutoML package you don't have to specify those; those things are done automatically by the h2o package for you, so you don't have to worry about them.

All right, so now I'm going to show you how to use this package in R. First you need to install the h2o package in RStudio. To install it, go to the Packages tab, click the Install button, type h2o, and install it. After that you need to load the package into your RStudio session.

To demonstrate the usage of this package I'm going to use the Titanic passenger data set from the Kaggle website. First you have to go to the website and download the test.csv file and the train.csv file. After downloading the data you have to import it into RStudio. Then I'm going to look at how my train data set looks. Here you can see that there are some columns that were read as integers, but I need to convert them into factors. For example, the Survived variable over here and the Pclass variable over here: I need to convert them to factors, and I have to do the same thing for the Embarked variable. I need to repeat the same thing for my test data as well.

After that, to start using the h2o package, you first need to create an H2O cluster. To create the H2O cluster you run the h2o.init() command like this, and it will automatically create a cluster for you. One of the most important things is that you need Java installed on your machine in order to create this cluster. If you have set it up correctly, you should get a success message like this.

Another thing I should tell you is that in this analysis I am going to remove the variables that have a substantial number of missing values. For example, here you can see that the Age variable has a lot of missing values. I am also going to remove some variables that are not meaningful for the analysis: for example the PassengerId variable, the Name variable, the Ticket variable, which corresponds to the ticket number, and also the Cabin variable. These kinds of variables will not contribute anything to our analysis, so I am going to create the H2O frame by excluding them. These indices correspond to the variables that I am going to exclude from my train data set and test data set.

The next thing you need to create is the H2O frame. I am going to create it using only the variables that I am going to use in this analysis. You can create your H2O frames like this, using the as.h2o function. So now you can see that first I initialize my H2O cluster, then I assign my H2O frames to my training data and test data.

The next step is to run the machine learning algorithms. To run them I am going to use the h2o.automl function. Inside the h2o.automl function you have to specify the required parameters. These are the required parameters that we have to specify, and the other parameters are optional, which means you can specify them if you want. Out of the required parameters, first you specify your response, or dependent, variable. In this analysis I am doing a classification problem, so my response variable should be a factor variable; in this case it is the Survived variable. The next step is to define your training frame. Here my training frame is train_d, which is an H2O frame. The next one is the maximum run time. This argument specifies the maximum time that the AutoML process will run on your computer. Then you can specify the maximum number of models that you are going to run, using the number-of-models argument.

For example, here I am going to run the logistic regression algorithm, so in that case you have to specify the logistic regression model using the include_algos option. You can fit a variety of machine learning models here. For example, if you specify include_algos as DRF, you can run the random forest algorithm; if you want to run XGBoost algorithms, you can specify it as XGBoost; or if you want to run deep learning algorithms, you can specify it as DeepLearning. I will come back to this later.

The next parameter that I have specified is the stopping metric. This is an optional parameter, but I'm going to specify it anyway. Here you specify what your stopping metric will be. Since I am doing classification, my stopping metric will be the misclassification error, so I'm going to specify my stopping metric as misclassification. If you go to the help page you can see that there are a lot of stopping metrics that you can specify. For example, if you are doing a regression problem you can use MSE, RMSE, or MAE, which correspond to mean squared error, root mean squared error, and mean absolute error. Or if you are doing a classification problem like this one, you can specify the stopping metric as AUC or AUCPR, and so on. I'm also going to define one additional parameter, which is called nfolds. Here I'm going to use five-fold cross-validation in order to tune the parameters, so I'm going to set my number of folds to five.

All right, so now I'm going to run this command. You can see your AutoML is training over here, and after the training is done you can see the results. To see the results, first you type the model name, then type @, and after you type @ you can see the options you can access. If I use leaderboard, then under the leaderboard you will see all the machine learning models that you have fitted using this AutoML object. Here you can see that, based on this model, the AUC is 0.82 and the AUCPR is 0.76, and so on.

Let's say that now I am going to fit two machine learning models, which are GLM and gradient boosting. To do that, I specify it as an additional parameter. If you run this object it may take some time, because now I am training two machine learning models, so it will take comparatively more time than the previous model. After you run this model, if you run the command aml1@leaderboard, you will see the results of all the models that you have fitted. If you want to find the model with the best results, you can obtain the results of the best model like this: if you type aml1@leader, it gives you the results of the best model. Here you can see the parameters of your best model. For example, here the best model is a gradient boosting algorithm which has 40 trees, where the minimum depth is 6, the maximum [unclear], and these are the results of your confusion matrix, and so on.

Let's say that you need to run all of the models that are listed over here and you need to compare the results. In that case, what you have to do is simply remove this part from your code. Then if you run the model again you will see the results based on all of the machine learning models. But this will take some time, because here, for example, you are going to train a deep learning model as well. Depending on your data, this may take an hour or more.

One last thing that I have specified is how to apply the model results to your test data. In the first two steps, what I did was train on the data using various machine learning algorithms, and after that, using the leader slot, you can find the best model based on your training. Then you can apply your best model to your test data in order to predict the results. To do that you can use the h2o.predict function. Inside the h2o.predict function, first you specify your best model like this, then you have to specify your test data. Then you can run this code and it will predict the results for your test data based on your best model, and you can see those results.

So that's how this package works to fit various machine learning models without having very deep knowledge of machine learning. If you think this video is useful to you, please subscribe to my YouTube channel, and I will bring more useful videos in the future. Thank you. [Music]
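The on-screen code is not part of the captions, so the following is only a minimal R sketch of the workflow the speaker describes, not a copy of the original script. It assumes that train.csv and test.csv from the Kaggle Titanic competition sit in the working directory, that the h2o package and Java are installed, and that the columns carry the standard Kaggle names. The speaker drops columns by index; this sketch drops them by name instead. The names drop_cols, best_model and predictions, and the values chosen for max_runtime_secs, max_models and seed, are illustrative assumptions rather than values taken from the video.

```r
# Sketch of the H2O AutoML workflow described in the captions.
# Assumptions: Kaggle Titanic train.csv/test.csv in the working directory,
# h2o installed (install.packages("h2o")) and Java available.
library(h2o)

train <- read.csv("train.csv", stringsAsFactors = FALSE)
test  <- read.csv("test.csv",  stringsAsFactors = FALSE)

# Convert integer/character columns to factors.
# Kaggle's test.csv has no Survived column, so only train gets it.
train$Survived <- as.factor(train$Survived)
for (col in c("Pclass", "Embarked")) {
  train[[col]] <- as.factor(train[[col]])
  test[[col]]  <- as.factor(test[[col]])
}

# Columns excluded in the video: many missing values (Age) or
# identifier-like fields that are not used as predictors.
drop_cols <- c("PassengerId", "Name", "Age", "Ticket", "Cabin")

# Start a local H2O cluster (requires Java) and build the H2O frames.
h2o.init()
train_d <- as.h2o(train[, !(names(train) %in% drop_cols)])
test_d  <- as.h2o(test[,  !(names(test)  %in% drop_cols)])

# Run AutoML restricted to GLM and GBM, as in the second run in the video.
# max_runtime_secs, max_models and seed are illustrative values.
aml1 <- h2o.automl(
  y                = "Survived",
  training_frame   = train_d,
  max_runtime_secs = 120,
  max_models       = 10,
  include_algos    = c("GLM", "GBM"),
  stopping_metric  = "misclassification",
  nfolds           = 5,
  seed             = 1
)

# Compare all fitted models, then take the best one.
print(aml1@leaderboard)
best_model <- aml1@leader

# Apply the best model to the test data.
predictions <- h2o.predict(best_model, newdata = test_d)
print(head(predictions))
```

Setting include_algos to "GLM" alone would correspond to the first, logistic-regression-only run, and omitting the argument lets AutoML try every algorithm it supports, which the speaker notes can take an hour or more.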
Info
Channel: machine_learning 2019
Views: 548
Rating: 5 out of 5
Keywords: h20 automl, h20 automl r, h2o automl r example, machinelearningmastery, machine learning in r, machine learning in r tutorial, deep learning with r, deep learning with r for beginners, machine learning tutorial for beginners, machine learning tutorial r, machine learning h2o, automated machine learning, automated machine learning tools, automated machine learning platform, automatic labeling machine learning, automated machine learning in r
Id: g2WIZf7rwsQ
Length: 17min 40sec (1060 seconds)
Published: Wed May 20 2020