Machine Learning Crash Course With ML.NET

Captions

[Music] Hey everyone, I'm John Wood, and in this crash course I'll be showing some machine learning basics and building a model with ML.NET. If you want to see more about what ML.NET can do, please visit my channel in the description. So, to start off, let's go over a little bit about what machine learning is. There's this quote here from Aurélien Géron (I hope I pronounced that correctly). He wrote a book from O'Reilly called Hands-On Machine Learning, and if you want to learn more, I highly recommend it; there's tons of great information in there. He says machine learning is the science and art of programming computers so they can learn from data. It's a science because there's a whole process that you follow when doing machine learning, and you produce mathematical models using algorithms. But it's also an art, because there's so much you can do in pre-processing the data, for instance, or in choosing which algorithm you should use, that it takes some experience and pretty much some trial and error to produce a better model. So the first thing is that there are different types of machine learning systems. There are more than these two, but these two are probably the ones you'll see most often. The first one is supervised, and that's where your data set has a column that you want to predict. If you're working at a bank, for instance, and you're given a data set of customers, and you want to see which of those customers are most likely to default on a loan, you would have a column in your data set called something like "is default" that tells you which customers have defaulted, and you would use that within your algorithms to produce a prediction. On the other hand, you have unsupervised, and these are data sets where you don't have that label, or answer column, for your algorithm to predict; this is where you want to find certain patterns within your data. So, continuing, there are certain types of each of these
algorithms. Within supervised, you have regression algorithms, and these are used to predict values such as house prices or salary. Then you also have classification, which is used to predict classes. Say you want to predict a true or false class: since there are two classes, that is called binary classification. If you want to predict more than two classes, that would be multi-class classification. A good example of multi-class classification is image classification, where you give your model an image and it tells you what's in that image, whether it's a dog, cat, car, or person. Then, within unsupervised, you have something called clustering, which is used to find groups of items that share the same patterns. An example of this is grouping customers based on their shopping habits, so you can give them different marketing than other groups. Then you have something called anomaly detection, which is basically finding items that don't really fit the rest of the data. The best example of this is credit card transaction fraud, where they can see that one purchase doesn't match all the other purchases you've made in the past, so they flag it as possible fraud and then get in touch with you to check whether it really is fraud or whether you're just doing something different.

Now let's go to the machine learning process. The first part of the process is where you have your data. Sometimes you are given your data, and other times you have to actually produce your own data. Either way, you need data, you need lots of it, and I would also argue that you need a lot of good data as well. You may have heard the old programming adage "garbage in, garbage out": if you have bad data, you're not going to have a good model. It may look like it performs well, but it's not going to give good results when you put it out in production and people actually use it. Once you have your data, you go through data prep, or data cleaning. Here you
would do things such as removing or updating missing values, or converting categorical or text data into numerical data, since these machine learning algorithms take in numerical data and can't really do anything with text data directly. Next, you create your model, and this is mostly where you decide which machine learning algorithm you want to use. Within each of those categories, like regression or anomaly detection, there can be tons of different algorithms you can use. Each one has pros and cons, and which one works best depends on your data, so it's kind of a trial and error thing there. Then you evaluate your model to determine whether it performs well on new data that it hasn't seen before. If you evaluate it on the data that you used to create the model, then it already knows that data; you want to give it data that it hasn't seen before, to check that it generalizes well to new data. Oftentimes you will have to iterate on creating the model and evaluating it: if you get a badly performing model when you evaluate it, maybe try a different algorithm, then evaluate it again and see how well that performs. Once you have a decent model, you can deploy it, and it goes out into production systems for people to use. But then you would often have to go back and do this process again, because your data could have changed completely, which invalidates your model and degrades its performance. So you have to go back to the data and do this process all over again.

So now let's focus on what ML.NET is. To put it simply, it is machine learning for .NET developers. Notice I said .NET developers instead of C# developers: even though we are going to be using C# here, you can also use F#. There are a couple of benefits of using ML.NET versus some of the more popular frameworks in Python. The first one we kind of alluded to earlier,
but it is built for .NET developers. ML.NET itself is built in C#, and it's just another NuGet package that you can download and use within Visual Studio. Another benefit is performance, and we have a couple of graphs here from the ML.NET website with two different performance measures. The first is accuracy, measured as the area under the curve (AUC), which is a performance metric for classification algorithms, and we see here that ML.NET actually performs better on the same data set, with the same pipeline, than the Python frameworks scikit-learn and H2O. The other metric is training and testing time, and ML.NET took a lot less time to train and test than the scikit-learn and H2O frameworks did. I would also say that deployment is a benefit. Pretty often you'll see machine learning tutorials that show you how to create models, which is great, but once you create a model, what do you do with it? It's not very useful if it's not deployed where other people can use it. TensorFlow probably got popular partly for this reason, because it came with something called TensorFlow Serving, which helps you deploy your TensorFlow models. ML.NET also comes with ways to easily deploy your models for use within web applications or desktop applications. ML.NET also comes with a UI tool, so you don't have to do a lot of coding to generate your ML.NET models. It's built into the newer versions of Visual Studio, it's called Model Builder, and it's a wizard-like UI where you give it your data and it does the training for you. Behind the scenes it uses something called AutoML, which stands for automated machine learning: it takes your data, tries many different algorithms, and measures the metrics on those algorithms to give you the best performing model. And with that, let's build some ML.NET models now.
Before we get code in here, let's take a look at the data set that we're going to be using. I'm here on Kaggle, which, if you're not familiar with it, is kind of a social network for data scientists and machine learning engineers, and also a place where people upload data for others to use to practice their machine learning skills. The data set we're going to be using is California Housing Prices, and it's actually used in that Hands-On Machine Learning book that I mentioned earlier. Let's take a quick look at the data: it has a few columns, such as longitude, latitude, total rooms, total bedrooms, the population of the area, and some other features of the houses within a particular area. What we want to do with this data is predict the median house value. Since we have our label within the data set, we know that it's supervised, and since we're predicting a monetary value of the house, that tells us that it's a regression algorithm that we need to use.

All right, so I'm here in Visual Studio 2019, and I have a console project loaded. First thing, let's add in our data. We downloaded the data from Kaggle, which gives us a CSV, so I just drag and drop it into my Solution Explorer. Then I look at its properties and set Copy to Output Directory to "Copy if newer"; this lets me use that data set without having to navigate a bunch of different paths to get to it. Like I mentioned before, ML.NET is a NuGet package, so I just right-click, choose Manage NuGet Packages, and search for Microsoft.ML; here we go, and we're using version 1.5.1 here. Now, the first thing we need to do in ML.NET is create something called the MLContext, and we do that by creating a new instance of MLContext. The MLContext allows us to do pretty much everything we need to do with ML.NET. For instance, we need to load in our data, and to do that we call context.Data.LoadFromTextFile. This is a generic method, so
we need to give it a type to load into. I'm going to call this HousingData, and I'll create that in a minute, but first let's finish out this method. The first parameter takes in the path of the data, so I'll just pass housing.csv. Next, there are a couple of optional parameters, such as whether it has a header or not; we can double-check, and the file does have headers, so we can say true. Next, we can tell it what the separator character is; looking at the raw data, we see that the values are comma separated, so we just put a comma in here. Now, for this HousingData type, I'll generate a new file. This is just a new class, and this is how we define our input schema for the data: we put properties that match the data schema into a C# class. I'm just going to paste these in. These are all properties, and I also added some attributes to them: these LoadColumn attributes tell ML.NET how to load in the data. The numbers within the LoadColumn parameters are the column numbers, which are zero based, and they tell ML.NET which columns these properties map to. Remember, the median house value is the column that we want to predict, so we also give it a ColumnName attribute of "Label". This tells ML.NET that this is our label column, even though we are calling the property MedianHouseValue.

Now that we have our data loaded, I'm going to split it using context.Data.TrainTestSplit, passing in our main data and a test fraction of 0.2. This splits our data into two data sets, a train set and a test set, with about 20 percent of our data going into the test set. Why do we need to split? When we create our machine learning models, we use the training data set; that's the data the model is trained on. But remember, we also evaluate our models, and we want to evaluate on data that our model hasn't seen before, which
would be everything that isn't in the training data set. So we give it our test data set, which gives us a more accurate picture of how well our model does. Now that we have our split data, I'm going to create some features. Features are the columns that we give our machine learning algorithm to use to make the model. To do this, I'm going to take our split data's train set, get the schema from it, and use some LINQ methods. First, I use the Select method, which is similar to a JavaScript map method, to take the name of each column in the schema. Then I use the Where method to filter out certain columns. I don't want the label, which, remember, in our schema was what we called our median house value. I also don't want to use the ocean proximity column, because ocean proximity is a string, and the features we want to use right now are all numerical; they're all floats. So we're getting all of our numerical features here, and then I turn those into an array. Next, let's create our pipeline. This is going to be everything that we need to do to create our model, including our pre-processing steps and our choice of machine learning algorithm. First we do some pre-processing: we use the context, which has some transforms we can use. Remember that ocean proximity is a text, or string, column, so I'm going to use the Text.FeaturizeText method, which does a few things, such as normalizing the text and then creating numerical values for the words in the text. The first parameter is the output column that we want to create, and I'll just call it "Text"; then it takes an input column name, which is going to be OceanProximity. I can append as many other transforms as I want, and the next one in this pipeline is another transform: I'm going to concatenate into a Features column. Remember all of
these features up here: I'm going to concatenate all of those together into a single column called Features. Then I'm going to append another concatenation, where I pass in Features from the previous concatenation along with the Text column from FeaturizeText, and that goes into a new column, also called Features. Next, I'm going to append the machine learning algorithm that we want to use, so we use context.Regression.Trainers (trainers are also called algorithms), and I'll use Poisson regression. You don't really need to know the nuances of the algorithms to build machine learning models with ML.NET. All right, so we have our pipeline, and now we can create our model using the pipeline.Fit method, passing in our split train set. Once that runs, we have our model, and next we can create some predictions using model.Transform, passing in our split test set. So pipeline.Fit trains the model on our train set, while model.Transform uses the model on our test set to make predictions. We can get some metrics from it using the context.Regression.Evaluate method. We only need to pass in our predictions, because the other parameters, the label column name and the score column name (which ML.NET creates behind the scenes), use their defaults. Now we can look at our metrics, so we do Console.WriteLine. Depending on the algorithm that we choose, there are different metrics; for regression, R-squared is one that is used pretty often, so we print metrics.RSquared. Now let's run this, make sure nothing fails, and see if we get a decent R-squared. There you go: we have an R-squared of 0.54, and that's not really good; you want your R-squared to be closer to one. What we could do is try different algorithms and see how those work, or we can do some
additional pre-processing on our data to help give us a better model. This is where the art of creating machine learning models comes into play.

But instead of doing all that, I'm going to show the Model Builder that I mentioned earlier in the slides. Visual Studio 2019 actually comes with it. To enable it, you go to Options, then under the Environment section go to Preview Features, and near the bottom you check the option to enable ML.NET Model Builder. From there, in any .NET project, you right-click, choose Add, and you have this Machine Learning option. If you don't see it after checking that option and restarting Visual Studio, you can go to Extensions, then Manage Extensions, and make sure ML.NET Model Builder is installed; if it's not, you can search for it and install it from there. So let's add Machine Learning here. Model Builder lets us do quite a few different scenarios: we can do classification, and value prediction, which is what we're going to use for our regression data. You also have image classification. Note that you can run all of these locally, but for image classification, because images have so much data in them, you kind of need a GPU to train the model a lot faster than you can on your CPU, so you have the option to use the Azure Machine Learning service, where you can use a GPU. There are also recommendation algorithms, and some more limited ones, like clustering and anomaly detection, which we mentioned earlier. So we're going to use value prediction, I'm going to train locally, and I'm going to use the same file that we used for our housing data. There's our data, and we tell it which column we want to predict, which was the median house value. You can remove some columns if you already know they're not going to be helpful for the machine learning algorithm, such as ID columns if you're getting data from a database. Next we'll select Train, and here we
select the number of seconds that we want to train for. This kind of depends on your data, but here I'll just do 15 seconds, and we click Start Training. You can see in the output that it's going through all sorts of different algorithms, along with the R-squared metric that each one gets. Okay, so that's finished. It shows the top five models explored, and the top model is FastTree regression, with an R-squared of about 0.86, which is a lot better than the 0.54 we got when we built this model by hand. Next, we can evaluate it; this gives us the opportunity to put in our own data, select Predict, and get a prediction. Last, it generates the code for us, so we can click Code, add the projects to our solution, and see that it added two different projects. We have this MLModel.zip, which is the saved model, and it also shows us how we can consume it.

All right, so I think I'll end things there. I hope that gave you all some good basics of machine learning, as well as how to build machine learning models in C# with ML.NET. Once again, if you want to see more on ML.NET, please check out my channel down below in the description, and thanks to Traversy Media for giving me the opportunity to do this guest video. Thanks, everyone!
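
The data-prep step described in the video (updating missing values and turning categorical text into numbers) can be illustrated in plain C# without ML.NET. The California housing data is a real case: some total_bedrooms cells are empty, and ocean_proximity is text. This is a minimal sketch under my own assumptions. The DataPrep class and its methods are hypothetical. Missing values are represented as float.NaN. Mean imputation and one-hot encoding are only two common choices, not necessarily what any ML.NET transform does internally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helpers illustrating two common data-prep steps.
public static class DataPrep
{
    // Mean imputation: replace missing values (NaN) with the mean of the values that are present.
    public static float[] ImputeMean(float[] values)
    {
        float[] present = values.Where(v => !float.IsNaN(v)).ToArray();
        if (present.Length == 0)
            return (float[])values.Clone(); // nothing to average; leave the column unchanged

        float mean = present.Average();
        return values.Select(v => float.IsNaN(v) ? mean : v).ToArray();
    }

    // One-hot encoding: each distinct category becomes its own 0/1 column.
    // Categories are sorted ordinally so the column order is deterministic.
    public static (string[] Categories, float[][] Rows) OneHot(IReadOnlyList<string> values)
    {
        string[] categories = values.Distinct().OrderBy(c => c, StringComparer.Ordinal).ToArray();
        var indexOf = new Dictionary<string, int>();
        for (int i = 0; i < categories.Length; i++)
            indexOf[categories[i]] = i;

        float[][] rows = values
            .Select(v =>
            {
                var row = new float[categories.Length]; // all zeros
                row[indexOf[v]] = 1f;                   // set this row's category column to 1
                return row;
            })
            .ToArray();

        return (categories, rows);
    }
}
```

For example, ImputeMean(new[] { 1f, float.NaN, 3f }) returns { 1, 2, 3 }, and OneHot over "INLAND", "NEAR BAY", "INLAND" yields the rows { 1, 0 }, { 0, 1 }, { 1, 0 }. Within an ML.NET pipeline, the corresponding built-in estimators are context.Transforms.ReplaceMissingValues and context.Transforms.Categorical.OneHotEncoding.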
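
The reason the video splits the data with TrainTestSplit, holding out rows the model never sees during training, can be shown with a plain holdout split. This is a hypothetical helper (Holdout.Split), not ML.NET's implementation. As I understand it, ML.NET assigns rows to the test set randomly, so its actual test fraction is only approximately 0.2. This sketch instead shuffles with a seeded Random (Fisher-Yates) and cuts at an exact count.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Holdout
{
    // Shuffle the rows with a seeded RNG, then put the first testFraction of them in the test set.
    public static (List<T> Train, List<T> Test) Split<T>(IReadOnlyList<T> rows, double testFraction, int seed)
    {
        if (testFraction < 0 || testFraction > 1)
            throw new ArgumentOutOfRangeException(nameof(testFraction));

        var rng = new Random(seed);
        T[] shuffled = rows.ToArray();

        // Fisher-Yates shuffle: swap each position with a random earlier (or same) position.
        for (int i = shuffled.Length - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            (shuffled[i], shuffled[j]) = (shuffled[j], shuffled[i]);
        }

        int testCount = (int)Math.Round(shuffled.Length * testFraction);
        var test = shuffled.Take(testCount).ToList();  // held out: used only for evaluation
        var train = shuffled.Skip(testCount).ToList(); // used for fitting the model
        return (train, test);
    }
}
```

With 10 rows and a test fraction of 0.2, this returns 8 training rows and 2 test rows, with no row in both sets. Evaluating on the test list therefore measures how the model does on data it was not fitted to.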
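
For reference, the manual pipeline narrated in the video can be sketched roughly as follows: load the CSV, split 80/20, pick the numeric columns with LINQ, featurize OceanProximity, concatenate, train Poisson regression, and evaluate R-squared. This is a sketch assuming Microsoft.ML 1.5.x. It also assumes the Kaggle housing.csv column order longitude, latitude, housing_median_age, total_rooms, total_bedrooms, population, households, median_income, median_house_value, ocean_proximity. Some details are my own additions rather than the author's exact code: the class, property, and method names, the fixed seed, and the model-saving step. The feature names are also read from the loaded data's schema rather than the train split's. It will not necessarily reproduce the 0.54 figure.

```csharp
// Requires the Microsoft.ML NuGet package (the video uses version 1.5.1).
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Input schema (assumed column order of the Kaggle CSV; LoadColumn indexes are zero based).
public class HousingData
{
    [LoadColumn(0)] public float Longitude { get; set; }
    [LoadColumn(1)] public float Latitude { get; set; }
    [LoadColumn(2)] public float HousingMedianAge { get; set; }
    [LoadColumn(3)] public float TotalRooms { get; set; }
    [LoadColumn(4)] public float TotalBedrooms { get; set; }
    [LoadColumn(5)] public float Population { get; set; }
    [LoadColumn(6)] public float Households { get; set; }
    [LoadColumn(7)] public float MedianIncome { get; set; }

    // The value to predict; ColumnName("Label") marks it as the label column.
    [LoadColumn(8), ColumnName("Label")] public float MedianHouseValue { get; set; }

    [LoadColumn(9)] public string OceanProximity { get; set; }
}

public static class HousingPipeline
{
    // Keep every column except the label and the text column OceanProximity.
    public static string[] FeatureColumnNames(IEnumerable<string> columnNames) =>
        columnNames
            .Where(name => name != "Label" && name != nameof(HousingData.OceanProximity))
            .ToArray();

    // Trains on roughly 80% of the rows, evaluates on the held-out rest,
    // saves the trained model to modelPath, and returns the R-squared.
    public static double TrainAndEvaluate(string csvPath, string modelPath)
    {
        var context = new MLContext(seed: 0); // fixed seed so runs are repeatable

        IDataView data = context.Data.LoadFromTextFile<HousingData>(
            csvPath, hasHeader: true, separatorChar: ',');

        var split = context.Data.TrainTestSplit(data, testFraction: 0.2);

        // Numeric feature names, taken from the loaded schema.
        string[] features = FeatureColumnNames(data.Schema.Select(column => column.Name));

        var pipeline = context.Transforms.Text.FeaturizeText("Text", nameof(HousingData.OceanProximity))
            .Append(context.Transforms.Concatenate("Features", features))
            .Append(context.Transforms.Concatenate("Features", "Features", "Text"))
            .Append(context.Regression.Trainers.LbfgsPoissonRegression());

        var model = pipeline.Fit(split.TrainSet);          // train on the train set
        var predictions = model.Transform(split.TestSet);  // score the unseen test set
        var metrics = context.Regression.Evaluate(predictions); // default "Label" and "Score" columns

        context.Model.Save(model, data.Schema, modelPath);
        return metrics.RSquared;
    }
}
```

Calling HousingPipeline.TrainAndEvaluate("housing.csv", "model.zip") and printing the result matches the video's Console.WriteLine of metrics.RSquared. The "try a different algorithm" step amounts to swapping LbfgsPoissonRegression() for another trainer from context.Regression.Trainers. Examples are Sdca(), or FastTree(), which comes from the separate Microsoft.ML.FastTree package.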
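
The R-squared values quoted in the video (0.54 by hand, about 0.86 from Model Builder) follow the standard definition. R-squared is one minus the ratio of the residual sum of squares to the total sum of squares around the mean of the actual values. A value of 1 is a perfect fit, 0 is no better than always predicting the mean, and it can go negative for a model worse than that. The sketch below computes this definition in plain C#. RegressionMetricsSketch.RSquared is a hypothetical helper, not ML.NET's evaluator.

```csharp
using System;
using System.Linq;

public static class RegressionMetricsSketch
{
    // R^2 = 1 - SS_res / SS_tot
    public static double RSquared(double[] actual, double[] predicted)
    {
        if (actual.Length == 0 || actual.Length != predicted.Length)
            throw new ArgumentException("actual and predicted must be non-empty and the same length");

        double mean = actual.Average();
        double ssRes = actual.Zip(predicted, (a, p) => (a - p) * (a - p)).Sum(); // residual sum of squares
        double ssTot = actual.Sum(a => (a - mean) * (a - mean));                // total sum of squares

        if (ssTot == 0)
            throw new ArgumentException("R-squared is undefined when all actual values are equal");

        return 1.0 - ssRes / ssTot;
    }
}
```

For actual values { 3, 5, 7 } and predictions { 2.5, 5, 7.5 }, SS_res is 0.5 and SS_tot is 8, so R-squared is 1 - 0.5 / 8 = 0.9375. Predicting the mean (5) for every row gives exactly 0.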

Info

Channel: Traversy Media

Views: 92,130

Keywords: machine learning, ml.net, c#, c# machine learning, c# ml.net

Id: XBSvp43EQhA

Length: 20min 58sec (1258 seconds)

Published: Fri Sep 04 2020