Getting Started with ML.NET Auto ML in C#

Video Statistics and Information

Captions
Hi there, I'm Matt Eland, and I want to talk to you about machine learning in C# using ML.NET. ML.NET is Microsoft's machine learning framework. It runs in C#, F#, and VB.NET code, really anything that runs on .NET, whether that's .NET Framework or .NET Core. Really any .NET code from the last 10 years can run ML.NET. ML.NET lets you do most common machine learning tasks using your own machine and your own services, and there's really no cost associated with it.

I have a sample application here, and I will put the link to this repository in the description and a shortcut in our video here. This is a sample application that's built to classify video games that have not yet been released. You give it information about that game, such as whether or not it contains violence, gore, nudity, etc., and it's going to train a machine learning model that can then be used to predict the ESRB rating of a game that does not yet exist. So you can use this prior to submitting a game to the market and get a predicted group plus a confidence rating.

As a console application, I'm going to hit "train a model", and I'll show you how this actually works in a little bit; I'm just showing you the sample here. I'm going to train for 10 seconds, and it's going to try a bunch of different machine learning models. I don't necessarily care about the algorithms that it's using. AutoML is what we're using under the hood here, and it actually knows to try a number of different things. The one that it ultimately found was this SgdCalibratedOva, and this has fairly decent performance, about 80 percent. It does misclassify some things. We could look at this confusion matrix in a little bit more detail if we wanted to, but there are other videos out there on that. This looks like it's going to be a fairly accurate prediction, so let's try to use this
to predict a few sample video games and see what it comes up with. Teen Side Scroller: it thinks that's probably going to be a Teen rating, with about 77 percent confidence. Kind of Stuff, that's a game I actually made: it thinks that's probably an E, but it's not very confident on that, so it might be an E10+, Everyone 10+. Earthlings Are Coming, that's another one I built: it thinks that's an Everyone game, with about a 75 percent confidence score. That's probably accurate. Shoddy Surgeon Simulator: it thinks that's probably Mature. We'll take a look at why that is when we take a look at our program, but it's mostly due to gore and blood and all that. Then Assistant to the Lawn Service Manager 2022: it thinks that's probably an Everyone 10+. And then Intense Shoot-O-Rama: that's definitely going to be a high-level, Mature-only title.

If we wanted to, we could save this machine learning model to disk, and if we came in later we could load it from disk. That actually just creates a zip file on my disk with that model in it.

But let's take a look at how this is actually working, because that's really what you're here for. I'm going to look at my ESRB rating predictor, which is the class that I use to manage my model. Again, the repository for this is available on GitHub if you want to take a look at it. I also have a comprehensive article outlining this step by step, if you'd prefer that.

Here I've got a train model method. When you're building a machine learning model, you're spending a lot of time up front to train that model, and that's what this code here is doing. We're creating a pair of IDataViews using a machine learning context, this MLContext class up here. We're taking this MLContext and using it to load some data from a file. Here we're loading some data I grabbed from a data set on kaggle.com. It's a very wonderful open source data science community that happens to have a lot of free data sources out there, and this was
one on games. I have a file specifically for training my models and a file specifically for validating the performance of those trained models. These models tend to perform at about 80 percent no matter how long I train them, so I suspect that there is either a lot of variance or bias in the data set, or that there may not be good separation between the training and the validation sets.

I'm opening it up as a comma-separated value file, so I'm specifying the separator character of comma; by default it's going to be a tab. I'm specifying that there is a header row, so the first row is something we should ignore. I'm also saying allow quoting. This is because the data here will have some quotes and commas in some of these first columns. So here we go: this Sakuna: Of Rice and Ruin has some quotes around it. There are some things, I think there's some Warhammer 40,000, that it will not be able to parse unless you set this allow quoting to true. So this is a pretty good setup for a comma-separated value file, and we're using this for both the training and the test data.

In order to do this, we have to have a game rating class. That's just a plain old C# class with a few attributes. It has a bunch of properties that are related to that comma-separated value file I showed you a second ago. I have to tell it what column everything corresponds to, and I tend to do these things in order. This is about the one part of ML.NET I don't love: I have to define the columns by index, and I wish there was a less tedious way of doing that. So that's about the extent of loading the data set.

Now that we have our data, we want to go in and actually run an experiment. This is a classification experiment, and this is specifically a multi-class classification
experiment, because there are a total of four ESRB ratings that we predict: Everyone, Everyone 10+, Teen, and adults. That contrasts with a binary classification, where something is either something or not something. ML.NET supports both, as well as many, many other things, but I'm just showing you a multi-class classification experiment here as a demonstration of ML.NET AutoML.

So I'm saying, hey, I want to create an auto machine learning experiment. It's going to be a multi-class classification experiment, and here are the settings I want you to use. The most important one here is the max experiment time in seconds. Microsoft has some wonderful guidelines on how long you should train on your data. For this size of data set it recommends about 10 seconds, which is why I showed you that earlier. You can customize the optimization metric and choose where to cache the data, either on disk or in memory. You can also tell it that you want to exclude certain training algorithms from consideration. So it's very similar to what you can do in Azure Machine Learning Studio, but this is just the ML.NET version of that, using C# or F# or whatever you want to write it in that supports .NET.

The next step here is the really critical one. This is where we're actually going in and training the model. We have our experiment and we are now executing it. We're telling it, hey, here's your training data, here's your validation data set. You don't have to provide a validation data set, but it's much faster if you do, because it's not trying to split and k-fold on your data. So if you have a good data set, you can omit this validation data or you can keep the validation data in place. This one tends to perform a lot poorer if you provide both the training data set and the validation data, so that's an interesting little anomaly here. You tell it what name of column it should look for for the value it's trying to predict. So here we're trying to
predict the ESRB rating column, which is this last column here. If you don't specify that, it's going to look for a column named Label. So I could have specified a column name attribute here and told it the name of this column is Label, and then I wouldn't have had to specify the label column name here. But I find that it's nice to explicitly name what we're trying to predict when we are executing our experiment.

You can also optionally provide a progress handler. This is a class that's just going to report the progress to the user, so you can update a progress bar or log something to the console. This is a very simple one: it just takes in the current run detail and logs out to the console what model was just trained, how long it took to train that model, and the accuracy it achieved. You have a lot of different metrics available inside of validation metrics. I'll let you explore those, because they're going to be different based on the type of experiment you're running. But keep in mind that your validation metrics may be null if ML.NET AutoML says, hey, I want to just abort this training; we've reached our time, this is taking too long, I need to stop.

This is a synchronous operation. I do not believe there is an async version of Execute available. I may be wrong on that, but I didn't see one. Once we have that, we have the experiment result, and we can get out the information about the best-performing version of that model. This is the actual ITransformer that we can use to predict values in the future. We can also store the schema, which is useful if we're going to be saving this model later. The schema comes from the training and validation data. Finally, you do have the validation metrics you can get on performance. What I'm doing here is just getting the name of the algorithm that ran that was the best, the best-performing algorithm, and I'm also
generating a confusion table. That's that little table we saw that had that strong diagonal line to it, which showed what things were being predicted as what values. I'll return that as a string, and that's actually how we generate our model. That model here is an ITransformer, and we can take that ITransformer and use it to predict things. It takes a long time to train, but we can actually save it to disk; we'll take a look at that in a second. Once we have it, it's very quick to predict the value of our label column, our ESRB rating column in this case.

The way we classify things is we go in and create a prediction engine. Again, we're using our MLContext, and we're calling Model.CreatePredictionEngine on it. We're giving it the type of input, which is the same type that we loaded up here into our training and validation sets, and we're telling it what type of prediction it should come up with. This is another plain old C# class, and it's a lot simpler. Here we have an ESRB rating and a score. You do need this column name, PredictedLabel; that seems to be what ML.NET needs. I could have named this property PredictedLabel, but I felt it was clearer for my own code to use ESRB rating here instead of the column name.

This float array, Score, is for my multi-class classification result. If you're doing something different, like a binary classification, you might not use a score here. This is telling me the probability that this is any one of those given classes, so E, E10+, T, or M in this case. The largest one of those is going to be the one it's most confident in, so I'm giving us this convenience property here, this auto property, to get the confidence, which is the largest score in that score array. So we create this prediction engine, and we tell it what machine learning model we want. Again,
that's the ITransformer that we trained on our training set. It can take in an input schema, the schema for the data set that you're using, which is why I saved that to the schema variable before, the schema field. It seems to not need that, but if it likes that, I'm going to give it to it, because it could improve performance since it might not need to parse things out. I'm not entirely sure what I was doing with that schema, but I had it, so I decided to provide it to this method.

Once you have your engine, you can then loop over a collection or just send it a single one. Call Predict, and it's going to generate whatever class of prediction you gave it. In this case we're giving it an ESRB prediction, so this is going to have the predicted label as well as the probability scores for each given label. I'm returning a tuple here. You don't need to do this; it just worked well for my application. I'll show you my sample code in a second.

I have a couple of methods here for loading and saving a model. If you want to load a model from disk, it's just a zip file, and you can call context.Model.Load. You tell it what path you want, and you can also specify the schema, so it's going to load the schema from that model as well. This is going to store your ITransformer, which is your actual machine learning model, as well as the schema. Save model is very similar: context.Model.Save. You give it the trained machine learning model and the schema. The schema here is optional. There is no overload that doesn't take in a schema, but it's fine to provide null. You do need to tell it where to save the model, and that's going to actually take that model and basically zip it up on disk and save it in that file.

So that's 117 lines here to run a simple machine learning algorithm. Again, I don't need to know anything specifically about the individual machine learning algorithms that are being used under the hood. That's what AutoML
gets me. It tries them all, finds the best-performing ones, tunes hyperparameters for me, and spits out a good model for me.

The way this works is I have a program here where I'm just creating my predictor. I'm then asking the user what they would like to do. If they'd like to train, I'm going to call my training method, which is going to just say, hey, how many seconds do you want to train? It's going to do some validation here, and then it tells you we're training. We go ahead and train, passing in the two files that we're using, the training set and the validation set, and the number of seconds to train. Once it completes, we generate the debugging information there. If we were to choose to save or load, we'd call handle save model or handle load model, which would then call those two methods, the save and load, on our predictor.

We're getting a list of games from a list of sample games. I'll show you those in a second; those are just plain old C# classes. But then I'm going into our predictor, our ESRB rating predictor, and calling the classify games method. That's the one that we showed you earlier that had the Predict method call. For every game, it's going to give me back a prediction and the information about the game. This is a C# tuple, and then we're using that to log the prediction as well as the name of the game, just so we have some information about both sides of that. You can't really add any more information to this class as far as properties, or you start to encounter errors, so that's why I'm using a tuple here. But structure your application in whatever way makes sense. You don't need to use a tuple here; it just made the most sense for my application. I could have created another class if I needed to. And I'm just displaying the percentage of the confidence score here. So it's a very simple application. I encourage you to
play around with it. Take a look at the documentation for ML.NET; there's a lot of cool stuff you can do with it. This is really just scratching the surface. There are also some no-code solutions that'll actually generate some code for you as starter code, and can even generate an ASP.NET application for you, which is neat if you just wanted to deploy an API. If you are really averse to code, instead of ML.NET I would advise you to check out Azure Machine Learning Studio's automated ML features, which actually have a lot more bells and whistles than the ML.NET stuff here, and they're built for somebody who doesn't necessarily understand code. But this is great for incorporating into existing or new .NET applications, regardless of whatever language or platform you want to deploy them on.

So let me know what you'd like me to cover next, let me know what questions you have, and let me know what you've built with this, because I'm always really interested in machine learning solutions. Happy coding, and have fun!
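
Editor's sketch: the flow described in the captions (load CSV data, run a timed multi-class AutoML experiment, save and reload the model, predict with a prediction engine) might look roughly like the C# below. This is not the code from the video's repository. It assumes the Microsoft.ML and Microsoft.ML.AutoML NuGet packages, and the class names, property names, column indices, and file paths are all hypothetical placeholders chosen for illustration.

```csharp
using System;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.AutoML;
using Microsoft.ML.Data;

var context = new MLContext();

// Hypothetical file names. Both are CSVs with a header row;
// allowQuoting lets quoted titles contain commas.
IDataView trainData = context.Data.LoadFromTextFile<GameRating>(
    "train.csv", separatorChar: ',', hasHeader: true, allowQuoting: true);
IDataView validationData = context.Data.LoadFromTextFile<GameRating>(
    "validation.csv", separatorChar: ',', hasHeader: true, allowQuoting: true);

// Configure a multi-class AutoML experiment capped at 10 seconds.
var settings = new MulticlassExperimentSettings
{
    MaxExperimentTimeInSeconds = 10,
    OptimizingMetric = MulticlassClassificationMetric.MacroAccuracy,
};
MulticlassClassificationExperiment experiment =
    context.Auto().CreateMulticlassClassificationExperiment(settings);

// Execute is synchronous; the label column is named explicitly.
ExperimentResult<MulticlassClassificationMetrics> result = experiment.Execute(
    trainData, validationData,
    labelColumnName: nameof(GameRating.EsrbRating),
    progressHandler: new ConsoleProgress());

ITransformer model = result.BestRun.Model;
Console.WriteLine($"Best trainer: {result.BestRun.TrainerName}");
Console.WriteLine(
    result.BestRun.ValidationMetrics.ConfusionMatrix.GetFormattedConfusionTable());

// Save the model (a zip file) with its input schema, then load it back.
context.Model.Save(model, trainData.Schema, "model.zip");
ITransformer loaded = context.Model.Load("model.zip", out DataViewSchema schema);

// Predict a single, made-up game.
using var engine =
    context.Model.CreatePredictionEngine<GameRating, EsrbPrediction>(loaded, schema);
var game = new GameRating { Title = "Sample Game", Violence = 1f };
EsrbPrediction prediction = engine.Predict(game);
Console.WriteLine($"{game.Title}: {prediction.EsrbRating} ({prediction.Confidence:P0})");

// Input row. Columns are mapped by zero-based index; this layout is an
// assumption, not the real data set's. Flags are stored as 0/1 floats.
public class GameRating
{
    [LoadColumn(0)] public string Title { get; set; } = "";
    [LoadColumn(1)] public float BloodAndGore { get; set; }
    [LoadColumn(2)] public float Violence { get; set; }
    [LoadColumn(3)] public string EsrbRating { get; set; } = "";
}

// Output row. PredictedLabel is the column name ML.NET produces.
public class EsrbPrediction
{
    [ColumnName("PredictedLabel")] public string EsrbRating { get; set; } = "";
    public float[] Score { get; set; } = Array.Empty<float>();

    // Confidence is the largest per-class score.
    public float Confidence => Score.Length == 0 ? 0f : Score.Max();
}

// Logs each completed run; ValidationMetrics can be null if a run was cut short.
public class ConsoleProgress : IProgress<RunDetail<MulticlassClassificationMetrics>>
{
    public void Report(RunDetail<MulticlassClassificationMetrics> run)
    {
        Console.WriteLine($"{run.TrainerName}: {run.RuntimeInSeconds:F1}s, " +
            $"macro accuracy {run.ValidationMetrics?.MacroAccuracy:P1}");
    }
}
```

Running this needs real CSV files whose columns match the placeholder class, so treat it as a starting point to adapt rather than a drop-in program.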
Info
Channel: Matt on Data Science
Views: 4,956
Keywords: .net core, AutoML, Azure, Data Science, MLNET, Machine Learning, automated machine learning, automatic machine learning, automl tutorial, data science 2022, data science for beginners, data science projects, introduction to machine learning, machine learning for beginners, machine learning model, machine learning project, machine learning projects, machine learning projects for beginners, machine learning tutorial, machine learning tutorial for beginners, ml.net tutorial
Id: LIaMHK5wrDE
Length: 17min 49sec (1069 seconds)
Published: Sun Dec 05 2021