ML.NET: Machine learning from data to production in less than 30 minutes

Captions
We have Bri Achtman and Luis Quintanilla, and I think we can hear them. They're dialing in remotely, and they're going to be talking to us today about ML.NET: machine learning from data to production in less than 30 minutes. Bri, the stage is yours.

Awesome. So our talk is "ML.NET: Machine learning from data to production in less than 30 minutes", hopefully. My name is Bri Achtman, and I'm the PM for ML.NET here at Microsoft. I'll let Luis introduce himself.

Hey folks, happy to join you. My name is Luis, and I'm a content developer working on docs for ML.NET, .NET for Spark, and Azure ML. So I'm happy to be here.

Great. First we'll get into some slides for those of you who may be new to ML.NET, and then we'll get into an awesome demo. I have control right now, so let's see. Okay, here's the agenda, which I just kind of covered: I'll go over what ML.NET is, why and when you'd use it, what the ML workflow looks like, and what some of the challenges are in bringing a machine learning model to production. Then we'll show our demo, where we go from a business problem to data to production in a MAUI application, and then go over the roadmap and what's coming up.

So first off, what is ML.NET? It's an open source and cross-platform machine learning framework for .NET. A bit of the history: it started off as an internal framework, actually a Microsoft Research project for text mining and search, and over 10 years it evolved into an internal framework called The Learning Code. Then at Build 2018 we released the first external version, rebranded to ML.NET, made it a bit friendlier, open source, and cross-platform, and that's how we came to ML.NET. And a little bit more about it, if my arrow keys will work here. Luis, I need you to do the slides. Cool.

So when would you use ML.NET, or why would you use it? It allows you to stay in the .NET ecosystem to build, train, consume, and deploy custom machine learning models. This means if you already
know C#, you already know F#, and you want to stay in Visual Studio, VS Code, whatever it is, you can stay in the .NET ecosystem to build these custom machine learning models with ML.NET, because ML.NET is .NET. Another great thing is that there's no data science or machine learning experience needed to get started. The tooling that we offer, which we'll show you in our demo, makes it really easy to train a machine learning model. And because it is .NET, it's easy to go from that experimentation or training process to production, as long as it's a .NET application. With ML.NET you can deploy anywhere you can deploy .NET, essentially, and we're going to show that to you in a MAUI application.

So here's a bit more on machine learning, the machine learning workflow, and the steps that we're going to take in our demo. The first thing is that machine learning is all about data. You need data for training a model: the data that you give to the algorithm to get a model that can predict on new data. There are a lot of different ways that you can collect that data, a lot of different ways that you can store it, and a lot of considerations you have to take into account: privacy, and making sure you're ethically and legally able to get this data and use it, and so on.

The next step in this workflow is preparing the data. The data has to be in a specific form to input into an algorithm, and usually that's some sort of numeric form. So say you're getting in text: you would need to find some way to turn that into numeric vectors, usually with some sort of data transforms. There are other things that you might need to do to the data, such as getting rid of missing rows or missing data. So there are a lot of different things that you can do to prepare your data before the training process.

Once you have the data prepped and ready to go, this is when training happens. This is when you fit a
model to your data. You choose an algorithm and you tune algorithm options, or hyperparameters, in order to find the patterns in the data and get your trained model. And it doesn't end there. Once your model is trained, or while you're in that experimentation process of training your model, you have to evaluate your model and see how it's performing. Depending on the scenario, whether that's classification or some sort of regression or value prediction, there are going to be different types of evaluation metrics that you'll look at to gauge the performance of your model. For instance, for classification you look at accuracy, and so on.

This process is very iterative. You may find that you train a model and you're not really happy with the accuracy, so then you would have to go back and try to improve that model. There are a lot of different ways to do that. You can add more relevant data. You can try adding more columns and more relevant features. For instance, if you are trying to predict the price of a house based on different characteristics of the house, using number of rooms, number of bathrooms, and location might give you a pretty good model. But there might be other factors, such as whether there is a pool or air conditioning, that would affect the model, and if you added those in you might get a better performing model. So this is again that iterative process: you try out different things and you experiment until you have a model that you're happy with.

Once you have that model, that's when you can deploy it. As I mentioned, with ML.NET you can deploy it as any .NET application. If you wanted to deploy it as part of a desktop offline application, you could do that with WPF or WinForms. You could even just run it as a console app, which is what a lot of people do. Well, you would train in the console app and then you would deploy it somewhere else,
but some people do deploy just as that console app. You can also deploy it as a web app, a web API, an Azure Function, and so on. There are a lot of different ways that you can deploy an ML.NET model.

And then comes inferencing. This just means using the model to make predictions: you have your model deployed, and you then want to use it to make predictions on new data.

These steps are split up into two sections: the model training section and the model consumption section. And there's this big question, a problem in the industry right now and in the machine learning world as a whole: how do you go from model training to model consumption? If you've heard of interactive notebooks, a lot of data scientists work in these interactive notebooks, or Jupyter notebooks, and then they hand that off to a developer, and it's like, "Okay, well, what do I do with this? How do I take this model and deploy it and consume it and put it into production?" I have a bit more about that.

So here are a few of the challenges of bringing machine learning models to production, with some statistics from various enterprise surveys on machine learning and data science. Even though machine learning budgets are on the rise (you can see tens of billions of dollars were spent on machine learning products in the period that the survey covered), only 22 percent of companies using machine learning were able to successfully deploy a machine learning model into production. Additionally, it takes most companies on average between 31 and 90 days to deploy a machine learning model.

There are a few reasons for this. If you have just code, it's fairly easy to put that into your DevOps pipeline, get it into production, iterate on it, and so on. But machine learning is code and data. You have your data, you have your model, you have different versions of that model;
if you train it once and deploy it, you may have to retrain it, and so on. So it makes it a bit more complicated. Then there's scale. There are a few different areas in scale: for instance, the amount of data that you have, the size of your model, how many predictions you have to make, and how quickly you have to make those predictions. Then there's version control, which is going to be a bit different for models and data versus code. And then model reproducibility, and there's also model explainability and responsible AI, which ties into that as well. Like I said, this is an industry-wide thing that people are trying to solve: how to easily bring machine learning models from that experimentation process to production. In our demo we want to show you how easy that is with ML.NET and .NET.

So with that, we'll switch over to Visual Studio. Luis will help me switch over. I guess the title of our talk was a little misleading, because we're actually going to train and do all this in 20 minutes. So let's go ahead and show our application. I'm going to talk about the business problem that we want to solve with machine learning. We've got this music application, and we want to use machine learning to predict missing notes. Let's select a song. We're going to select a song that's called, I think, yeah, "Damaged". It's a damaged song, and you might be able to see, it's a little bit small, that there are these dotted lines. That's where we have missing notes, and we want to train a machine learning model to predict what notes are supposed to be there. So let's go ahead and play this so you can hear it and see what I mean by a damaged song.

[Music]

So it's a damaged song. We have some missing notes, and where those missing notes are we just put, you know, smashing the keyboard, mashing the piano, a
different type of keyboard, just to show you where those missing notes are. So we want to train a machine learning model to be able to predict those notes and put them in, and then have a real song that sounds better.

I'm going to switch over to Visual Studio, and I think we can stop this for now. There we go, so we're going to stop it. Here in the Solution Explorer you can see there are quite a few things. This Manufaktura is a library for visualizing the notes and laying it all out nicely, so we're going to skip over those. But then this test music viewer is our MAUI application. If I scroll down you can see we have quite a few things, which Luis will go over, but I just want to do the machine learning part. You might have seen this Repair button in the application, and I want to be able to hit it and repair the song with machine learning.

So I'm going to right-click on my project and add a machine learning model, and this will open up the Add New Item dialog. Some of you might have seen Model Builder before in Visual Studio. If not, it actually just GA'd in VS 2022, which is awesome. I'm going to call this NotePredictor, and this is going to add an .mbconfig file to my project. The .mbconfig file is really just a JSON file that keeps track of the state of the UI, which will come up in a second here. You can see that .mbconfig here. I don't have a model just yet, I haven't trained anything; I just have this .mbconfig that has the state of the UI.

In this case I want to be able to predict missing notes, so I'm trying to classify: is this note C, D, E, F, whatever the note may be in the range of possible notes? That's going to be a data classification scenario. There are a few other scenarios that you can choose based on your business problem, which again is one of the first steps in that workflow, before you even
collect your data. Luis, if you could click on data classification for me. Thank you. There are a few different options for training depending on the scenario that you have. We're just going to train locally, in this case on Luis's CPU, and go on to the next step. We'll see if it lets me. Luis, you may need to step through the UI for me. Thank you.

You have two options for data: you can choose from a file or from SQL Server. We're going to choose our data set from a file, and in this case we just have this text, or I think it's a CSV, file. Yep, chorales modified. Once that loads in, we'll get a preview of what the data looks like. You may notice it looks kind of weird right now. We have a header in this data set (we're going to go through the data set in a second), but Model Builder didn't actually catch that. So it's really cool, if you haven't used Model Builder recently or haven't used it yet, that we have these advanced data options, so that if the data happens to be parsing incorrectly you can fix up the formatting. In this case we want to indicate that yes, we have a data header; the column separator and decimal separator are all good. Once we save that, it will update our data preview in the UI.

So this is the first 10 rows of our data set. We have about 4.6K rows and 24 columns. The first column here, chorale, just indicates the song, so all of rows 1 through 10 are part of song number one. Key indicates if there are sharps or flats, so its values are one through four. Then we've got measures. Measures are chunks of music that are separated in the score. In this case the first two rows are part of measure one, that first section; these next three notes are part of measure two, the second section; and so on. There doesn't have to be the same number of notes in each measure, as you can see.

This next column is going to be really important: it's the note. You can see we've got G, G,
D, B, and so on. It's the note letter, and this is what we want to predict. Right now we have it labeled, because this is the training data we're going to use to train the model, but this is going to be our label: what we're going to predict once we have our model. So I'm going to have Luis set the column to predict to be the note.

Looking at the other columns, 60, 61, 62, and so on, these are actually representing notes as well, but in a numeric format, like a MIDI numeric format. The zero or one is a kind of Boolean: one means that this note was seen with this other note in the same measure. In this case 67 indicates a G. You can see in measure one there's another G here, so we're going to have a one, as in yes, this note G, indicated by 67, is in the same measure. If we go to the next one, D, you can see this is measure two. We've got 69 and 71, which represent A and B, and if we go down here in the data set, A and B are found in that same measure. So essentially, when we train the model, it's going to learn, "Hey, this note sounds good together in the same measure with these notes." When we use that model on unknown data, it will choose a note that will sound best with the given notes in the same measure. That's how the logic is going to work.

So I have my data, I prepped it, and I'm ready to go. Let's train it for, let's say, 20 seconds, and start training. I don't know why I'm now able to click in the UI, but hey. As soon as I hit start training, you can see in the output window that Model Builder is iterating through different models: different algorithms and different algorithm options. It's using this thing called automated machine learning to automate that process of training and evaluating that we saw in the workflow. What's really cool is that if I had stopped training before the 20 seconds were up, I would be able to use the
best performing model so far, whereas previously in Model Builder you would have to start over if you wanted to cancel before the time was up. In this case, in the 20 seconds I gave it, it found a model with 48 percent accuracy using this FastTreeOva algorithm, and it was able to explore 17 models total.

You can see here that it generated some code-behind files under the .mbconfig, which are over here, and I'll try my best to navigate. There are a few files that it generated. You've got this .zip file, NotePredictor.zip. That is your actual model: it is a serialized file that is your ML.NET model. This is what we're going to use, and Luis will show you that in a bit. We've also got this training .cs file, which contains the actual pipeline code. I can click into it to show you real quick. It shows you which algorithm was chosen and which data transformations were chosen. You can see here it shows this MapValueToKey transform, and it shows a few different algorithms, so there's that OVA algorithm here. In this case you can see there were algorithm options that it iterated through as well, so number of leaves, for instance, was set to 18, 33, and so on. So in that process of training it was trying out not only different algorithms but also different parameters, or hyperparameters, of the algorithm.

The next thing that is created is this consumption file, which contains all the consumption code necessary to use the model. You could write that yourself, but we made it really easy by generating this file for you, and you just call the Predict method. With that, I'm going to pass it off to Luis to show you what we actually do with this model to integrate it into our application and bring it to production.

Awesome, thanks Bri. So you just did basically the hardest part. I just need to bring this model into my application, place it somewhere that I can access it, and then go ahead and, you
know, just use it to make predictions and repair our songs. Before I get started, though, let me start up the application, because it might take a little bit. We're still working with emulators. I think I saw the Windows Subsystem for Android demoed earlier, but in this case we're using an emulator, if you hadn't noticed yet. So we're going to give that a little bit of time.

Now, as we mentioned, we have this NotePredictor.zip file, which is the serialized version of our model. What we can do is just copy that over to the Resources folder inside of our MAUI application, and you can see there's a NotePredictor.zip there. Then we just want to make sure that in the build action we say that this is going to be an embedded resource, so it's going to be packaged with our application and we're going to be able to access it.

At that point, the next thing we want is some way to load this model so that we can go ahead and make predictions with it. To do that we have this music repair service, whose job, as the name implies, is to repair the songs using the model. The main thing we're going to do is load the model given some sort of model path, which is going to point to the model that we have in our Resources folder. Once that's loaded, we're going to call this Repair method. Repair is going to take in the measures. As we mentioned, the measures were the different parts of the song that were shown, so we're just going to pass in the measures and then go ahead and use the model. We're going to create a prediction engine. PredictionEngine is a convenience API for making predictions on a single instance of data. Let me get rid of that for a second, and let me get this too, just to make it
easier to see. So we have the prediction engine from GetPredictionEngine, which again is just CreatePredictionEngine, and that's going to help us make predictions. Then we iterate through each of the measures and try to find which notes in those measures are damaged or missing. The way that's represented is a zero: if the number of the note is zero. Remember, the numbers were, I think, 60 through 79; if that number is zero in our measures, it means that the note is missing. Once we've identified the notes that are missing, we take a look at the known notes, the notes that are also part of that measure and are not missing. Now, if you recall, the note column was actually a string, so we need to build the features, or transform our data, so that it's in this model input format. Again, that model input is just going to have the chorale column and the key, as well as those 60 through 79 columns.

Once we build the features, we use the prediction engine and call the Predict method with that input, and we get a prediction out: a note, as a string. At that point, in order to represent it in the UI, we need to convert that note, the string value, back into a number. So we take that prediction, convert it to a number, and assign the new value to our note here, and then we set these flags to say, "Hey UI, this note is repaired." Again, this is all UI driven. The main thing you need to focus on is the predict aspect of it.

Now, if you had built MAUI or Xamarin applications in the past with ML.NET, maybe the way that you did this was using some sort of web API, because as of a couple of months ago ML.NET did not support running on ARM devices. I think
the thing that I really want to stress here, and hopefully it becomes evident, is that this is all running inside of MAUI. This is all running natively on the device. We're not calling out to some other service; this is on an ARM device. So that's super important.

Okay, I think by now our application should be up and running, so we're going to go ahead and select the song. There's "Damaged", and again there are those missing notes that are represented by those dashed lines. If we click play, you're going to hear that there's that missing note. Let's stop that. When we hit Repair, what's going to happen is that there's an event handler, and we're going to call the Repair method and pass in the measures for the song that's been selected, in this case the damaged song. Then we use the model to repair the song, and there are other UI things that happen to render it and make it look pretty. So we're going to hit continue, and at that point you see that the notes are now filled in.

[Music]

And at the same time there's none of that jarring chord that gets played. Let's do that same thing on a very damaged song. Let's go to "Very Damaged" here, and you see it changed. It's a brand new song, and this one's got a lot of missing notes; pretty much every other note is missing. So let's stop that and hit Repair again. We're going to hit that breakpoint, repair our song, and it's going to display nicely in the UI.

[Music]

And that's that. So you can see that we used this model that we just trained in 20 seconds. We copied it over to our Resources folder, just like any other file or resource inside of our MAUI application, and then we just loaded that model and used some of the built-in functionality, like the prediction engine, to make a prediction
and repair these notes. The hardest part was maybe making the conversions there, but in terms of creating a model as well as deploying it and embedding it inside of your application, it was super easy. And we still have five minutes left here, so maybe we can switch back to slides and finish this off.

So that's really in 10 minutes, right?

Yeah, we're beating our own record. Let me switch back to the slides here.

Okay, all right. So a little bit about what's coming up, and some things that just happened. I talked about how Model Builder is now GA in VS 2022, which is amazing. ML.NET is also now on the .NET release schedule. Before, it was kind of doing its own thing, but we want to make sure people know that ML.NET is a part of .NET and it's here to stay, and one of the ways we're doing that is that it's now on the .NET release schedule. So ML.NET 1.7 just released. And if you've been keeping track or have ever used the PFI API, which is used for model explainability, we heard your concerns and saw how hard it was to use, and we made some simplifications and improvements which make it a lot easier to use.

If you go to the next one: coming up in November we have the, what is it, ML.NET virtual community hackathon, or community virtual hackathon. I can never remember the order. We'll just simplify it to virtual ML.NET hackathon. That's coming up right after .NET Conf. It's just meant to be a really fun thing: meet other people from the ML.NET community and try out ML.NET for the first time. Last year's winner had never used ML.NET before, tried it out during the hackathon, and ended up winning. And you can try out really cool demos and projects like the one that we worked on for this talk.

We're also adding another scenario to Model Builder: time series forecasting. We're trying to get all the scenarios that are part of
ML.NET, which you can build with the API while choosing the algorithms yourself, covered in tooling with AutoML as well.

If you go next: coming up later on. We just made a lot of improvements to AutoML in collaboration with some Microsoft Research teams, using FLAML and NNI. Our tooling is up to date now, but we need to make sure that the AutoML API also uses that implementation, so we'll be working on that. We're working on adding another scenario to Model Builder, which is anomaly detection. And if you haven't heard, we're working on our deep learning story, so soon you'll be able to author neural networks from scratch in .NET. As part of our deep learning plan, we're also making ONNX consumption easier using the ML.NET API. I think that was the last thing I had.

Oh, and beyond. Like I said, we'll be implementing the rest of the deep learning plan, which will include more transfer learning scenarios and being able to author or build neural networks from scratch in .NET, and then making sure that we have a great story for data prep, wrangling, and exploration. We also have a public roadmap, which was very recently updated; I put the link there.

And I think that's it. We're really excited about the hackathon, so make sure that you all tune in. We can maybe switch over to show what the repo looks like. You really just have to join a Discord and then put down what you want your project to be, or join another project. You can really do any project that you want. I think we have a few minutes left, so I'm wondering if we have any questions.

Bri, Luis, absolutely awesome ML.NET demos. We do have some questions for you. There was a lot of interest in this: were y'all using Live Share? What was the cursor icon? They were collaborating.

Okay, yeah. Actually,
working on the demo we were using Live Share. In this case we were just using the Teams request-control type thing. Although that would have been cool, we were already nervous about our demo.

Right, so the cursors that were sharing, with your icons next to each other on the same screen, that was Teams, I guess?

Yes, that was.

Oh, very cool. I think everyone was getting kind of envious of the integration and team sharing. It was pretty cool. Okay, so we do have some questions that we got on Twitter. They were really interested in the instrument demo that you had. One question was: where do you start for predictive modeling where there may be more complicated scenarios? What if you had multiple sheets with different instruments?

You want to take that, Luis?

Yeah, that's a good question. You know how you had the measures and you had the notes? Perhaps you could add another column with something that indicated, "Hey, this is the instrument that this particular note in this measure in this song belongs to." So it might be as simple as adding another column. I will call out that I am not a music expert; that's why you saw me just clicking the buttons. I couldn't say anything more beyond that, but if I were to model it, that's probably the first thing that I would try.

You could also have multiple models, one per instrument. It might not be as efficient, but there are a few different ways I think you could solve it. I like Luis's approach, though, of having a column feature for the instrument. I think that would work, and maybe it's something we can add. Another thing that we wanted to do was train the model on different genres of music and then see how that affected the song,
so that might be something we work on as well and add to the demo or the repo.

That would be very cool. Another question I'll throw your way: is ML.NET supporting the Python machine learning libraries and TensorFlow? Do we have some integration there?

Yeah, so right now there are two integrations. If you train a model in TensorFlow, you can consume that TensorFlow model in ML.NET. You can also consume ONNX models within ML.NET. The other TensorFlow integration is that ML.NET has an image classification API, which allows you to train custom image classification models using a process called transfer learning. That's built on top of TensorFlow.NET. TensorFlow.NET is an open source SciSharp project that provides .NET bindings for TensorFlow, so there are no dependencies on Python or anything. So a part of ML.NET, the image classification API, is built on top of that.

Oh, so on both the consumption and training side there are some integrations there.

Yeah.

All right, well, thank you both so much, Bri and Luis. Love hearing from the ML.NET team.
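
Editor's sketch: the data layout and repair loop described in the demo (a chorale id, a key, one 0/1 column per MIDI pitch from 60 through 79, and missing notes encoded as zero) can be illustrated in a few lines of plain Python. This is not ML.NET code and not the speakers' implementation. The function names, the sharp-based note spelling, the choice of the lowest matching pitch when converting a note name back to a number, and the stand-in predictor are all assumptions made for illustration; in the demo the prediction comes from the trained model.

```python
# Illustration only: plain Python, not the ML.NET API used in the talk.
PITCHES = list(range(60, 80))  # MIDI pitch columns 60..79, as quoted in the talk
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_name(midi):
    """Note letter for a MIDI pitch (sharp spelling is an assumption)."""
    return NOTE_NAMES[midi % 12]


def to_midi(name):
    """Assumption: map a note name to the lowest matching pitch in range."""
    return next(p for p in PITCHES if note_name(p) == name)


def build_features(chorale, key, known_notes):
    """One model-input row: chorale, key, then a 0/1 flag per pitch column."""
    present = set(known_notes)
    row = {"chorale": chorale, "key": key}
    for p in PITCHES:
        row[str(p)] = 1 if p in present else 0
    return row


def stand_in_predict(features):
    """Hypothetical placeholder for the trained model: echoes the lowest known note."""
    flagged = [p for p in PITCHES if features[str(p)] == 1]
    return note_name(flagged[0]) if flagged else "C"


def repair(measures, predict, chorale, key):
    """Replace each missing note (encoded as 0) using the other notes in its measure."""
    repaired = []
    for measure in measures:
        known = [n for n in measure if n != 0]
        features = build_features(chorale, key, known)
        fixed = [n if n != 0 else to_midi(predict(features)) for n in measure]
        repaired.append(fixed)
    return repaired


if __name__ == "__main__":
    # One measure with G (67), a missing note (0), and B (71).
    print(repair([[67, 0, 71]], stand_in_predict, chorale=1, key=1))
```

With the stand-in predictor, the missing note in the example measure is filled with 67 (G), the lowest known note. Swapping `stand_in_predict` for a call into a real trained model is the only part that would change; the feature row and the zero-means-missing convention follow what the speakers describe.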
Info
Channel: dotnet
Views: 10,311
Keywords: .NET
Id: C-lnYdAR9UI
Length: 29min 59sec (1799 seconds)
Published: Thu Nov 11 2021