Revolutionize Your Coding: C# Machine Learning Essentials with ML.NET!

Video Statistics and Information

Captions
Artificial intelligence, ChatGPT, natural language processing, machine learning: the AI apocalypse is coming. Resistance is futile, so if you can't beat them, why not join them? [Music] Let's learn how to do machine learning in C#. But wait, don't you have to be a data scientist to understand how to do machine learning? Thanks to ML.NET, which has been around for some time, machine learning in C# is now more accessible than ever, with the ability to do things like classification, which is what we're going to look at today; sentiment analysis, which is where you can look at the positive and negative aspects of different pieces of content, often used in things like customer reviews; and image classification, where you can identify items within an image. Lots of other use cases and things we take for granted today all run on frameworks like ML.NET. But before we start, don't forget to like and subscribe to this channel. It helps me to bring you lots more great .NET content.
So this video aims to give you a basic introduction to machine learning in .NET and an introduction to ML.NET, using the use case of item classification. Classification allows us to take in a bunch of data and, using a trained model, identify specific categories that could exist within that data. In my use case I'm going to be taking in a bunch of email subjects. I'm going to create a console app where we can type in an email subject, and we're going to use our trained model to decide which department in a business the email should go to, based on the content. So for example, we may have an email subject of "I didn't get paid", and that could go to either Finance or HR, so not a good example. Good start. "New employee contract": OK, there's an example of something that could go through to HR. Or we could have "potential customer lead", which could go to the sales department. We could also have "my printer doesn't work" (and I don't know why I still have a printer), and that could go through to IT.
We're going to start off by creating a data set that ML.NET can ingest, and then we're going to train a model on it and tell it to output a prediction of what department an incoming email subject should be assigned to. One important point I should make is that ML.NET isn't designed to be just a standalone machine learning framework. By that I mean something you just use in isolation. You can do that, but what ML.NET strives to do is make it so that you can integrate machine learning into existing applications. So you might have a simple e-commerce application where, like I said before, you're using a review system. You might have customers adding reviews for your products, and you want something which is able to ingest that data as it arrives and predict sentiment, so you'd be able to identify if somebody was particularly dissatisfied with their purchase and then maybe take some action based on that. Let's jump into Visual Studio and build a console application with a classification example. So I'll fire up
Visual Studio, and I'm going to build me a console app, and I'm going to call this MyClassification... no, sorry, let's call it something better: EmailSubjectClassifier. OK, there we go. It's 2024, guys, so I'm going to use .NET 8. I'm going to create that console app, and there we go: we've got our bog-standard console application. I'm going to get rid of that stuff and start from scratch.
Before we start writing any code, we're going to need a training set. That means we need some initial sample data: some examples of email subjects and the departments they would correspond to. ML.NET supports the use of TSV files, TSV being tab-separated values, which is, as it says on the tin, basically just a bunch of columns separated by tabs. Microsoft has an example of this in their getting started tutorial, which I can show you here, and this is a raw example of one of their training sets. Now, this is for a different example, where they're classifying GitHub issues, but you can see here that we've got the columns ID, Area, Title and Description, and they're just separated by tabs. So we can just put this data into a file and use it as part of the machine learning process. We need to create an equivalent of that for email subjects and departments. In the real world, if this was going to go into production, we'd gather some existing emails: we'd go into our mail server, get all the different emails and their subjects, and maybe even manually classify them so that we have our initial training set. I ain't got time for that, so I went to my good friend ChatGPT (I salute my future robot overlords) and I asked it to create me a tab-separated file with some typical email subjects that might go to departments such as IT, Finance, Sales and HR, and this is what I got. It's a lot simpler than the example that I've just shown you from Microsoft, but the principle is the same: we've got a subject column and then a department
column. So here, for example, a subject of "Service outage notification" would go to department IT. If I scroll down a bit further, we can see some more examples: "Profit and loss statement review" goes to Finance, "Customer satisfaction survey" to Sales, and "Job transfer request process" to HR. So I'm just going to go to the project and add a new folder, and I'm going to call it Models. Then I'm opening the folder in File Explorer and I'm going to copy in that subjects file, and there we go: inside we can see we've got our TSV file. One important point on this is that we want to set a property on this file as part of the solution. On the right-hand side, it might be difficult to see because the writing is a little bit small, but where we've got Copy to Output Directory, we want to set that to Copy if newer. By default it's set to Do not copy, but Copy if newer will mean that if we update the file, the latest version always ends up in the build output. Cool, so there's our training set, but this is useless if we don't have ML.NET installed, so let's install that from NuGet. I'm going to go to NuGet Package Manager, Manage NuGet Packages for Solution, and then I'm going to search for Microsoft.ML. It would help if I was actually pointing to nuget.org. There we go. I'm going to take the latest stable version and install that onto my project. At the very top, I know I've got top-level statements enabled, but just for clarity I'm going to put a using statement for Microsoft.
ML. Zoom that in a bit. OK, so now we need to talk about loading the data set into memory. This is where we need to create a class which can represent each of the rows in our TSV file. So I'm going to create a new class, and I'm going to keep it in Program.cs for now just to keep it simple. So public class, and I'm going to call this EmailSubject. Then I'm going to put in properties for each of the columns in the TSV file, so I'm going to create a public property called Subject and then a public property called Department. On its own, this is not enough to simply load in the TSV file and have ML.NET understand how to reference it. We need to use some attributes for this. So here above Subject I'm going to add LoadColumn. This is part of the Microsoft.ML.Data namespace, and because I'm not using that at the moment it's giving me a red squiggly, so I'll just add that in: Microsoft.ML.Data. Then I can pass in the index of the column in the TSV file, which is zero for the first column. This is just me mapping the Subject property here to the subject column in my TSV file. I'll do the same thing for Department, which will be one.
Now we need some global variables. We need to be able to globally reference the location of our TSV file, so I'm going to create a training file path. This can just be a string, I'm going to call it trainingFilePath, and then I'm going to copy the location of this file and pop that in there. There we go. As part of this I'm also going to be creating a new model from the training set, so I need to specify a location where I want that to be saved. I want it saved in the same folder as the TSV file that I already have, and it's going to be saved as a zip file when it's created. So it's going to be a string, modelFilePath is the name of the variable, I copy the path again, but at the very end I'm going to say that it will be called model.
zip. We also need a global variable for our ML context. Now, an MLContext is very similar to what you would use in Entity Framework with a DbContext: it's a context for the machine learning operations you want to perform. Whenever you're performing machine learning in ML.NET you're interacting with an MLContext, so we only need one of those and we want it to be globally accessible. I'm going to create a variable of type MLContext and I'm going to call it, you guessed it, mlContext. That can just stay there for now. So now I've got a path for my training set, I've got a path for the future model that I'll be creating, and I've got a reference to an MLContext. I'm going to initialize the MLContext now, so I'm going to say mlContext equals new MLContext. I can also pass in a seed, which is classic machine learning terminology. It's not anything I'm going to go too deep into, it can get quite complex, but the seed initializes the context's random number generator, and passing a fixed value, here zero, makes the results repeatable from run to run. So now we're ready to actually load the data into memory so that we can manipulate it, pre-process it and turn it into a model through training, and then do a prediction. There's a specific type in ML.NET that allows us to view and manipulate the data from the TSV file: it's a type called IDataView. So I'm going to create that here, IDataView, and I'm going to call it trainingDataView. I can initialize that using mlContext, which has a function called LoadFromTextFile. So I'm setting trainingDataView equals, referencing my mlContext.Data.
LoadFromTextFile, and then I put in a type, which will be EmailSubject. So it's loading the file in and projecting it into the class we created, EmailSubject. Then it expects us to supply the path, which is easy: we can pass in the trainingFilePath that we created at the top. Then it allows us to specify some options. One of the options that I like to use is hasHeader. It's false by default; I'm going to set it to true to say that this file has column headers in it.
So far we've done most of the easy work. We've established our MLContext, we've said where the paths are for the various helper files, and we've loaded the data into an IDataView so that it can be pre-processed. So now it's time to do that pre-processing. The pre-processing is a means of creating what we call a pipeline, which can later be used to predict against incoming data. That will involve extracting the data from the data view that we've created and then transforming it in such a way that it can be used as part of this machine learning process. There are several things we need to do for that, but I'm going to put them into one method called ProcessData. As I said, we're going to be returning a pipeline, and this pipeline has a specific type: it's of type IEstimator of ITransformer. That doesn't sound very straightforward, but trust me, the documentation is really good at explaining this stuff. We know that's the type we want to return so that we can use it as our pipeline, so IEstimator of type ITransformer is what we're going to return, and the function we're creating is ProcessData. We're simply going to create our pipeline to be returned, and this involves appending lots of different steps together. The first thing we want to add to the pipeline is a mapping for the actual value we want to output. In terms of our EmailSubject class, the thing we want to output is the department, but the machine
learning pipeline will know this as a label, so we need to map Department to Label. First of all I'm going to create a variable for my pipeline, and then, again on our mlContext, it's in Transforms.Conversion, we're going to be converting using MapValueToKey. So we're mapping the Department value to a key called Label, because we're labeling these email subjects with a department. The input column is called Department, as per our TSV file, and the output column name will be Label. We don't close that off yet because we've got a few things to append to this pipeline. Stick with me, this can get a little bit complex, but trust me, it's going to be fine. The next thing we're going to do is append some featurization. This is where a little bit of knowledge of machine learning helps, but it's not too complex; it's stuff you can stay high level on. Featurization is essentially a means of transforming text into numerical vectors. We're simply representing the text values in the TSV file as numbers, which makes it easier for a machine learning pipeline to look at the differences between those values, because it's just looking at numerical values, or vectors. If you want to get really deep into this you can look into vector databases; it's a big part of data science and of artificial intelligence in general. But because we're using ML.NET, for the most part we just want to stay high level, so for the purposes of this example featurization is just converting our text into numerical values. So we append this functionality to our pipeline, and to do the featurization for the column we again go into Transforms, this time into Text, and we call FeaturizeText. Similar to the last step, we have an input column name, so we want to featurize Subject, and that will be output into a column of our choosing; we can call it EmailSubjectFeaturized. Next we put the featurization values into the Features
category for the overall pipeline. That probably doesn't make much sense, but it might make more sense once I've written it. It's probably something that would make more sense if you had more columns, but really I'm only using one column in the featurization, the subject. Think of it as creating a Features object that comprises all of the featurization, even though I've only got one. So again, mlContext.Transforms.Concatenate: this is concatenating one or more input columns into a new output column. We only have one, but that's fine, we can still do it. It's going into Features, and the input columns are all the featurized columns, of which I only have one, EmailSubjectFeaturized. Now, we could close that off, but one thing that Microsoft does advise is that we append a cache checkpoint, passing in our mlContext. Apparently that's supposed to help performance. Honestly, I don't understand whether it really applies to my use case, but the documentation does advise it, except if you're using very large data sets, where it could actually negatively impact performance. So for a small data set like ours it makes sense to just put it on; there's a caching system that somehow improves the performance. So that was a big one. Now that we've done that, we can just return our pipeline. Going back up into what would be our main method, we can actually call this now and say: I'm going to create my pipeline, which is the result of ProcessData. OK, so at this point we've got our pre-processing done, and we're ready to actually train the model and build it so that we have that model.
zip file in our directory. Again, stick with me, it might become more clear as you see me write the code, but this next function is very similar to the appends we've just done for the pipeline. We're going to create a function called BuildAndTrainModel, which will take care of everything we want to do for the training. Again this also returns an IEstimator of type ITransformer, and we're calling it BuildAndTrainModel. This time we're going to pass in a few things that we've already initialized: we're going to pass in our data view, that's the view of the data pulled in from the TSV file, and we're going to pass in our pipeline, and those two things are going to be used to build and train the model. So IDataView trainingDataView and IEstimator of type ITransformer pipeline: those are the two things we've already created in that first part. Again, we're going to create another pipeline, called trainingPipeline, and this is where we append to that first pipeline we created. This is where we specify what kind of classifier we're going to use. There are two types of classification that we can do. We can do what's called binary classification, where we say that something is either this or it isn't: we classify it as one thing, and it can only be that. If I say that a giraffe is an animal, it can only be an animal; it can't be a mammal as well. The opposite of that would be what we call multiclass classification. Multiclass classification, it's hard to say, but it means that one thing could be one of several things. So we could say that the department could be IT, but it could also be Finance, for example. We want to make sure that we use multiclass classification for this use case, so we do this by saying mlContext.
MulticlassClassification. We can access the Trainers, and this is the algorithm that we're going to use, which is SdcaMaximumEntropy. If we look at the IntelliSense for this, it's expecting a label column, so we're going to pass in the one that we created before, and also the features column, which is what we concatenated after featurization. What I want to append then is the inverse of that MapValueToKey: I actually want to do MapKeyToValue, and I'm going to call that key PredictedLabel. That basically allows us to map the result into a column called PredictedLabel. You'll see why that makes sense later on. So mlContext.Transforms.Conversion.MapKeyToValue, PredictedLabel. It feels like we've created about a billion pipelines, but this is the pipeline that we're now going to use to actually train. To train in ML.NET we can use the Fit method. The Fit method is part of the training pipeline that we've just created, and it expects us to pass in the data, which is represented by that data view, the IDataView, and this will return the trained model. We're going to want to keep this in a global context so that we can reference it elsewhere if we need to. The trained model is of type ITransformer, so I'm going to create an ITransformer and call it model. That means I can initialize it here and say that's equal to the value of trainingPipeline.Fit on that data, trainingDataView, and then I can return the training pipeline. So we've got our model now. We've trained the model, but it's only in memory. I want to keep that model persisted somewhere, so that once it's been trained it's written to disk and I can reference it elsewhere if I need to. For that I'm going to create a method called SaveModelAsFile, so I'll do that here, say void SaveMethodAsFile, and mlContext has functionality for this already, so I can say mlContext.Model.
Save. Then it asks me for the specific model, which is the one we've just initialized globally, _model. It wants the input schema; this is basically related to the data view that we've used, the view of the TSV file, so trainingDataView.Schema, that's the layout of that view. Then the stream, or file path, is the place where we want to actually persist it, so I'm just going to reference the path that we're saving to. Calling that should allow us to take that trained model and persist it on disk. Now there's one more class that we need to create. We've already created a class to represent each row of the training set, but we haven't created one to represent the output. So I'm going to create a class underneath the EmailSubject class and call it DepartmentPrediction. This will be the predicted department, so it's going to have just one property. I'm going to make it nullable, and it's going to be called Department. Hopefully you've remembered that I created a column called PredictedLabel and said that it would make sense later on. Well, this is it: this is where we use an attribute to specify that this is ColumnName PredictedLabel. We told the pipeline that the output should be mapped to the key PredictedLabel, which will now map into the Department property of our DepartmentPrediction class. Because we've now got the EmailSubject class for the input and the DepartmentPrediction class for the output, we can create what's called a prediction engine, which will allow us to actually do the prediction based on our trained model. I'm going to put this up here and say this is a PredictionEngine of type..., and you can see we've got a TSrc and a TDst. Hopefully this is starting to make sense around why we created those two classes: EmailSubject is our source, and DepartmentPrediction, which will be our prediction, is our destination. Then I'm just going to call this predictionEngine. So here's the good part then:
we've done all the prep. We've got all the data pre-processed, and we've created our MLContext, our training set and our prediction engine. Now it's time to actually write the code which will do the prediction and return the result. I want to return the result as a string, simply the department, so I'm going to create a function which returns a string, and I'm going to call it PredictDepartmentForSubjectLine. That will take in a subjectLine parameter. The first thing we do is load the model that we trained and saved to disk, so I can say var model equals mlContext.Model.Load. We reference the model path, and this requires us to specify an out parameter, so I'll create a new variable called modelInputSchema. We then create a new instance of our EmailSubject class to represent the incoming email subject, so I'll create var emailSubject equals new EmailSubject. We don't have a constructor on it, so I'm just going to say Subject equals subjectLine. We don't need to specify the department, because obviously we don't know the department; we're just going to say the subject is the incoming subject line. Then we can initialize our prediction engine, so we can set predictionEngine equals mlContext (again, most things come from the MLContext) .Model.CreatePredictionEngine of type EmailSubject, DepartmentPrediction, and that requires us to pass in our model. Then we can get the result, so we'll say var result equals predictionEngine.Predict, passing in emailSubject. Of course that creates a DepartmentPrediction object, so we can return result.
Department. So we've created the prediction code. Two things I've missed: after ProcessData, I showed you how to create the methods for building and training the model and then saving the model to disk, but I never called them, so let's do that. So our trainingPipeline equals BuildAndTrainModel, passing in the trainingDataView and that pipeline from ProcessData. Then I'll call SaveModel... looks like I've created it as SaveMethodAsFile. OK, my bad: SaveModelAsFile. Then let's just give this a quick test. So var result equals PredictDepartmentForSubjectLine, and I'm going to say the subject line is "New invoice". Then I'll put a Console.ReadLine there so it keeps going, and I'll breakpoint it here and we'll step through it. So we're creating our MLContext, loading the training data, creating the pipeline, creating the training pipeline. We've trained the model, that's why it took a little bit longer. Save it as a file: if I look in Solution Explorer, we can see I have model.zip now where I didn't have it before. Then I'll do my prediction, look at the result, and it's saying Finance, which is what I would expect. So let's make this a little bit more interactive. Let's make it so that we can type in an email subject and get the result. I'm going to create a variable called keepRunning and set it to true. Then I'm going to output some stuff to the console so that the user can understand what's happening: "Enter subject lines to predict". And we want to be able to exit, so I'm going to say "Type QUIT to close the app". Then we're going to use a while loop to keep this going: I want the application to keep running so that you can keep typing in email subjects and have them predicted until you quit. So while keepRunning, we're going to capture the subject line, so that will be equal to Console.
ReadLine. If the person's typed in the word QUIT in upper case, then we'll say keepRunning equals false. I need to make that an equality check, there we go. Else we're going to write out the prediction, so Console.WriteLine will output the result of PredictDepartmentForSubjectLine, passing in whatever they typed in. So let's take a look at that and see what we get. I'm going to get rid of that breakpoint. OK, so we've got "Enter subject lines to predict", "Type QUIT to close the app". So "printer not working": that's come back as IT. I'll just zoom that in. "New employee contract to be signed": HR. So far it's looking pretty good. "Customer report", see what that gives us: Sales. OK, good. "Market research results": Sales. OK, so this is looking pretty good. I'm sure we could break this somehow. Let's see, "printer for new employee": HR. I mean, at the moment, because we've got such a small data set, there's still room for crossover, and the model is obviously quite opinionated towards HR for that one. So that's an example of something that would need further training to make the accuracy better. You could take this to the next level and say, OK, I'm going to build in something which could ingest new subject lines and add those subject lines to the TSV file, and then I retrain and do the process again. There are lots of different things you can do to take this forward, but I think this has created the basis for any classification project that you would build using ML.NET.
That was pretty full on. The documentation is really good; I would definitely encourage you to research more on learn.microsoft.com. If you do have any questions, then leave a comment below. Don't forget to like and subscribe to the channel for more great content like this, and until next time, stay safe in that artificial-intelligence-saturated world.
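The steps described in the captions can be condensed into a single Program.cs. What follows is a minimal sketch, not the code shown in the video. It assumes a .NET console project that references the Microsoft.ML NuGet package. The file name subjects.tsv, the header names, the SubjectFeaturized column name and the choice to write the training file from code are all assumptions made so the example is self-contained; the four sample rows are the ones read out in the video. Unlike the version described above, BuildAndTrainModel here returns the trained model directly, and the prediction engine is created once rather than on every prediction.

```csharp
using System;
using System.IO;
using Microsoft.ML;
using Microsoft.ML.Data;

// Assumed layout: a Models folder next to the executable (file names are hypothetical).
string modelsDir = Path.Combine(AppContext.BaseDirectory, "Models");
string trainingFilePath = Path.Combine(modelsDir, "subjects.tsv");
string modelFilePath = Path.Combine(modelsDir, "model.zip");

// Tiny illustrative training set; a real one needs many rows per department.
Directory.CreateDirectory(modelsDir);
File.WriteAllLines(trainingFilePath, new[]
{
    "Subject\tDepartment",
    "Service outage notification\tIT",
    "Profit and loss statement review\tFinance",
    "Customer satisfaction survey\tSales",
    "Job transfer request process\tHR",
});

// A fixed seed keeps results repeatable between runs.
var mlContext = new MLContext(seed: 0);

// Load the TSV into an IDataView, skipping the header row.
IDataView trainingDataView = mlContext.Data.LoadFromTextFile<EmailSubject>(
    trainingFilePath, hasHeader: true);

// Train, then persist the model together with its input schema.
ITransformer trainedModel = BuildAndTrainModel(trainingDataView, ProcessData());
mlContext.Model.Save(trainedModel, trainingDataView.Schema, modelFilePath);

// Reload from disk to show the round trip, then build one prediction engine.
ITransformer loadedModel = mlContext.Model.Load(modelFilePath, out var modelInputSchema);
var predictionEngine =
    mlContext.Model.CreatePredictionEngine<EmailSubject, DepartmentPrediction>(loadedModel);

Console.WriteLine("Enter subject lines to predict. Type QUIT to close the app.");
while (true)
{
    string? subjectLine = Console.ReadLine();
    if (subjectLine is null || subjectLine == "QUIT") break;

    var result = predictionEngine.Predict(new EmailSubject { Subject = subjectLine });
    Console.WriteLine(result.Department);
}

// Pre-processing: Department -> Label key, Subject -> numeric vector -> Features.
IEstimator<ITransformer> ProcessData() =>
    mlContext.Transforms.Conversion
        .MapValueToKey(inputColumnName: "Department", outputColumnName: "Label")
        .Append(mlContext.Transforms.Text.FeaturizeText(
            inputColumnName: "Subject", outputColumnName: "SubjectFeaturized"))
        .Append(mlContext.Transforms.Concatenate("Features", "SubjectFeaturized"))
        .AppendCacheCheckpoint(mlContext);

// Training: add the multiclass trainer, map the predicted key back to text, then fit.
ITransformer BuildAndTrainModel(IDataView data, IEstimator<ITransformer> pipeline)
{
    var trainingPipeline = pipeline
        .Append(mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy("Label", "Features"))
        .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"));
    return trainingPipeline.Fit(data);
}

// One row of the training file; LoadColumn gives the zero-based column index.
public class EmailSubject
{
    [LoadColumn(0)] public string? Subject { get; set; }
    [LoadColumn(1)] public string? Department { get; set; }
}

// Prediction output; PredictedLabel is the column produced by MapKeyToValue.
public class DepartmentPrediction
{
    [ColumnName("PredictedLabel")] public string? Department { get; set; }
}
```

With only four training rows the predictions will not be meaningful; the sketch is only meant to show how the pieces fit together. Creating the prediction engine once avoids reloading model.zip for every subject line, which is a design choice of this sketch rather than something the video does.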
Info
Channel: Nick Proud
Views: 9,096
Keywords: .NET Development, AI Development, AI in C#, Algorithm, C#, C# Tutorial, Code, Coding, Data Science, Data Training, Developer Tools, Email Classification, IT, ML Model, ML.NET, Machine Learning, Predictive Analytics, Programming, Software Engineering, Software Engineers, Tech Education, Tech Tutorial, .net, dotnet, visual studio, artificial intelligence, ai, deep learning, neural network, csharp, c#
Id: R3kRf7hNVMg
Length: 31min 2sec (1862 seconds)
Published: Fri Jan 12 2024