>> Hey, we're here with Bree, and we're going to learn about
machine learning with .NET. [MUSIC] Hey, welcome to
another episode of On .NET. Today, we're going to be talking about ML.NET with Bree, who's one of the people working on that. So how about you introduce yourself, then we'll get into the topic? >> Sure. I'm Bree. I work on the .NET team, and I've been focusing all my time on ML.NET for about the past six months. ML.NET was actually released at last year's Build, and this year we finally went to 1.0. >> Yeah. It's always fun working on that 1.0 release, because there are usually all these foundational decisions, some of which will never get undone; that's the one we're going to go with for the rest of time. So I imagine there was some of that that went on leading up to 1.0? >> Definitely, yeah. A lot
of stabilization in the API. We also added some really cool new tools that are in preview, which I'll talk about a little bit later. >> Can you talk a tiny bit about, I know there were a bunch of
different teams that were involved, was there a bunch of give-and-take or was everyone moving
in the same direction? Or a little bit of both? >> A little bit of both, definitely. I think different teams have their different opinions, and especially with all the different offerings we have at Microsoft for AI and ML, it can get a little bit difficult sometimes. But in the end, it really came together into a really great product, and everyone worked really well together. >> Awesome. So what do you
want to talk about first? >> I guess I should explain
a little bit about what ML.NET is. >> Good idea. >> Basically, it's just a free, open-source, cross-platform machine learning framework for .NET developers. Our goal is really to make .NET great for machine learning in a lot of different types of scenarios. >> I was going to ask
a question first. >> Right. Go ahead. >> Which is, so there are these other machine learning things. I know there's TensorFlow, there's a bunch of stuff in the Python space, you might say some of their names. Then there's this ONNX thing. Then there's probably other machine learning and AI things that even Microsoft makes available. So before we get into the details of ML.NET, where would you say it sits? Is it competitive with all of those things, or complementary? >> I wouldn't say it's competitive. What makes it unique is that it brings machine learning
into the .NET ecosystem. So existing .NET developers can use their existing C# and F# skills to integrate machine learning into their .NET applications. They don't have to go and learn a new language to do machine learning, which they previously did have to do. A lot of times, when they had to do that and move to other languages or tech stacks, it would be difficult to then integrate that back into their .NET applications. >> One more question related
to that, at the risk of going into the weeds: there's also this TensorFlowSharp thing. One could describe that as bringing machine learning into the .NET ecosystem, but I'm somehow guessing that TensorFlowSharp and ML.NET are not exactly the same. >> They're not. Well,
especially because ML.NET is being worked on by Microsoft. TensorFlow is actually for deep learning, whereas ML.NET right now supports classical machine learning scenarios. That would be things like sentiment analysis and price prediction, while deep learning gets a little bit more into image classification and object detection. So with TensorFlow and ONNX, you're able to extend ML.NET and use those in order to add those scenarios. >> What I specifically
meant was TensorFlowSharp. So there's this library that is a C# binding
against TensorFlow, but your answer is probably
still fine for that. Sounds good. How about
we look at your table? >> Sure. So I've already talked about how this is built for .NET developers, so you stay in the .NET ecosystem. But not only that: you don't really need existing
knowledge of machine learning, and I'll get into that
a little bit later, but we have these
really cool tools that abstract away the data
science from it. So I'll get into that later. It also just makes it really easy to create
custom machine learning models. So right now, with, for instance, Cognitive Services or things like that, there are pre-built models that you use on your data to make predictions. With ML.NET, what you can do is make your own custom model with your own data, so it's more specific to your scenario. Right. I already talked a little bit about how it's extensible with TensorFlow and ONNX and things like that, for even more scenarios. What's really cool is that ML.NET
is not actually just a year old. It was actually used inside the company, started by Microsoft Research, for the past eight years. It's used internally in a lot of huge products: Power BI Key Influencers, Outlook Meeting Insights, Bing suggested search, and the list goes on. So it's been used here for a really long time. What we're doing with ML.NET is making the API friendlier and open-sourcing it so that other people can then use it. >> Makes sense. So one high-level question
based on what you just said: you said there are some pre-built models that you can just use as is, but you can also train. You can build new models based on your own data. Well, imagine I've got a one-terabyte, and I'm just making up that number, SQL database. It's got all this data in it. How do I think of connecting
all these rows and tables in SQL Server to ML.NET? >> You actually can. You can load in your data from a file or from SQL Server. What you do is you load it in as an IEnumerable. >> Okay. >> So you can do it from streaming sources like that. >> I see. That works pretty well. >> Yeah, it does.
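For readers following along, here's a rough sketch of what loading from a streaming source can look like, assuming a hypothetical GetReviews() helper standing in for your SQL Server data access:

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;

public class Review
{
    public string Text { get; set; }
    public bool Sentiment { get; set; }
}

public static class LoadFromDatabaseSketch
{
    public static void Main()
    {
        var mlContext = new MLContext();

        // Any IEnumerable<T> works, so rows streamed from SQL Server
        // (via EF Core, Dapper, SqlDataReader, etc.) can be loaded directly.
        IEnumerable<Review> rows = GetReviews(); // hypothetical data-access helper
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);

        Console.WriteLine($"Loaded schema with {data.Schema.Count} columns.");
    }

    // Placeholder standing in for a real database query.
    private static IEnumerable<Review> GetReviews()
    {
        yield return new Review { Text = "ML.NET is awesome", Sentiment = true };
        yield return new Review { Text = "That is rude", Sentiment = false };
    }
}
```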
>> I don't know if you have any numbers, probably for something smaller than a terabyte. Do you have any numbers that talk about how long it takes to train a model with, say, a certain size of SQL Server database? >> I don't know the numbers off the top of my head. I know we did train, I think, a terabyte of data, and
it took a few days. >> Okay. >> That was with Model Builder,
which again, will be coming up. >> Okay. So that
gives people a sense. Now, if you're training, this one-to-two-day activity for something that large, a terabyte, that's pretty big. That's not something you're running in, like, a CI/CD flow. That's something that's more of a one-off activity that you do. Then you get this model out of it, you check that in somewhere, and then you run with that for a while until you decide to replace it. >> Yeah, exactly. >> Okay. >> I'll actually show you
a little bit of code here first. So sentiment analysis is a really commonly used example to show machine learning. I'll show you this Blazor app here, which has real-time sentiment analysis. If you say something like "ML.NET is awesome," you can see the slider goes up. If you say something like "That is rude," you can see the slider goes down. So what this is doing is using an ML.NET model in this Blazor application. We'll go into the code here. Lots of it. Lots of times [inaudible]. >> After we get through
this, I want to try one. >> Sure. >> Let's go through this first. >> Sure. So what's really cool is that the steps are the same every time you train a model. The first thing you want to do is create this ML.NET environment, or MLContext. >> Sure. >> It's like DbContext in Entity Framework, conceptually.
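As a minimal sketch, step one looks like this (the seed parameter is optional and just an assumption here, to make runs repeatable):

```csharp
using Microsoft.ML;

// Step 1: the MLContext is the starting point for every ML.NET operation,
// conceptually similar to EF Core's DbContext.
var mlContext = new MLContext(seed: 0); // fixed seed for repeatable results
```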
Then what you do is you load in your data, and at this point it's just this yelp_labelled.txt, which I'll give you a little preview of. You have your text here and then your sentiment here. So one is positive and zero is negative. >> Right. So this is our data source. This is like the SQL database in this sense. >> In this case it's just a text file, but yeah, this is our dataset right here for training. So then what you do is you load in from a text file, where you have your data path here, and you're specifying that
there's no header. If we look into the SentimentData- >> This reminds me of mail merge. >> Yeah. I've actually used that very recently. >> Okay. SentimentData, you can see here that all it does is define a schema for your dataset. So you have your SentimentText, which is a string, and then your label, or the sentiment that you want to predict, which is a Boolean. So that maps out here, and you're loading in that data right here.
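Sketched out, the schema class and the load step can look like this (the column order and tab separator are assumptions based on the dataset shown):

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

var mlContext = new MLContext();

// Step 2: load the headerless, tab-separated file into ML.NET.
IDataView trainingData = mlContext.Data.LoadFromTextFile<SentimentData>(
    "yelp_labelled.txt", hasHeader: false, separatorChar: '\t');

// Schema for the dataset: column 0 is the review text,
// column 1 is the sentiment label we want to predict.
public class SentimentData
{
    [LoadColumn(0)]
    public string SentimentText { get; set; }

    [LoadColumn(1), ColumnName("Label")]
    public bool Sentiment { get; set; }
}
```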
>> Then a DataView is the way that data is represented in ML.NET. It's a really flexible and efficient way of looking at tabular data, that is, rows and columns. So what we do is load it into that DataView.
Then what we do is we add data transformations. The way it is now, the text can't actually be accepted by machine learning algorithms; it has to be featurized into numeric vectors, which will be accepted by the machine learning algorithm. So we've added this featurize-text data transformation here, and we've added that to what we call our pipeline. Then what we do is we choose our algorithm, and in this case you can see that we have quite a bit to choose from for binary classification, which is our task for sentiment analysis. Right now we'll just choose SDCA logistic regression. >> I'm guessing GetHashCode isn't one of them. >> Yeah.
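Continuing the sketch above, the featurization and the trainer chain together into one pipeline (the column names follow the schema class from the previous snippet):

```csharp
// Step 3: build the pipeline. Raw text can't go straight into the algorithm,
// so FeaturizeText converts it into a numeric feature vector first, and then
// we append the binary-classification trainer (SDCA logistic regression here).
var pipeline = mlContext.Transforms.Text
    .FeaturizeText(outputColumnName: "Features",
                   inputColumnName: nameof(SentimentData.SentimentText))
    .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(
        labelColumnName: "Label", featureColumnName: "Features"));
```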
Then step four, you just train your model. It has a lazy approach: as of now, before you call this Fit method, you're just adding things to the pipeline. Once you call this Fit method on your data, it actually starts the model training.
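In code, continuing the same sketch, that lazy-then-train behavior is a single call:

```csharp
// Step 4: nothing has executed yet; the pipeline is lazy.
// Fit is the call that actually runs training over the data.
ITransformer model = pipeline.Fit(trainingData);
```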
>> Right. One interesting thing is, often we talk about async with a lot of the .NET products, but these APIs look like they're all entirely synchronous. I think they were intended to be run in this kind of standalone batch process that trains the model, so that's why they're synchronous. >> Yeah. >> Yeah. Then this is an optional step that usually
you probably want to do: you want to evaluate your model. So what I've done is taken a separate dataset, which looks the same but is just reviews from Amazon instead of Yelp, and used that to get evaluation metrics. You load in from the text file, you make predictions on that test data, and then you get a variety of metrics here. In this case, it prints out the accuracy.
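A sketch of that evaluation step, assuming a second labelled file in the same format (the Amazon file name here is an assumption):

```csharp
// Step 5 (optional but recommended): evaluate against a held-out test set.
IDataView testData = mlContext.Data.LoadFromTextFile<SentimentData>(
    "amazon_labelled.txt", hasHeader: false, separatorChar: '\t');

IDataView predictions = model.Transform(testData);
var metrics = mlContext.BinaryClassification.Evaluate(
    predictions, labelColumnName: "Label");

Console.WriteLine($"Accuracy: {metrics.Accuracy:P2}");
```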
So then what you could do is save the model at the end and then use it in any of your other applications. That can be a web app. What else? You can do console apps, desktop apps, [inaudible] microservices and containers. >> Yeah. Mobile apps. >> Yeah.
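Continuing the sketch, saving and then consuming the model from another app might look like this (the SentimentPrediction output class is an assumption mirroring the demo):

```csharp
// Step 6: save the trained model so other applications can use it.
mlContext.Model.Save(model, trainingData.Schema, "SentimentModel.zip");

// In the consuming app: load the model once and create a prediction engine.
// Note: PredictionEngine is not thread-safe; pool it in web apps.
ITransformer loadedModel = mlContext.Model.Load("SentimentModel.zip", out _);
var engine = mlContext.Model
    .CreatePredictionEngine<SentimentData, SentimentPrediction>(loadedModel);

var result = engine.Predict(new SentimentData { SentimentText = "ML.NET is awesome" });
Console.WriteLine($"Positive? {result.Prediction}");

// Output schema: PredictedLabel is the Boolean the model emits.
public class SentimentPrediction
{
    [ColumnName("PredictedLabel")]
    public bool Prediction { get; set; }
}
```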
>> Right. So another takeaway is, I take it the training app and the app in which you're consuming the model are always going to be different. >> Not always. >> Okay. >> They can be the same. For instance, you can make a single prediction in the console app that you used to train the model, but in most cases they will be separate. >> Okay. That would be
not the most common case. >> Right. Exactly. >> So most of the time
they're separate apps. >> Yeah, exactly. >> This is either going to be a console app or just, like, a very minimal UI app that has probably relatively few buttons. >> Right. Yeah, exactly. So with what we've seen so far, and with all of our samples that we have on GitHub, all the training is done in console apps. >> Yeah. Actually back to
my question about CI/CD, I guess I can imagine you and I set up this company together. We're, like, super pro at this, and we want to have the best-trained sentiment analysis. I could imagine that every night we have this batch job, since there's more data that came in through that text box. We just rerun all of our models, and then we see if they have better results. If one has a significantly better result, then we sign off on it, or maybe we should just deploy this new model. >> Right. Yeah. >> Meaning it doesn't have
to be the case that you only run this, like, once a quarter. >> Right. Yeah, you don't have to do it that way. Some people will do that. But if you're getting better data, it's definitely better to add that into your dataset for training. >> Right. I mean, even if it takes two days to run like you said, I guess maybe we could just do it once a week. >> Yeah. Then get a better model. >> Yeah. Okay. >> Yeah. Definitely. So I'll actually show you what this
looks like when I run it. You can see right now this ML models folder is empty, so we'll go ahead and start running that. Maybe. There we go. You can see we added a few console lines there. We actually printed out the accuracy, which is about 75 percent, and then we saved the model. >> Okay. From some exposure to this in the past, 75 percent is probably good for the amount of time we took on that, but 75 percent in the general case is bad. Or is that the wrong way to think about it? >> It's actually not. It definitely
depends on your scenario. It helps to pair the accuracy with also trying out predictions. In this case, if you try out a lot of predictions, and I know you wanted to try another sentiment one, it's a pretty good dataset, so it'll do pretty well. Once you start getting into negations, like "it is not good," that's where it has a little bit of an issue. >> I see. >> Yeah. That's common
for sentiment analysis. >> Okay. >> But you can see the model here is just a serialized file. If we come back up to this Program.cs, where we actually consume the model, or use the model and make a prediction, we have "ML.NET is awesome" and "That is very rude." I've already made a reference to the generated libraries here, the class libraries, in our predict-sentiment project. I'm actually going to drag this up here so that it can use the model. >> Right. That we just built. >> That we just trained. Then we'll go ahead and start that. Then you can see "ML.NET is awesome"; it predicted it as a positive sentiment. >> I see. >> "That is very rude," which
is a negative sentiment. >> Right. So I'd imagine that you could build, like, an xUnit test, for example, that did something very similar. >> Yeah. Definitely. We definitely can. >> Is that what people do to validate the model? Or, I guess, given the training, yeah, how do people test? >> So there are those evaluation
metrics that I mentioned. If we go back to, I believe it was here, where we trained our model, we come to metrics.Accuracy. You can see that we have quite a few different metrics, and of course some of them might not make sense at first, but we do have explanations in our docs of what these things mean, and they're common data science metrics. >> Right. They're all useful
for different reasons. >> So that's one thing, but the way you explained it is also a way that people can do it. I haven't quite asked around yet to see how people are really using it. That's our next step: to see the different ways people are using it in the different scenarios, and how they're testing it in all the different cases. >> Okay. That makes sense.
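As an illustration of the xUnit idea mentioned above, a hedged sketch of a quality-gate test might look like this (the accuracy threshold, file name, and model name are all assumptions):

```csharp
using Microsoft.ML;
using Xunit;

public class SentimentModelTests
{
    [Fact]
    public void TrainedModel_MeetsAccuracyBar()
    {
        var mlContext = new MLContext(seed: 0);

        // Held-out test data in the same schema the model was trained on.
        var testData = mlContext.Data.LoadFromTextFile<SentimentData>(
            "amazon_labelled.txt", hasHeader: false, separatorChar: '\t');

        var model = mlContext.Model.Load("SentimentModel.zip", out _);

        var metrics = mlContext.BinaryClassification.Evaluate(
            model.Transform(testData), labelColumnName: "Label");

        // Fail the build if quality regresses below the agreed bar.
        Assert.True(metrics.Accuracy >= 0.7,
            $"Accuracy was {metrics.Accuracy:P2}, expected at least 70%");
    }
}
```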
>> So we're going to try my- >> Yes. Let's try. >> So it is, "You're being obtuse." Okay. I thought that might break it, because I think "obtuse" is not. >> It's probably not in the dataset that we used, right? >> Yeah. Well, and "obtuse" is kind of an obtuse word. >> Yeah. Any other ones? >> That was me being clever. >> Yeah, any other
ones you want to try? >> No, that was the only one
I wanted to try. >> So that's just one
of the many scenarios. I'll actually show you here. Some of the other scenarios
we have: product recommendation, price
prediction, segmentation. We have all these samples
on our GitHub. If you clone that, you can just try them out of the box. >> I have another one. >> Sure. Let's try it. >> He is acutely aware
of his intelligence. Wow. Apparently I'm good
at breaking this thing. >> Yeah. We may have to add that to the dataset now. >> Totally. >> So another really
cool example that I like showing is object detection. This is one of my favorite ones. You can see here. >> Right, so this is the bounding-box scenario. >> Right. >> I assume if you were doing some of these programs where
you have a photo collection that will say, "Oh, these are all the pictures of Bree
in my photo collection." I assume as the basis of that, they used something like this to
figure out where the person is. >> Yeah, I'm pretty sure. What this is using is an ONNX model, actually. I believe ONNX YOLO v3 is what it's called. This is actually in our- Sorry, this is in our GitHub repo, so you can download this
and try it yourself. I changed out the pictures, but you can try it on
your own pictures. You can see here that it's
located this TV monitor, this bottle, and this chair
from my focus room. We've even got my living room here
with a sofa and a potted plant. >> Right. So the way this program is running is the idea that
you click that button, and I guess those are already
resident on the server. But this is the pre-object-detection version of them on the right. Then you're getting
those images served back to you with the object
detection put in them. >> Right. Exactly. It
uses that trained model. I think the objects it can identify are things like sheep and sofas and dogs and cats. It's trained to recognize specific objects. For instance, it might not recognize grass or a table
or things like that. But I'll show you
another one, just to show that you can choose any image here. Here you can see it
actually identified a boat from just the images
that I had there. >> Okay. It's the sort
of thing where, at least with my
understanding of this stuff, if you trained it
exclusively on white boats, and then all of a sudden it saw an image with a blue boat
or a red boat, then it might get confused. >> It might, yeah. Definitely,
it's better to have a variety, and also to give it images of things that are not boats, if that makes sense. Having a variety of both is best for training the model. Yeah, those are the demos I had. It's pretty crazy,
the different scenarios that our customers have
been using this for. >> Right. So you have customers? >> We do. It's great. >> Because you just
released your 1.0 just now. Is it the case that you just have this growing set of
customers along the way? Or do they come mostly at
the end or a little bit of both? >> We actually had some
before we even hit 1.0, which was really cool. I'll actually get into a few of them, because these are some of my favorite stories. My top favorite one is Evolution Software. I think they started with version 0.4, something like that, and they've been upgrading ever since. But essentially what they do is commercial hazelnut drying. So this is a commercial hazelnut dryer. >> It doesn't sound like
a software company. >> No, it doesn't. What's in this image here holds 50,000 pounds of hazelnuts. The business problem
they are having is, hazelnuts have to be at a certain moisture level
in order to be profitable. So if you over-dry them, they shrink and you lose money. If you under-dry them, they can
get moldy and you lose money. The way that it works now is, people have to climb into here as
it's drying and do the sampling. They take a bucket, and
they take out hazelnuts. They manually test to
see the moisture level. They have to do this every so often, and the conditions
are less than ideal. It's 120 degrees Fahrenheit, 100 mile per hour winds.
It's just not great. So what the people at
Evolution Software wanted to do was eliminate
this manual process. They used sensors to gauge
temperature and pressure. Then they used the sampling data that they had before
as training data. So what they can do is predict the moisture level of the hazelnuts
based on all of that. They actually created
this application here. They use [inaudible] for real-time updates. They use ASP.NET Core, because they have a lot of .NET in their product. >> They're totally all in. >> Yeah, they are. So
they actually created this for the operators. It says, "Hey, this batch is ready. You should go get it," or "This is already too much," or "This is how much you have left." >> It's in the danger zone. >> Right. Exactly,
in the danger zone. I did not know that it has to be between eight and a half and 11 percent, which is the ideal moisture level for hazelnuts. >> Yeah. It sounds a little bit
like a humidor that people use for cigars that
you sometimes see in stores. >> Yeah, it's similar. Maybe someone will use
ML.NET for that one day. >> I'm actually not a cigar smoker. >> Yeah. Then another
case that we have, Brenmore, is really cool. They do surveys for patients after they come in for doctor visits. They collect that data and try to
improve the patient experience. What they had was all these surveys
and all the survey data, and they have
these free-form comments. They found that it took a really
long time to parse through them manually and then direct it
to the correct personnel. So what they're using ML.NET for is classification of those comments: both to say whether it's toxic or non-toxic, so whether this is good or bad feedback, but also to put it into
categories such as Experience, Facility, Provider, Staff,
and things like that. Then it'll automatically route
to the correct personnel. >> Sounds a little bit
like GitHub issue routing. >> Right. Exactly. That's
actually the one that we use. We have that sample in our repos, using multiclass classification for GitHub issues. One of their quotes here, actually, was- they mentioned AutoML, which I haven't talked about yet, and I want to get into that a little bit. They used one of our new tools called Model Builder, which uses AutoML, or automated machine learning. What that does, essentially, is generate a model for you based on your data and your task. I'll show you with the sentiment
analysis one right here. The way that it works is, you download the Visual Studio extension, which is called Model Builder. You open Visual Studio, and all I've created is a .NET Core console application. There is nothing else, just "Hello, world." What you do is right-click, and you say "Add Machine Learning." You have a few different
scenarios right now. It supports classification
and regression scenarios. So we'll go ahead and do sentiment analysis because that's
what we've been doing. You can do it from a file or from SQL Server. I've got a file ready, so
we'll go ahead and do that. Wrong file. Here we go. What you can see here is a little data preview, where it has the sentiment and then the sentiment text, just like we saw before. It's just a different dataset. What we want to predict is the sentiment. So once we get this model, we want to be able to feed it sentiment text, or comment text, and then get the sentiment back. That's what we're
going to predict here. We're going to move on
to the "Train" step, and we're going to
specify a time to train. We're going to leave it at
10, which is the default. If you have more data or you
want to use it in production, usually you want it
to train for longer. What it's doing right
now is it's using AutoML to iterate through
different algorithms, data transformations, and
algorithm options to give you the best model or
the highest-performing model. You can see here all the models that it's going through, the one that it's found as the best so far, and the accuracy of that model. If you go to the "Evaluate" step, it shows you a few evaluation metrics. Accuracy is a pretty
good one to gauge. You can see that it shows
this Averaged Perceptron binary classifier, which is actually different from the one that I had before. But I don't know much
about algorithms, so it takes care of that for me. >> It all sounds good. >> Yeah. What's also really cool is, once you go to the "Code" step, it will actually generate the consumption and
training code for you. So you add those projects, and you can see here it
adds these to the right. You open this Model Builder, and you can see the steps
are very similar to what I had manually written before,
but it just generates. I don't know how these
are happening here. Then this one you can actually
run your model with Program.cs. Lots of red, perfect. It generates those
class libraries for the sentiment and sentiment text, and the predicted label here. >> Nice. >> So then what you can do is go back in here and use your model. If you go back to Model Builder, it actually has that code that you can just copy-paste over, and you're able to use your model.
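The copy-paste snippet is along these lines; the exact generated class names (ModelInput, ModelOutput, ConsumeModel) come from Model Builder's generated projects and may differ by version:

```csharp
// Hypothetical shape of the consumption code Model Builder suggests;
// ModelInput/ModelOutput/ConsumeModel are from the generated projects.
var input = new ModelInput { SentimentText = "ML.NET is awesome" };
ModelOutput result = ConsumeModel.Predict(input);
Console.WriteLine($"Predicted sentiment: {result.Prediction}");
```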
>> Right. So this is, obviously, a Visual Studio plugin that gives you this experience. Are you thinking about any outside-of-Visual-Studio experiences for the model building? >> Yeah. We actually have the ML.NET CLI, which does the same thing. Actually, AutoML is what's behind the scenes here; Model Builder is just a UI put right on top of that. I'll actually go back.
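For reference, invoking AutoML from the ML.NET CLI looked roughly like this in the preview of that era; the exact command and flags are an assumption, so check `mlnet --help` on your installed version:

```
mlnet auto-train --task binary-classification --dataset "yelp_labelled.txt" --label-column-index 1 --has-header false --max-exploration-time 10
```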
>> Because, obviously, if we're back to our example, if we're wanting to run this once a week on an ongoing basis, we probably wouldn't want to do it manually in Visual Studio. >> Right. Yeah. We have
that option, and it's really good, especially if you're getting started, but we also have it there on the command line. So both ways. >> Awesome. >> And that way it's cross-platform as well, so Mac, Linux, Windows. You can use automated
machine learning. >> Awesome. Okay. Do you have any closing thoughts or
things you'd like to share? Where should you go if
you want to get started? >> Yeah. To get started,
it's very, very easy. I'll type it in here for you: dot.net/ml. That will redirect you to the pages I just showed you, maybe. >> I think those red squiggles were maybe due to some Internet connection problem. >> Yes, I'm sure. I'll
just bring it back here. There's a big "Get Started" button. This will lead you through how to install Model Builder and get
started with it, and also the CLI. So whichever way you want
to get started there. Then another great way
is to just go to our samples on GitHub
and just try those out. A lot of people, the way they
started was downloading one of the samples and then just adapting it to their own scenarios. Any of those ways, it's
super easy to get started. If you have any feedback, please let us know on the GitHub. >> Yeah, file an issue or whatever. Okay. Awesome. Well, thanks
for being on the show and teaching us about ML.NET. >> Sure. Thanks for having me. >> Okay. Well, this has been
another episode of On .NET, and I hope you learned something
about machine learning. Thanks. [MUSIC]