ml5.js: Train Your Own Neural Network

Captions
Hello and welcome to another Beginner's Guide to Machine Learning video tutorial with ml5.js. Very excited about this one. I'm typically excited about the video tutorials I make. But this one, I'm particularly excited about because I'm going to look at something that has recently arrived in the ml5.js library. So first of all, use version 0.4.2 or a more recent version perhaps. But that's the version I'll be using in this video. And I want to look at this functionality in the ml5.js library called ml5 neural network. It is a function in ml5 that creates an empty or blank, so to speak, neural network. Most everything that I've showed you in this video series so far has involved loading a pre-trained model, so a neural network architecture that's already been trained with some data. And in this video, I want to look at making an empty, a blank slate, configuring your neural network, collecting data, training the model, and doing inference. And the context that I want to look at that is with real time interactive data. So I'm going to come back and maybe use some more traditional data sets. There's a data set that's on the ml5 examples with the Titanic survival data set. I have the data set for my color classifier series. So I'll come back and show you some examples of those as well. But in this first video, I just want to do something very generic, which is create a blank neural network, use mouse clicks to train it, and then move the mouse around for it to make guesses or predictions. And that might sound like a weird thing to do. Hopefully, it'll start to make sense as I build the code and step through all the processes. I also want to highlight for you the Wekinator project, which is a free open source piece of software created by Rebecca Fiebrink in 2009 for training machine learning models. 
And I would especially encourage you to watch Rebecca Fiebrink's talk from the Eyeo conference in 2018 where she talks about creativity and inclusion in machine learning, and goes through some demonstrations with Wekinator and Processing and other pieces of software. So a lot of the work that I'm doing with ml5 is entirely based on recreations of many of the example demonstrations that Rebecca Fiebrink made and has done research about for years and years with the Wekinator project. So in fact, the examples that I'm going to make in this video and the next one and the next follow ups are direct ports, in a way, of some of the original Wekinator and Processing examples. But I'm going to do it all in JavaScript in the browser with p5 and the ml5.js library. There's also a fairly lengthy history of creative artists training machine learning models in real time to control musical instruments, a performance, a visual art piece. And I would encourage you to check out some of these projects. Our guide for figuring out how to write the code is going to be the ml5 website. And there's a page on the ml5 website for the neural network function. But before I start diving into the code, let's take a minute to talk about what a neural network is. Now, by no means in this video am I going to do a comprehensive deep dive into what a neural network is and how to code one from scratch. I will refer you to many other wonderful resources where you could do that deep dive, starting with the 3Blue1Brown video, What Is a Neural Network, and some of the subsequent ones. I also have a 10 to 15-part video series where I build a neural network from scratch in JavaScript based on a particular book called Make Your Own Neural Network that is in Python. I have other videos that are guides around machine learning concepts where I talk about different kinds of neural networks. So I will link to all of those in this video's description.
But here, I'm going to use the whiteboard over here just to give you a very zoomed out, high level overview of what I'm talking about. So a machine learning system, in the most basic sense, involves inputs and outputs. So let's say for a moment that the goal that I have is to train a machine learning model to use my body as the input. So maybe how I'm moving my arms and legs and head, that will be the input. And the output would be a musical instrument, a note that's being played. So I could somehow play different notes based on how I move my body. This is an area that's covered in great detail in Rebecca Fiebrink's course, Machine Learning for Artists and Musicians. One way that I might boil this idea down into its very simplest version is, think about a 2D canvas. And it's very convenient that I'm using p5. Because that's a thing that exists in p5.js. And what I'm going to do is I'm going to say there is a mouse in that canvas. And the mouse is going to move around the canvas. And based on where it is, it will play a particular note. Now of course I could program the same idea with an if statement. But this is really what it means to work with machine learning. Instead of programming the rules explicitly into code, what I'm going to do is give the code a whole bunch of examples and have it learn those rules. So I want to demonstrate that process in a scenario where it's very obvious how it's working so that then we can build on that into much more complex scenarios. The steps are, one, collect data. Two, train model. Then three, I guess we can call this prediction. So that's also sometimes referred to as inference. That's really when we're deploying the model, we're making use of the model. Right here, this is my representation of the model.
So in this case, if I want to start with a classification problem-- and I will show you demonstrations of classification and regression-- I'm going to have two inputs-- input one and input two, often referred to as x's in machine learning context. Those inputs are going to go in to this machine learning model. The output is going to be one of, let's say, three different categories. So I'm going to have three outputs-- C, D, and E. So two inputs and, in this case, three outputs. My diagram looks a little bit weird. So I'm going to fix it up for a second. Now all this time, I've just been putting the letters ML in here for Machine Learning, or maybe referring to this as a model. Because in truth, other things, other kinds of algorithms, other types of ideas beyond a neural network could slot in here. But the ml5 generic blank machine learning model that you can train is a neural network one. So if I were to try to zoom into this for a moment, what I would actually see is something that looks something like this. This is my zoomed in diagram of really what's going on in here. A neural network is a network of neurons. Technically, this is a feed-forward multi-layer perceptron. Because the inputs, which are represented right here, get fed through these connections, they're weighted connections, and get added up altogether and arrive in this layer, which is known as the hidden layer. And there can be multiple hidden layers and different kinds of hidden layers. But the data is summed and processed through a mathematical function called an activation function, and then fed out of the hidden layer and into the output layer. And the output layer, after all of the hidden outputs are summed and passed through an activation function, we get a set of numbers out. And those numbers might look like this-- 0.2, 0.7, 0.1 meaning a 20% chance of being category C, a 70% chance of being category D, or a 10% chance of being category E.
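To make the zoomed in diagram a bit more concrete, here is a minimal sketch of a single feed-forward pass with two inputs, one hidden layer, and three outputs. Everything in it is an assumption for illustration: the weights are made up and untrained, the hidden layer has four neurons, and sigmoid and softmax are just common choices of activation function, not necessarily what ml5 uses internally.

```javascript
// Sigmoid squashes any number into the range (0, 1).
function sigmoid(v) {
  return 1 / (1 + Math.exp(-v));
}

// Softmax turns raw output scores into positive values that sum to 1,
// so they can be read as confidence scores for each category.
function softmax(scores) {
  const max = Math.max(...scores);
  const exps = scores.map((s) => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Weighted sum for one layer: weights[j][i] connects input i to neuron j.
function weightedSums(inputs, weights, biases) {
  return weights.map((row, j) =>
    row.reduce((acc, w, i) => acc + w * inputs[i], biases[j])
  );
}

// Hypothetical, untrained weights: 2 inputs -> 4 hidden -> 3 outputs.
const hiddenWeights = [[0.5, -0.3], [0.8, 0.2], [-0.6, 0.9], [0.1, 0.4]];
const hiddenBiases = [0, 0, 0, 0];
const outputWeights = [
  [0.2, -0.5, 0.7, 0.1],
  [0.6, 0.3, -0.2, 0.4],
  [-0.4, 0.8, 0.5, -0.3],
];
const outputBiases = [0, 0, 0];

function feedForward(inputs) {
  const hidden = weightedSums(inputs, hiddenWeights, hiddenBiases).map(sigmoid);
  return softmax(weightedSums(hidden, outputWeights, outputBiases));
}

// One score each for C, D, and E.
const scores = feedForward([0.25, 0.5]);
console.log(scores);
```

Training is the process of adjusting those weight numbers so that the three scores line up with the correct labels.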
So there's a lot more details of what's going on in here. And I certainly, once again, would refer you to the various things that I'll link in this video's description. The idea here is that a neural network is a machine learning model that can be trained. It can be shown a lot of examples of inputs with their correct corresponding outputs. And it can tune all of the weights of all these connections so that when it later gets new data, it can make appropriate predictions. This is everything that the ml5 library will take care of for you. And for us, we're going to really just be working with the inputs and outputs, collecting our training data, training the model, and predicting outputs. Now that I've gotten through that explanation, I really want to just write some code and show you how this all works. Sorry to interrupt. It is me, from around four days in the future. I did actually continue this. This was recorded during a live stream. And I did go all the way through and make this example. But I made some pretty significant errors in how I use the ml5 library. So I've come back to rerecord and try this again. I'm sure I'll make other mistakes and things will go wrong. But if you want to watch the original version, I'll link to that in this video's description. But I'm going to start over right now. Quickly want to point out that in addition to the p5 libraries, I've added a reference to the ml5 library version 0.4.2 in index.html. In my blank p5 sketch, I can add an ml5 neural network. So I need a new variable. I'm going to call that model. And I'm going to set that model equal to ml5.neuralNetwork. Whenever I create a neural network object, I need to configure it. I need to give it some information about what's going inside there. And the guide for doing this is the ml5 website.
So here on the ml5 website, you can see this quick start which has some sort of sample code, documentation of the usage of the neural network function, and a bunch of different ways of initializing a neural network. And all of these involve a variable called options. The idea is that I'm creating an object that's going to store the properties of the neural network. And then when I call ml5.neuralNetwork, I pass that object in. So for example, looking here at my diagram, I can see there are two inputs and three outputs. That's something that I could configure in the options. Inputs, two. Outputs, three. At a minimum, this is all you need to configure an ml5 neural network-- how many inputs, how many outputs. If I scroll through the ml5 documentation page, you'll see there are a variety of other ways that I could configure a neural network. For example, I could actually give it a data file. So ml5 has functionality built into it that could take a CSV or Comma Separated Value file or a JSON data file and configure inputs and outputs based on what's in that file. So I need to come back to that and cover that in a separate video. But actually, the way that I want to do it here is this particular way. Instead of specifying the number of inputs and the number of outputs, it's much more convenient for me to name them. So here are the inputs, an x and a y. And here is the output, a label. Now this is not something that the actual mechanics of a neural network uses. Inputs don't have names. Outputs don't have names. These are just numbers flowing through this feed-forward network. But for us from a zoomed out view, we can maintain the neural network and use it more easily by naming things. Adjusting that in the code, I'll have an x and a y. These are my inputs. There's two and they have names, x and y. The really confusing thing here is what to do about the outputs.
So while, technically speaking, the way I diagram this is correct and there are three output neurons, each scoring a probability for all three categories, from the zoomed out view we can think of it as the neural network ending up with one singular result-- what is the label it's classified the input data in. So ml5 is going to handle the number of categories and how to score all those things for you. So if we're naming stuff, I can actually just write here, say, label. And as I start to create the training data, I'm going to come back to this and explain where that number three comes back in. There's one more property to the options that's very important for me to specify. And that is the task that I want the neural network to perform. And in this case, the task is classification. The other task that I could've specified is a regression. And I'll come back and do other examples that use a neural network to perform a regression. We'll see what that is and how that works in future videos. In this case, though, it's a classification because I want the neural network to classify the input into one of three discrete categories. So now, we are ready for the first step, collect data. And this should really say, collect training data. Collecting training data means I need to have a set of inputs and their correct corresponding outputs. And in this case, I want to collect that data through user interaction. And I'll do that with mouse clicks. So I'm going to write a function called mousePressed. And I'm actually going to get rid of the draw loop. And just to get started, every time I click the mouse, let's draw a circle on the screen, mouseX, mouseY with a radius of 24. And let's also put a letter in the center of the circle. So now as I click, I'm collecting data points. I'm collecting x and y's that go with the category C. But I also want to collect x and y's that go with different categories. So let me do that through a key press. 
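The configuration described so far might look something like the sketch below. It assumes ml5.js 0.4.2 is loaded in the page, as in the video; the `typeof` guard is only there so the snippet does not throw when run somewhere the library is not available.

```javascript
// Named inputs and a single named output, plus the task to perform.
// ml5 works out the number of categories later from the training data.
const options = {
  inputs: ['x', 'y'],
  outputs: ['label'],
  task: 'classification',
};

// In a p5 sketch this line would normally sit in setup().
let model = null;
if (typeof ml5 !== 'undefined') {
  model = ml5.neuralNetwork(options);
}
```

Swapping the task to `'regression'` is the other option mentioned here, though that changes what kind of output values the training data needs.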
I'm going to create a variable called targetLabel. And I'm just going to give that a default value of C. Then I'm going to add keyPressed. And I'll just set targetLabel equal to the key that I press. Then I'll draw the targetLabel instead. So here's a bunch of C's. Now I'm going to press D. Here's a bunch of d's. Oh, it's lower case. Let me add toUpperCase. Let's make sure this works. A bunch of C's, some D's, and E. Now of course, I could do any letter right now. So I probably would want to add some kind of error checking or determine what I want to let be the actual targetLabels. But for now, I'm going to just let it be a free for all and just restrict myself to C, D, and E. I didn't actually collect the data though. I'm just showing you a very crude user interface for indicating where I've clicked and which letter I've pressed. So let me create a variable called trainingData. I'm going to make that an array. Then whenever I click the mouse, I'm going to say my inputs are. And now because I named the inputs when I configured the neural network, I can create an object with an x and a y, and also another one with a label. You'll see. X is mouseX. Y is mouseY. And then I'm going to make one called outputs. And really, though, these are the outputs but this is called a target. That's a word I can use here. Because this is the target output that I want the neural network to learn given these inputs. So I'm going to say target equals label. And the label is the targetLabel. And actually, I don't need this training data array. I was thinking I wanted to use that so I could store all of the data myself in an array. And that could be very useful in a lot of contexts. Actually, all that I need to do here is just say model.addData(inputs, target). So this is a function in the ml5 neural network class where the model can accept training data as pairs of inputs and target. And I need to make sure that the inputs and the target match up with how I configured the neural network.
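As a rough sketch of that data collection step, the snippet below packages one click into an inputs object and a target object whose keys match the names x, y, and label. The `makeExample` helper and the `fakeModel` object are hypothetical stand-ins so the snippet runs without p5 or ml5; in the actual sketch the values would be mouseX, mouseY, and targetLabel inside mousePressed, and the call would go to the real model.

```javascript
// Hypothetical helper: packages one mouse click as an input/target pair.
function makeExample(x, y, targetLabel) {
  const inputs = { x: x, y: y };
  const target = { label: targetLabel };
  return { inputs: inputs, target: target };
}

// Stand-in for the ml5 model (an assumption for illustration only).
// It just records what it is given.
const recorded = [];
const fakeModel = {
  addData: function (inputs, target) {
    recorded.push({ inputs: inputs, target: target });
  },
};

// Simulate a key press of a lowercase letter followed by one click.
const pressedKey = 'd';
const currentLabel = pressedKey.toUpperCase();
const example = makeExample(100, 200, currentLabel);
fakeModel.addData(example.inputs, example.target);

console.log(recorded[0]);
```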
And they do. Because I have an x and a y and a label. And now I have an x and y and a label here. Now rightfully so, you might be asking yourself, what happened to the fact that we were restricting ourselves to three possible categorical outputs. And this is where a higher level library like ml5 comes in. It's just saying, I know you're going to do classification. I know I'm going to give you a label. I know you're going to give me some training data. So just give me all the training data. I will count how many possible outputs there are after you finish giving me the training data. So as long as I give it a bunch of examples with a C, a bunch of examples with a D, and a bunch of examples with an E, it will configure itself to work with a limit of three possible labels as the output. Let me quickly test to see if I get any errors. C, D, E. Seems to be working. Good. I am ready for the next step, training the model. This is a really easy one for us. Because the ml5 neural network class has a function called train. So I can just call that train function. Certainly it would make sense for me to build some kind of user interface here. But I'm just going to keep going with my keyPressed method. And I'm going to check and say, if the key pressed is T, then call model.train. When I train the model, I can also give it some options that set various properties of the training process itself. So I'm going to create another options object. Information on what those options are is on the ml5 documentation page. I'm going to just use one option. I'm going to set something called an epoch. So I'm going to set the number of epochs to 100. So what is an epoch? If I look at my training data, I have 30 data points. And I'm going to feed all of those things into the neural network. I'm going to say, hey, neural network. Here's an xy. That goes with the C. Here's an xy. That goes with the C. 
Now importantly, one thing that ml5 does for you behind the scenes is it shuffles all those into random order. Because the neural network's not going to learn so effectively if I send in all the C's, and all the D's, and then all the E's. I want to send them in a random order. Sending it all 30 of those is one epoch. Typically that's not enough for the neural network to learn the optimal configuration of weights. So you want to send it in again, and again, and again. So if I have 30 data points and I train for 100 epochs, that's sending stuff through the neural network 3,000 times. Now in truth, there's more to it than this. There's something called a batch size because I might consider the data in smaller batches out of those 30. And that can affect how I adjust the weights. But we don't need to worry about that. Setting the number of epochs is a good starting point for us to start thinking about the process of training a neural network. So I can pass to the train function the options. And then the train function also has two callbacks. One is optional. But I'm going to use them both. There's a whileTraining callback and then a finishedTraining callback. The idea is that there's a number of events happening while you're training the neural network. The whileTraining callback is executed every epoch. So that'll get executed 100 times. And the finishedTraining callback is executed when the whole thing is finished. So let me write those functions. In finishedTraining, I'm just going to add a console log. whileTraining actually receives information about the training process and receives two things-- an epoch and a loss. These callbacks really work as debugging tools for me to look at how the training process is going. Oh, that epoch finished. What was the loss? And I need to come back and talk about what loss is. But I can really look at what's happened while it's training and then know that the training has finished.
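The epoch arithmetic and the timing of the two callbacks can be sketched without any machine learning at all. The `simulateTraining` function below is a hypothetical stand-in, not ml5's train function: it does no learning, it only shuffles the data each epoch, counts how many examples pass through, and fires the callbacks at the points described above.

```javascript
// Fisher-Yates shuffle: returns a new array in random order.
function shuffled(items) {
  const copy = items.slice();
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [copy[i], copy[j]] = [copy[j], copy[i]];
  }
  return copy;
}

// Hypothetical training loop skeleton (no weights are adjusted here).
function simulateTraining(data, options, whileTraining, finishedTraining) {
  let presentations = 0;
  for (let epoch = 0; epoch < options.epochs; epoch++) {
    const order = shuffled(data);
    for (let i = 0; i < order.length; i++) {
      presentations++; // a real trainer would use order[i] to adjust weights
    }
    whileTraining(epoch); // once per epoch
  }
  finishedTraining(); // once, at the very end
  return presentations;
}

const data = Array.from({ length: 30 }, (_, i) => ({ id: i }));
let epochsSeen = 0;
let finished = false;

const total = simulateTraining(
  data,
  { epochs: 100 },
  function () { epochsSeen++; },
  function () { finished = true; }
);

console.log(total); // 3000
```

With the real library, the equivalent call in the video is model.train with the options object and the two callbacks.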
ml5, however, has built into it a visualization functionality that's part of TensorFlow.js itself, particularly its library called tfjs-vis. And I can enable that by adding one more option to my neural network configuration. And that is debug true. If I add debug true, I'm going to get much better tools than what I've got here with my own callbacks. There's one more thing that I've missed here. And that has to do with normalizing the data. What does our training data look like? Remember, I've got this p5 canvas. Maybe it's 400 by 400. Any given input is a mouse click into that canvas, like here. And so this might be the mouse location 100, 100. So that would mean the literal number 100 is being fed into the neural network. But neural networks are generally tuned to work with data that's all standardized within a particular range. And there could be a variety of reasons you might want to use one range versus another. But in many cases, you want to squash your input data into a range between 0 and 1. And that process is called normalization. So that would be really easy for me to do myself. Because if I know the width is 400, I could just say 100 divided by 400 is 0.25. So I could apply this normalization math myself in the code. But ml5 has a function built into it that you could call right before you train the model that will look at the minimums and maximums of all of your input data and normalize it. So coming over here, I can call that function right before I train the model by saying model.normalizeData. And now I think I am ready to actually train the model for the very first time and complete step two. Let's give it a try. Let's add a console log to know that the training process is starting. Let's collect a lot of data. Now I'm clustering all my C's and D's and E's together because I want to create a scenario that should be easy for the neural network to learn.
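The normalization math itself is small enough to write out by hand. This is a minimal sketch of min-max scaling, the general technique being described; the function names are made up for illustration and this is not ml5's own code.

```javascript
// Min-max normalization: maps value from the range [min, max] to [0, 1].
// Assumes max is greater than min.
function normalize(value, min, max) {
  return (value - min) / (max - min);
}

// Normalize a whole column using the smallest and largest values
// actually present in the data, rather than the canvas size.
function normalizeColumn(values) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  return values.map((v) => normalize(v, min, max));
}

// A click at x = 100 on a 400 pixel wide canvas.
console.log(normalize(100, 0, 400)); // 0.25

// Using the data's own minimum and maximum instead.
console.log(normalizeColumn([100, 200, 300])); // [ 0, 0.5, 1 ]
```

Note the difference between the two calls: dividing by the canvas width uses a range known in advance, while looking at minimums and maximums uses whatever range the collected clicks happen to cover.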
Again, I don't need a neural network to figure out that there's C's in the top left and D's in the top right and E's on the bottom. But if the neural network can learn this scenario, more complex and interesting ones it could possibly learn as well. So again, if you remember, my exciting interface was to press the T button. So I'm now going to press it. And this is the debug view that pops up. This comes up because I put debug as true. And what you're seeing, you saw the epochs being console logged here. You see that it says finishedTraining. But this is showing me a graph of the loss. The x-axis is the epoch. And the y-axis is the value of the loss. So what is loss? If this particular data point that I'm sending into the neural network is paired with the target of C, when it gets sent in, the x gets sent in, 0.25. I should make the y something different. Let's say the y is 200. The y gets sent in as 0.5. The neural network is going to guess C, D, or E. Then it has to decide did it get it right or did it get it wrong. Was there an error? So if it happened to guess C, it's going to have gotten it right. You can think of the error as 0. If it's gotten it wrong, if it guessed a D, then there is an error. Now what the value of that error is actually has to do with the scoring that it's doing based on its confidence that it's one label or another. Maybe it was like 99% sure it was a D. It's going to have a big error then. But if it was only like 60% sure it was a D and 40% sure it was a C, then that error is going to be smaller. But that error, another word for that error is loss. So as it sends all of the data over a given epoch through the neural network, it's going to summarize all of the errors into a loss value. So as the neural network trains the model with the training data over and over again, epoch by epoch, that loss should be going down. It's getting better and making more correct guesses over time.
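One common way to turn those confidence scores into an error number is cross-entropy, which is used here as an assumption for illustration; the video does not say which loss function ml5 applies. The sketch scores the two wrong guesses described above, plus one mostly correct guess, for a data point whose true label is C.

```javascript
// Cross-entropy for one example: the negative log of the confidence
// the network gave to the correct label. Lower is better.
function crossEntropy(confidences, correctIndex) {
  return -Math.log(confidences[correctIndex]);
}

// Average the per-example errors into one loss value for an epoch.
function meanLoss(errors) {
  return errors.reduce((a, b) => a + b, 0) / errors.length;
}

// Confidences are ordered [C, D, E]; the correct label is C (index 0).
const confidentlyWrong = crossEntropy([0.01, 0.99, 0.0], 0); // about 4.61
const mildlyWrong = crossEntropy([0.4, 0.6, 0.0], 0); // about 0.92
const mostlyRight = crossEntropy([0.9, 0.07, 0.03], 0); // about 0.11

console.log(confidentlyWrong, mildlyWrong, mostlyRight);
console.log(meanLoss([confidentlyWrong, mildlyWrong, mostlyRight]));
```

Being 99% sure of the wrong answer costs far more than being 60% sure of it, which matches the intuition in the explanation.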
Based on what the graph is doing here, this indicates to me a couple of things. One is it's learning kind of slowly. So one possibility could be just give it more epochs. So maybe what I actually want to do is go back into the code and give it 200 epochs instead of 100. The truth of the matter is I have a very, very, very small data set here. So with a little bit of data, I need a lot more epochs and it kind of makes it feel like it's more data. Another way that I could tackle this issue is by adding another property to the options object when I configure the neural network. And that property is something called a learning rate. The learning rate refers to how much these dials should turn based on the errors that it's seeing as the neural network is looking at the training data. So it got an error. Should I turn the dial a lot or should I turn it just a little bit? If I turn it a lot, I might get closer to the correct result. But I could also overshoot that correct result. So this kind of hyperparameter tuning, which is a fancy word for saying, I want to try this learning rate on this number of epochs, these are the kinds of things that you could experiment with by training your model over and over again with different properties. But for me right now, I'm going to leave the default learning rate and just experiment with the number of epochs. And maybe in some future videos, I'll come back and look at some of the other parameters and what might happen as I tune them. Let's try one more time. Collecting data and training the model. Oh, wacky. So this is really good. I want to see that loss go all the way down. We can see at the very end here it gets down to 0.096314. And you could also see that it's starting to level out. That indicates to me that if I'd given it more epochs, maybe it would squeeze out a tiny bit more accuracy. But this is pretty good enough. And that, my friends, is the end of step two. Guess what. Only one step left. 
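The dial-turning idea behind the learning rate can be seen on a toy problem that has nothing to do with ml5. The sketch below is plain gradient descent on f(w) = w * w, whose best value is w = 0; the function name and the specific rates are assumptions chosen to show slow progress, good progress, and overshooting.

```javascript
// Gradient descent on f(w) = w * w. The gradient is 2 * w, and each
// step turns the "dial" w by learningRate times that gradient.
function descend(learningRate, steps) {
  let w = 1; // starting guess
  for (let i = 0; i < steps; i++) {
    w -= learningRate * 2 * w;
  }
  return w;
}

console.log(descend(0.01, 20)); // small turns: still far from 0
console.log(descend(0.1, 20)); // moderate turns: close to 0
console.log(descend(1.1, 20)); // turns too big: overshoots and grows
```

More epochs with a small rate and fewer epochs with a larger rate can land in a similar place, which is why the two settings tend to be tuned together.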
And this one is prediction, meaning the idea now that I've trained the model is I want to give it new inputs. I'm not giving you a target. I'm not telling you what the answer is. Neural network, you've learned. You've thought about this a lot. I've taught you all I can teach you. Now make some guesses for me. To implement this, I think something useful could be for me to create a variable that's like the state of the program. And at the beginning, the state would be collection. And when the state is collection, that's where I want to set the targetLabel and add the data to the model. When I press the T key, then the state is training. And then when I'm finished training, I could say the state is prediction. And ultimately, I think I just want to draw those circles during the collection process. Because if the state is prediction, then I want to ask the model to classify the inputs. The idea being that there are no targets. The model is trained. Here are some inputs. Classify them for me. So where do I get the results back? I need a callback. I can define a function called gotResults. Just like all the other ml5 stuff that I've looked at in previous videos, there can be an error or there could actually be results . If there is an error, don't do anything. Otherwise, let's take a look at the results. Let's try this again. I can't believe I have to collect the data again. Wouldn't it be nice if I could save the data so that I don't have to collect it again if I've made changes to my code? In fact, it is. And I will come back and do a separate video all about how to save the data. And in fact, I could also save the trained model so I could load that trained model onto another sketch, or all sorts of possibilities. And I will also come back and look at saving data and saving the model and reloading those things. 
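The collection, training, and prediction states described above could be sketched as a tiny state machine. The function names and event names here are hypothetical; in the real sketch the transitions would happen inside keyPressed and the finishedTraining callback, and the click action would either call model.addData or ask the model to classify the inputs with gotResults as the callback.

```javascript
let state = 'collection';

// Moves to the next state only on the events that make sense for it.
function nextState(current, event) {
  if (current === 'collection' && event === 'startTraining') return 'training';
  if (current === 'training' && event === 'finishedTraining') return 'prediction';
  return current;
}

// Decides what a mouse click should do in each state.
function actionForClick(current) {
  if (current === 'collection') return 'addData';
  if (current === 'prediction') return 'classify';
  return 'ignore'; // clicks during training do nothing
}

console.log(state, actionForClick(state)); // collection addData
state = nextState(state, 'startTraining'); // pressing the T key
console.log(state, actionForClick(state)); // training ignore
state = nextState(state, 'finishedTraining');
console.log(state, actionForClick(state)); // prediction classify
```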
But for right now, I'm just going to-- because I have the time to do it and I can speed through this when you're watching it, I'm just going to constantly recollect the data over and over again. The model is trained. And if I did things correctly, it'll now show me some results when I click into the canvas. I'm going to click over by the C's. This is good. I got an array back, that results array. And it has three objects in it. What are those objects? The first one is the label C with a confidence score of 88.5%. That's good. That's what I should have gotten. The second one is D, confidence score of 7%. Third one E, confidence score of around 5%. This is exactly what I expect. These confidence scores are what the neural network is really producing behind the scenes. But the ml5 library has taken those confidence scores, associated them with given labels, and then sorted those labels in order of confidence. So I will always have in results[0].label the label it thinks it should be. So what I can do is grab this drawing code. And I can put it right down here. Maybe I'll change it to blue with an alpha. And instead of targetLabel, I'm looking at results[0].label. Now the truth is, I shouldn't be using mouseX and mouseY here. I should be actually saving those inputs maybe in a global variable or passing them. But I think the prediction is going to happen fast enough that I'm not really going to be able to move my mouse. So I think it'll work out OK. Let's give this a try. I've got to collect all the data again, and train the model. Now I should be able to click here and see a C. I did. And a D. I did. And an E. Let's move along here and see. When does it change to C? It changed to C there. That's like a decision threshold like thingy. You know what would be interesting to do? We could draw a map of what all the decisions are across all the pixels. That's a project you should do. I click over here. I should get a D. See, what's over here? A D, a D, a D, an E.
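Here is a rough sketch of the bookkeeping described above: pairing raw scores with labels and sorting by confidence so the best guess is always first. The `toResults` helper is hypothetical and is not ml5's code; the scores are the approximate ones read out in the video.

```javascript
// Pairs each label with its score and sorts from most to least confident.
function toResults(labels, scores) {
  return labels
    .map((label, i) => ({ label: label, confidence: scores[i] }))
    .sort((a, b) => b.confidence - a.confidence);
}

// The raw order does not matter; the sort puts the winner at index 0.
const results = toResults(['D', 'E', 'C'], [0.07, 0.045, 0.885]);

console.log(results[0].label); // C
console.log(results.map((r) => r.label).join(' ')); // C D E
```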
Oh, this is fascinating. I love that this works. So guess what. This is actually done. But the whole point of what I wanted to do here was to have it play sound. Because I want this to be the beginning of an idea that I could maybe play musical notes by moving my hand around. Imagine, again, the inputs being instead of the xy of mouse clicks, the xy of some type of hand tracking. Or I could actually have more than just two inputs. Because I could take the xy of this and the xy of this and the xy of this, or whatever kind of other data. Maybe you've hooked up a whole bunch of sensors. And you've got a bunch of different force sensors and an Arduino. And those could be the inputs to your neural network. So many possibilities. But I'm not going to explore all those possibilities right now, at least. But I do want to show you the output of playing a note. I made a video tutorial that you could go back and watch if you want about how to use a sound envelope with a sound oscillator in p5 to generate a tone. And so I'm just going to, for now, just grab this code, all of this, the oscillator and the envelope. I'm going to paste all that in setup here. And this, by the way, has changed to the full word, envelope, since I last made that tutorial. And now whenever I click the mouse, I should be able to say envelope play. So if I do this, [NOTES SOUNDING] when I click, you hear this note play. But I want it to play C, D, or E. And honestly, I could make a lot of notes right now since I can have more than just three categories. But let's just stick with C, D, and E. What I need to play a particular note is the frequency of that note. And I can find this on this Wikipedia page about frequencies and piano notes. I'll start with C4, D4, and E4. I'll make an object that's a lookup table for those notes. So I have C, D, and E all matched with their frequency. I should make sure that the envelope and the wave are global variables.
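The lookup table might look something like this. The frequencies are the standard equal-temperament values for C4, D4, and E4 in hertz, rounded to two decimals; the `frequencyFor` helper is a hypothetical addition that keeps the current frequency when a label is not in the table.

```javascript
// Note name -> frequency in Hz (C4, D4, E4, rounded).
const notes = {
  C: 261.63,
  D: 293.66,
  E: 329.63,
};

// Returns the frequency for a label, or the fallback if the label
// has no entry in the table.
function frequencyFor(label, fallback) {
  const known = Object.prototype.hasOwnProperty.call(notes, label);
  return known ? notes[label] : fallback;
}

let currentFrequency = frequencyFor('C', 440);
console.log(currentFrequency); // 261.63

// A label with no entry leaves the frequency unchanged.
currentFrequency = frequencyFor('F', currentFrequency);
console.log(currentFrequency); // 261.63
```

In the sketch, the looked-up number is what gets handed to the oscillator before the envelope plays.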
Because now when I play the note, I could set the frequency to notes[targetLabel]. This will look up the numeric frequency associated with that label and set the sound oscillator to that frequency. [NOTES SOUNDING] C, C, C. [NOTES SOUNDING] D, D, D. [NOTES SOUNDING] E, E, E. Now if I press F, it's not going to change because I didn't put F in that lookup table. But you could add more notes. Add more notes, more notes. Then guess what. All I need to do is take this exact same code. After prediction, instead of the targetLabel-- let me just put this in another variable so it doesn't look so insane. results[0].label, draw the label, then play the note associated with that label. I think this project is done. Let's try it. I'm going to collect the data again. I'm really coming back and showing you how to save the data. Time to train the model. Moment of truth. Time to do prediction. [NOTES SOUNDING] I kind of want to do this as I just drag the mouse around. Oh, guess what. This would be such a good demonstration of a regression. So the next video I should do, I just come back and do this exact same thing but with a regression. What would that be? Instead of having categorical output which you could only have a C or D or E, a regression would have numeric output. You could think of it as a slider. So maybe if the frequency of C is around 262 and the frequency of D is around 294, maybe in between I'd play a note. I'd play like C sharp, that's in between C and D right over here. That would be a regression. But I'll come back. I'll explain all of that and do that in a separate video tutorial. So there's too many directions you could go in just from watching this. At a minimum, if you're watching this video and you want to try it, see if you can recreate this process. You're going to find it quite frustrating to have to collect the data over and over again. So you might want to skip ahead if I have those videos ready to see how to save the data you've collected.
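To picture the slider idea, here is a small sketch of what a numeric output could drive. It is not a regression model, only the kind of continuous mapping one would produce: a value between 0 and 1 is turned into a frequency somewhere between C4 and D4. The function names are hypothetical.

```javascript
// Linear interpolation between two numbers.
function lerp(start, stop, amount) {
  return start + (stop - start) * amount;
}

const C4 = 261.63; // Hz
const D4 = 293.66; // Hz

// Maps a slider position in [0, 1] to a frequency between C4 and D4.
// Values outside that range are clamped.
function sliderToFrequency(amount) {
  const clamped = Math.min(1, Math.max(0, amount));
  return lerp(C4, D4, clamped);
}

console.log(sliderToFrequency(0)); // the C end of the slider
console.log(sliderToFrequency(0.5)); // halfway, close to a C sharp
console.log(sliderToFrequency(1)); // the D end of the slider
```

A classifier can only ever answer with one of the labels it was trained on, while an output like this can land anywhere along the range.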
But certainly doing this with more notes and thinking about the input in a very different way would be a good starting point for you to try. So thanks so much for watching. Many more tutorials about the ml5 neural network functionality to come. And I'll see you in those, where we train more models.
Info
Channel: The Coding Train
Views: 81,227
Keywords: machine learning, tensorflow, tensorflow.js, ml5, ml5.js, neural network, training, classification, music, p5.js, p5 sound, JavaScript, coding train, daniel shiffman, tutorial
Id: 8HEgeAbYphA
Length: 34min 48sec (2088 seconds)
Published: Fri Nov 22 2019