Hot Dog or Not Hot Dog – Convolutional Neural Network Course for Beginners

Is this a hot dog? Or not a hot dog? What about this one? In this course, you will learn about convolutional neural networks. These are a class of deep learning neural networks that are particularly effective for classifying images. CNNs are also used for other applications such as natural language processing and time series forecasting, but they're most commonly associated with image processing. Kylie Ying teaches this course. Kylie is a software engineer and she is passionate about machine learning and artificial intelligence. So let's start learning, after you determine if this is a hot dog. Welcome to this introductory course on convolutional neural networks. In this course, I'm going to be talking about how our computer can look at an image with a dog in it and say, hey, this is a dog. The secret behind that is exactly this: convolutional neural networks. And that's what we're going to be discussing today. From me looking at this, clearly this is a hot dog. I don't think that's a hot dog. Can't really tell what that is. Hummus, hot dog, garlic bread, hot dog, hot dog, oysters, and it looks like sausage and waffle. And now if we look at our labels, one means that our model thinks it's a hot dog, and zero means that it doesn't think it's a hot dog. So for the first row, we get yes, no, no: hot dog, not a hot dog, not a hot dog. Then we get yes, no, yes: hot dog, not a hot dog, hot dog. And then yes, no, no: hot dog, not a hot dog, not a hot dog. And I think that's pretty awesome for a model that we are going to learn in this video today. It's ready. Please, God. What would you say if I told you there is an app on the market? We're past that part. Just demo it. Okay. Let's start with a hot dog. Oh, shit. Yes. How's that? My beautiful little Asiatic friend. I'm going to buy you the palapa of your life. We will have 12 posts, braided palm leaves. You'll never feel exposed again. I'm going to be rich. Do pizza. Yes, do pizza. Pizza. Not hot dog. Wait, what? That's... that's it? It only does hot dogs? No, and not hot dog. If you have already seen a few of my previous freeCodeCamp machine learning videos, feel free to skip ahead to the section that actually discusses convolutional neural networks and then the Colab where we will be practicing how to actually implement this on a very exciting project. For those of you who are new here, stay tuned: we're going to cover the basics of machine learning and build up to understanding how a convolutional neural network works. Alright, so let's get started. This is the introduction to convolutional neural networks, presented by me, Kylie Ying, on behalf of freeCodeCamp. Make sure you go and check out my channel, Kylie Ying, for beginner programming content, as well as future courses on artificial intelligence. So with that being said, let's dive right in. What exactly is machine learning? Well, machine learning is a subdomain of computer science that focuses on algorithms which help a computer learn from data, without us programmers explicitly programming certain instructions. So basically, we want to be able to train our computer to understand, to comprehend, or to draw some sort of conclusion from certain data that we feed it. There are a few different types of machine learning. The first type is known as supervised learning. In supervised learning, we use labeled inputs, which means that each input has a corresponding output label, to train models and to learn outputs. So let's look at some examples. 
Basically, here, we have a few pictures, right? We have a picture of a cat, we have a picture of a dog, and we have a picture of this looks like a gecko. Now, our computer doesn't actually know these labels ahead of time, so we as humans have labeled these items. And we are familiar with pictures of cats, dogs and geckos. So we can label these in our head, but our computer only sees pixels. And so what we are doing in supervised learning is we are actually constructing a data set with these labels attached to them. So when we feed these pictures to our computer, we're saying, hey, this top left one is a cat. This right one is a dog. And this bottom one, okay, I labeled it as a lizard, but you get the point. Basically, we're feeding it a label when we pass it into the computer. Now, there's also unsupervised learning. And basically, this uses unlabeled data to learn about patterns in data. So here, if we have these images, well, you know, if we have multiple different images of cats, multiple different ones of lizards, geckos, these things, and then multiple pictures of dogs, then our goal is our computer would be able to learn from all of these and be able to pull out similar features and say, hey, you know, these all seem like they're one type of category, this seems like another category, and this seems like it'd be another category. So that's unsupervised learning where we don't necessarily provide the labels. And our computer is trying to draw conclusions from similarities that it finds in our data set. Now, the last type of machine learning is known as reinforcement learning. So in reinforcement learning, there's an agent that's learning in an interactive environment. And it learns based on rewards or penalties that it observes while it does some sort of action throughout this environment. So, for example, training your dog is a type of reinforcement learning. And we're essentially replacing this dog with a computer. So when we train our dogs, you know, if our dog sits, then we feed it a treat. And if our dog barks, we might yell at it and get angry and the dog will eventually hopefully stop barking. Sometimes my dog is not like that. But anyways, based on these rewards and penalties, the dog basically picks up intuition about what actions will be able to get future rewards and what actions will lead to future penalties. So that's reinforcement learning, we train our computer in a very similar way. But our reward for our computer might just be something like points. Today, we are focusing on supervised learning. So let's talk about supervised learning a bit more. Basically, in machine learning, we have some sort of inputs. So here we have inputs one through n. So these are all of our samples, all of our examples, and they go into some model, which we'll talk about in a second. And then it leads to some sort of output, some sort of prediction. Now, the terminology here is that all of our inputs, these are known as our feature vector. So each input that we give our model should be in the form of some sort of parsable data, which often means just a vector of numbers. Now, these features can be qualitative. So that typically means categorical data, there are finite numbers of categories, or groups. So one example of that might be the traditional way of defining gender. So female or male, this is qualitative, because there's only a certain number of groups. Another example might be, okay, what country do you live in? Well, there are only a finite number of countries out there. 
And so this is qualitative data, this is categorical, there's a finite number of categories. So this is known as nominal data, because there's no inherent order, it's not like a happiness rating, where, you know, zero is unhappy, and five is very happy, there's no ranking between either of these. Now, the other type of qualitative data might be something like age, right? So these are different categories of being, you know, an infant, and then a child, and then a teenager, an adult, etc. And as I just said, there might also be happiness ranking. So you know, 12345, five being really happy, one being not that happy. And this is known as ordinal data, because there's an inherent ordering to this data set. But basically, both of these are known as qualitative data sets, because they are like categorical, there's only a certain number of categories there. And now you might be wondering, okay, doesn't that encompass all of our data? No. So the other type of data is quantitative data. So that means it might be numerical value data. And it could be discrete or continuous. So some examples of that are okay, how long is something? You get a number, right? Like, my desk could be 5.289 feet long. Sorry, if you don't use the American measurement system, which is the rest of the world, but you get my point. It's an infinite length, it's a numerical system. Now, it can also be temperature. So for example, this might be, I don't know, 200 degrees Fahrenheit. Again, sorry, if you do not use Fahrenheit, which happens to be everybody, not in America. Or it could even be a discrete numerical value. So for example, if we're on an Easter egg hunt, well, it looks like we have something around like maybe 10 or so eggs in our basket. So that number might be 10, but that could be zero all the way to infinity, right? The basic point of this is that it's quantitative. So it's numerical data. And these two values are continuous, because you could have like, pi fee, right? But you can't really have pi eggs. So that's why this is discrete. It means that it only follows like 123, like counting numbers, whereas continuous doesn't. Okay, so those are our features. And so now let's talk about the types of predictions, the outputs of our model. So the different types of outputs that we can have the first type of task is called classification. And this means we're going to go and predict discrete classes. So for example, if we have a picture of a hot dog, a pizza and an ice cream cone, classification would say, okay, this is a hot dog, this is a pizza, and this is an ice cream cone, it give us three distinct categories and try to map something into one of those categories. This is known as multi class classification, because we have more than two types of classes. Now, if I were to classify these into hot dog, and then not hot dog, then that becomes binary classification, because there's only two options. So it's one or the other multi class is more than two. So I could have like 10 different types of food and try to classify into these 10 different types. Other examples of classification. So binary classification, if you have like positive or negative sentiment in a paragraph, or a picture might be a cat or a dog, or, you know, an email might be classified as spam or not spam. For multi class, you might have cat, dog, lizard, dolphin, etc, all the animals in the animal kingdom, you might have orange apple pear, or you might have all the different species of plants in the world. 
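To make the nominal versus ordinal distinction concrete, here is a minimal pandas sketch of how each kind of qualitative feature typically gets turned into numbers; the example categories are illustrative and not taken from the course.

```python
import pandas as pd

# Nominal data (no inherent order): one-hot encode it, so the model
# doesn't read a fake ranking into arbitrary category numbers.
countries = pd.Series(["US", "India", "Brazil", "India"])
print(pd.get_dummies(countries))

# Ordinal data (inherent order): map the categories to ordered integers.
happiness = pd.Series(["not happy", "neutral", "very happy"])
print(happiness.map({"not happy": 1, "neutral": 3, "very happy": 5}))
```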
Now, the second type of supervised learning is known as regression. So in this case, we're trying to predict a continuous value. So here, we might be trying to predict the price of an asset, such as I think I took this a screenshot from like Ethereum or something. But we might be trying to predict the price of Ethereum, or we might be trying to predict how much snowfall we're going to have on some certain day. Or we might be trying to predict the housing market, how much will this house cost, you know, in two months, or two years, or 20 years. So these are all regression tasks, because we're trying to predict continuous values. Now let's dive a little bit into the model. Before we talk about the different types of models, let's kind of discuss how do we actually make this model learn? How can we tell whether or not it's actually learning? Let's talk about that. So let's take this data set. For example, this data set is a data set that I found online, it is a Pima Indian diabetes data set. And this was originally provided by the National Institute of Diabetes and digestive and kidney diseases. Let's talk a little bit about what we're actually looking at. So here we have the labels for the different columns, right? So number of pregnancies, glucose levels, blood, blood pressure, skin thickness, insulin, BMI, age, and then outcome, whether or not they have diabetes. Each row here represents a different sample in the data set. So each individual that this data was collected from, that's what each row represents, right? So this individual here had one pregnancy, and these were her glucose values, blood pressure, skin thickness, insulin, BMI, age, and then the outcome, whether or not this person has diabetes. And this row down here might tell a different story, it might be a different person. Well, it is a different person. Now each column, well, this is a different feature. So this specific feature is the blood sugar, or sorry, blood pressure feature. So this measures all the blood pressure levels amongst our entire data set. Except for this one over here, this is our outcome. So this is our output label. And here, specifically, our output label is ones and zeros, because we need to transform yes and no into a language that our computer can understand. Our computer is very, very good with numbers. So in this specific example, we're coming up with zero being negative, no diabetes, and then one being positive, which stands for they have diabetes. And this is a very, very common way of actually encoding yes or no for our labels, or actually for even our features as well. But anyways, this is our output label. Everything, all of our features minus our output label, this is what we would call a feature vector. Now this is what gets passed into our model. And then this over here, this is what we call the target for the feature vector. So essentially, if I pass my model these values, then I would want it to get as close to the target or the actual value, because remember, we're doing supervised learning, we would want these values to get as close to this target as possible. And same with any of these other samples, if I pass this value into our model, I would hope that it predicts zero, if I pass in this row, this feature vector in, I would want it to predict one. And when we put all of these feature vectors together, we call this the features matrix x. And this is only really important when, you know, you might be studying a bit of linear algebra, if you go on into more in depth machine learning. 
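Here is a minimal sketch of pulling the features matrix X and the target apart with pandas; the file name diabetes.csv and the column name Outcome are assumptions about how a local copy of this data set might be laid out.

```python
import pandas as pd

# Hypothetical local copy of the Pima Indians diabetes data set.
df = pd.read_csv("diabetes.csv")

X = df.drop(columns=["Outcome"]).values   # features matrix: one row per person
y = df["Outcome"].values                  # targets: 1 = diabetes, 0 = no diabetes

print(X.shape, y.shape)
```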
And this over here, this is our labels, or our targets vector, also known as y. So again, this is just something a little bit of terminology. So let's actually visualize this as a chocolate bar. So here we have this x matrix, which has all of our feature vectors. So imagine like, a row of chocolate is a feature vector. And these are all of our targets, the corresponding targets for each of those feature vectors. Alright, if I take a feature vector from my data set, and I pass it into this model, well, my model is going to output some sort of prediction. Right now, how do we use our actual output label in order to help us, you know, determine a better model. So what I can actually do is I can take the output, the actual value, the actual outcome, because we have that information, since this is supervised learning, and I'm going to compare my prediction with the actual output given to us in our original data set. Now, what I can do is I can take, you know, some sort of difference between these two say, okay, how far am I from this desired output, I'm going to use that data, and I'm going to train the model using that. Now, if we're inputting many different vectors into our model, getting the outputs of those taking the difference and training our model, then our model over time, as we train it, we'll get closer and closer to predicting, you know, the actual output down here. So this is our supervised learning data set, it's this chocolate bar with over here, our features vector, and then our output labels. What we normally do is we actually split it up into three different types of data sets. So we have this like training data set, which will be most of our data, we use our data in order to train our model. But now how do we actually determine how good does our model predict on stuff it hasn't seen yet? Because what we would ideally want to do is take it out into the real world someday, right and say, hey, here's a picture, what's in this. So in order to do that, we also need a testing data set. So that testing data set is just some data that we've removed from our original data set. And we're just going to use that in order to see how well our model can do on data that it hasn't seen yet. So it's part of our original data, but we're removing it, holding it off to the side so that when we finish our model, we can say, hey, now look at this new data. How well how well can you perform on this? Okay, so what is exactly is this validation data set here in the middle? Okay, so when you're actually building the model, suppose that you have a bunch of different models that you've come up with, how do you actually choose which one is the best? That's where the validation data set comes in. So imagine you are out buying a car, right, there's many different types of cars, some of them are just slightly better than the others, some of them just have a few things here and there change, just like small bells and whistles, right. So this is a test drive where you're going, you're testing out all the models, and you're picking out, okay, this one is the best and this is the one that I want to keep. Now, what's the difference between validation and testing? Well, validation is used on all the models, so you can pick out the best one, and then testing, you're using that model, and you're saying, okay, well, how well does this model do on data that I haven't seen yet? And these two, we both keep aside. So like, they're not used for training, but they have two different purposes. 
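A minimal sketch of splitting one labeled data set into those three pieces, assuming the X and y arrays from the previous sketch; the 60/20/20 proportions are just an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)
idx = rng.permutation(len(X))                   # shuffle the sample indices
n_train = int(0.6 * len(X))
n_valid = int(0.8 * len(X))

train_idx, valid_idx, test_idx = np.split(idx, [n_train, n_valid])
X_train, y_train = X[train_idx], y[train_idx]   # used to fit the model
X_valid, y_valid = X[valid_idx], y[valid_idx]   # used to pick the best model
X_test,  y_test  = X[test_idx],  y[test_idx]    # used once, for the final check
```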
So remember that validation is test driving, picking out the best model, testing data set is used on that best model in order to get your final accuracy number or final metric about how well your model does. Okay, so let's visualize each of those. This training data set is passing the model, and the model produces some sort of prediction for each of the feature vectors in this training data set. So this training data makes some sort of prediction, and we're actually comparing it to the actual output. And when we take the difference, comparing the two outputs, that is what's known as the loss. Now, this loss can be used to actually make adjustments, which is known as training to this model. Now, this validation set, again, is used as a reality check during and after training to, first of all, ensure that the model can handle unseen data. And then also to say, okay, which model do we want to use? So again, this is a test drive. Here we have model A has a loss of 1.3. Model B has a loss of 1.5. Model C has a loss of 0.5. And Model D has a loss of 0.9. So which one is the best model? It's Model C. Now, finally, once we've selected Model C, we take our test set, we pass it through Model C, and we get our final performance between these two, our output and our actual label. And our test set is used to check how generalizable the final chosen model is. So that means how well can our final model perform on data it hasn't seen yet. All right. Let's talk about this loss function. Loss is basically a metric of performance, it tells our computer, hey, this is how well we're doing right now. And so if our prediction is further from the output, the greater the loss will be. So for example, this pink one, it's a slightly higher loss than that brown because it's further from that chocolate bar, right? Or if we have this blue, then that's really far from the chocolate bar, and we have a lot of correcting to do in our model. So these are what's known as loss functions. So for l one loss, basically, we have this absolute value shape. And l two loss, we have this quadratic shape, let's talk a little bit about what these actually mean. So here, the x axis is the difference between the predicted and the actual value. So if we go back to, you know, you know, these, this is talking about how far apart are these values. So the further from zero these are, the more different they are, right? And why over here, this is the penalty. So this is how much are we getting like penalized for being further away from our actual value. And you'll notice if you look at the scales here, l one is linear, right? So that means that if I'm like 10 away from my actual value, then my penalty will just be 10. What this l two loss, on the other hand is quadratic. So that means if I'm 10 away, then I'm actually penalizing by 100. Right? So that's, I mean, that's, that's literally x squared. Now, that also means that the closer I am, so within one away, this absolute value of one away from the actual value, then my loss is going to be less penalty than l one. But anyways, what I wanted to clarify here, let's not get too deep into the math, it's just that there's two different types of loss functions, which will tell our computer, hey, this is how far off we are. There's also this thing known as binary cross entropy loss. So this is for binary classification. Oh, yeah, I should clarify that these l one l two losses are for regression. So that's when we're trying to predict a final output. 
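Those two regression losses fit in a few lines of NumPy; a minimal sketch of the "being 10 away" comparison:

```python
import numpy as np

def l1_loss(y_true, y_pred):
    # absolute error: the penalty grows linearly with the distance
    return np.mean(np.abs(y_true - y_pred))

def l2_loss(y_true, y_pred):
    # squared error: the penalty grows quadratically, so big misses hurt much more
    return np.mean((y_true - y_pred) ** 2)

actual = np.array([3.0])
print(l1_loss(actual, np.array([13.0])))   # 10 away -> L1 penalty of 10
print(l2_loss(actual, np.array([13.0])))   # 10 away -> L2 penalty of 100
```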
And we're telling our computer, hey, we're kind of far from that output. Now, binary cross entropy loss is when we have binary classification. So we have two different categories that we're trying to classify. And we're not going to try and understand this equation. But what we do want to take away from this is that as the loss decreases, the performance will get better as our model does better at predicting the right output, this loss decreases. Okay, so metrics of performance that we can assess that will tell us how good our model is performing. The first one is accuracy. So for example, over here, I have a bunch of different fruits, right? I have apple, orange, apple, apple. And our labels are over here. So our actual labels are apple, orange, apple, apple. Now, let's say that these get passed into our model. Well, our model comes up with these predictions, apple, orange, orange, apple. So what is the accuracy of our model here? The accuracy of this model is three quarters, it's 75% because we've gotten three of the four correct. And generally, for now, for this introduction, we will be only sticking to accuracy in order to talk about how well our model is performing. Finally, let's talk about the model itself. In this video, I'm going to focus on neural nets just because it's a very common example and very powerful example of a model. But there's many, many different models out there. So neural networks look something like this, we have our inputs. And so these are all of our features. I know that I showed you guys previously with the feature vector horizontal, but now let's take that horizontal and let's turn it so it's vertical. So if we have, you know, some sort of age, glucose level, I forgot what the Pima Indian data set was. But if we have like age, glucose pregnancies, then now it's going to be age, glucose pregnancies, okay, and this is going to be our feature vector, it's going to be this way. These values will get passed into these hidden nodes. And what does that mean, we will dive into nodes just in a second. And these outputs of these cells will get passed into an output. And this output, these output cells will determine what the final output of this neural network looks like. So as promised, let's take a look at these specific nodes. Okay, so again, here I have my feature vector, but vertical instead of horizontal. So this is x0 x1, all the way until n. So basically, we have n and then plus one more, but n plus one different features over here. Okay. Now, all of these different values, they all get some sort of weight attached to them. So that might mean, okay, we want to emphasize x0 some more. So let's double the value of that. So this weight might be two. Or, hey, we don't really like, you know, x1 doesn't seem that important. Let's decrease the significance of that this weight might be 0.5. Okay, so essentially, we're multiplying this w with this x. And now the value the output of that goes into this neuron. And this neuron is just taking all of these, like products, and summing them together. On top of that, it gets something called a bias, which you can think of as like an x intercept. Basically, this is just saying, okay, add this specific number, whatever this number represents, add this number to the neuron. And then finally, the output of the sum of all of these, plus this one, gets passed into something known as an activation function. And that activation function, then whatever output of that, that is the output of a single node in our neural net. 
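A minimal sketch of what a single node computes: weight each input, sum them up, add the bias, then push the result through an activation function (sigmoid here, purely as an example).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, 1.2, 3.0])    # feature vector: x0, x1, x2
w = np.array([2.0, 0.5, -1.0])   # one weight per input
b = 0.1                          # bias, like an intercept

node_output = sigmoid(np.dot(w, x) + b)   # output of one node
print(node_output)
```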
Okay, and then you have all these nodes, and you can chain them together. I know that I use circles to represent this, but this entire thing is basically encapsulated in its own circle; each circle contains this whole computation. So each of these circles has its own output that goes into different nodes, which do the same thing: take the products, then the summation. And then that gets output into some sort of output layer, and here, this is the final output of the neural network. I kind of glossed over this activation function, so let's go a little bit deeper into that. Okay, so the activation function: if all of these were just a product and a sum and a little bit of something added, then we get an output that's just a summation of products, right? And when we chain all those together without activation functions, this entire thing just collapses into a linear model. By linear model, what I mean is that we could have one coefficient per input, multiply those two together, and then add a bias, and that would be our output. Essentially, this entire network could collapse into one cell, which, if we don't have activation functions, is kind of what happens. So we do need activation functions. Now, what are they? These are three examples of different types of activation functions that we can use: this is a sigmoid function, this is a tanh function, and this is a ReLU function, which stands for rectified linear unit. What this actually means over here: the y axis is the output of the cell. And with sigmoid, essentially what we're doing is taking the sum of all the products, everything that comes into this neuron and gets summed up, and whatever the value of that is, is this x axis down here. Actually, with all of them, whatever that value is becomes the x axis, and we just map that to the corresponding y value, and that will be the output of a single cell. And again, the reason why we use these activation functions is so that we introduce some sort of nonlinearity into our neural net, so that it doesn't all collapse into a linear function. That's the importance of activation functions. Typically, I will use ReLU, because that is just a very classic activation function that is known to work. All right. Now, how does this training part actually work with a neural net? Let's talk about that box for a bit. Suppose that we're using our L2 loss function, so again, that's this quadratic curve. And remember that the x axis is how far our predicted value is from our actual value; we can calculate a numerical distance for that. Then this y value is the penalty for being that far away. If they're really, really close together, then that penalty is going to be really small, but if you're pretty far, that penalty is going to be even larger. So up here, the error is really large. And our goal is to decrease this loss, so our goal is to get somewhere down here, ideally zero, because that would mean that we have a perfect prediction. But we want to get just close enough. 
Now, that means that we have to take a giant step in this direction in order to get to that value. And in order to do that, we can use something called gradient descent, which is a heavier mathematics concept that also involves some calculus, so we're not going to cover it fully in the scope of this course. But let's get a general sense for what gradient descent means. Basically, gradient descent says, okay, this is where we are, so what is the slope at this point? At this point, the slope looks something like this; down here, the slope looks something like this; and down here, the slope looks something like this. It basically just means: at your specific point, how much are you changing? That's your slope. Now, once we have that value for the slope, gradient descent says, okay, let's follow that slope down, because we want to minimize our value, right? So we're just going to follow the slope down towards wherever it's going. So let's take a look at these different w values, because these are the ones that we can adjust in order to train our neural net. Remember, these are what we're multiplying with our inputs in order to get that summed output, which then goes into the activation function. So we're taking a look at our w's, which are our weights. And maybe, with our current model, this weight is super far off from where we want it to be, so we can calculate this trajectory and say, oh, we want to take a step this way. Now, w1 might be slightly closer, so we want to take a smaller step in this direction. And finally, another one of the weights might be even closer, and so then we take an even smaller step in that direction. Essentially, what backpropagation, what this gradient descent, does is tell us: we want to take a step in this direction in order to correct our weight; how much of an adjustment do I make, and in what direction? Now, in order to get that new weight, I take the old weight, and then I tack on some alpha, some very small value that we call the learning rate, times this step. So basically, our new value will be the old one, adjusted slightly by this. And this, as I mentioned, is called the learning rate; it's typically a really small value, and it's just to make sure we don't overshoot. And so with all of these different weights, we can take this value, multiply it by our learning rate, and then calculate the new value based on the old one. And you'll see that the magnitude of this vector is smaller than the one over here, which means that we're going to make a smaller step. If you want to get really technical, sometimes you'll see this written as negative alpha times the slope, or the derivative; that's only because of the sign convention, and here I've already flipped the signs, which is why there's a positive. Don't get too caught up in that; if you had no idea what I just said, don't worry about it, just get the general idea. This is how training neural networks works: we adjust our weights based on how far we are from where we want to be. 
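A minimal one-dimensional sketch of that update rule, new weight equals old weight minus learning rate times slope, on an L2 loss; the starting weight and target are made up for illustration.

```python
target = 3.0     # where we want our prediction to end up
w = 10.0         # start far away from the target
alpha = 0.1      # learning rate: small, so we don't overshoot

for step in range(25):
    loss = (w - target) ** 2      # L2 loss at the current weight
    slope = 2 * (w - target)      # derivative of the loss with respect to w
    w = w - alpha * slope         # take a small step downhill

print(w)   # ends up very close to 3.0
```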
Okay, so the moment we've all been waiting for: let's talk about convolutional neural networks, otherwise known as CNNs. So here we have some handwriting in an image. Let's say that we are building a model to detect what number is handwritten in an image. So here we have an image where our human eye says, okay, I think that's a five, right? But our computer doesn't know that. How do we get our computer to determine that using supervised learning? That's where CNNs, convolutional neural nets, come in. Basically, the idea is we want to somehow pass this image into a neural network, what we just talked about, in order to produce some sort of output prediction for the number that's actually in this image. Well, this image doesn't really translate into a neural net too well, right? Think about the Pima Indians data set: we had different values in each column, which means that every single sample (and here, this five would technically be one sample) had a very nice vector of values associated with it. But here in our image, we don't really have that nice vector, so it doesn't pass into this neural network too well. How do we solve that? The answer is convolutional neural networks. So this area here, which we'll talk about, this is our convolution part. Essentially what we're doing here is trying to extract features. That means we're doing all these operations so that we can take this input and boil it down to a vector that's easily passed into a neural net that can actually perform the classification. So everything we're going to learn about the convolutional part is just tacked on to the beginning of a neural net: we take an image as is, pass it through all these different layers to finally produce a vector for that input, and then pass that vector through the neural network. Okay, so something to notice is that images are actually numbers. Here I have an image of this X, and you'll see that certain pixels are darker than others. Our images are composed of arrays of pixels, so basically we have this 2D matrix, and each cell has some number associated with it. The darker it is, the closer to zero it is, and the lighter it is, the closer to 255 it is. So white is 255 and black is zero, and something that's gray might be in the 100s. Essentially, this image maps to this array over here. And now once we have this 2D array, we're going to pass something called a convolution over it. A convolution, by definition, is a mathematical operation on two functions that produces a third function. What that means in our image processing world is: here we have some sort of input 2D array, which represents our image, and then we have some sort of filter on that 2D array. This filter is going to take the cells in its size, so this filter is three by three, meaning it's going to take cells in a three-by-three grid on this input map, do some sort of operation (multiplying each cell by the filter and summing everything up), and then map that to something on an output map. So basically, what you're going to do is take this filter and keep sliding it over every single possible three-by-three window on our input, projecting each result onto this output. 
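A minimal NumPy sketch of that sliding-window operation: a 3x3 filter moved over every position of a 2D image, multiplying element-wise and summing (no padding here, so the output is a bit smaller than the input).

```python
import numpy as np

def convolve2d(image, kernel):
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = image[i:i + kh, j:j + kw]    # one 3x3 patch of the input
            out[i, j] = np.sum(window * kernel)   # multiply element-wise, then sum
    return out

image = np.random.randint(0, 256, size=(6, 6))    # 0 = black, 255 = white
edge_kernel = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]])            # a classic edge-detection filter
print(convolve2d(image, edge_kernel))
```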
And that's a convolution. So here's a little example. This is our filter, and we're just going to overlay it on each part of the matrix. These values are just extensions of the edge, because we don't have anything else to fill them with. We keep sliding the filter over, summing everything up (you can see the summation calculation down here), and projecting it onto this output matrix over here. We do that for all the different rows, and then we finally get this output value based on this convolution. Now, there are many different things that can come out of this filter, also known as a kernel. Here's one example: this is some sort of edge detection on the original input, so this filter gives us all the edges in the image. That is the convolutional layer of a convolutional neural net. Basically, we're taking our image and trying to tune these filters so that they're able to extract some sort of meaningful information from our original images. Here, these are edges, but in some cases a filter might be looking for eyes in the image, and it's going to learn a filter that detects those objects and maps them onto an output image. So that's a convolution, and in our convolutional neural net, we're essentially training that filter to be able to detect these important properties. Now, another important concept is pooling. If we go back to our CNN diagram, you'll see that we have all these convolutions, and then we pool them. So what does pooling mean? Pooling is essentially taking a larger input and reducing it down to something smaller that still represents the data in the original. If we use two-by-two max pooling, what that's doing is taking each two-by-two cell in the original input, taking the largest value of those four cells, and projecting it down here. Over here it's 184, so we get 184; in this two-by-two array it's 12, so 12; and over here it's 45, so 45 goes down here. There's also a different type of pooling called average pooling, which, instead of taking the maximum, takes the average of the four values in the cell and projects that. What you need to know about pooling is that we're taking a larger array and condensing that information down into a smaller, more compact array. And then we put those two ideas together, and that becomes our CNN. Often we have more convolutional and pooling layers, and then more fully connected layers, but this is a good schematic of what's going on. This is a visualization that I got from this website; I will post these slides and the site in the description. It's really cool, because what I did was essentially draw this five, and this is what happens after the first convolution. Essentially, it's taking this five and projecting it through many different kernels. Then it's taking these and pooling them, doing another convolution on these pooled layers, pooling those once more, and flattening everything into a vector that goes into this neural network. And then the output of all of these cells will tell us what value it actually thinks it is. So let's click on that link. This is something of what it looks like. 
If I draw a one, you'll see that the first guess is one and the second guess is zero. And each one of these cells on here is taken from something down here; this specific cell, for example, is the output of these cells over here. All of these are some sort of filter, and it actually shows you the filter if I click on it: it takes the input image, applies that filter to the area, and gets this value. So here in our input we see this grid and these two blue cells, and we use that filter in order to get this blue cell over here. And you'll see that these are the pooling layers, so it takes multiple cells and pools them down into a smaller array. Then we perform some sort of convolution on these, and this might take multiple different images as input. So here, we're taking four images, and we're getting this cell over here, right? We have all these inputs, these are the filters on all of them, and it gets some sort of output based on that. Whereas here, our inputs are filters, but you get the point. Basically, it's just taking all these inputs and putting some sort of filter on them, summing up all these different things. They're using a tanh function for their nonlinearity, that activation function that we talked about. And finally, pooling again: each of these cells is some value derived from all of these pooled values. Then finally, this is our fully connected neural net, and then there's an output layer, which is over here. So this is our output, and over here the one is lit up, saying, hey, we think this is the maximum value, which means this is our prediction. You'll see that zero is at negative 0.96 and seven is at negative 0.99, while we think this one is the max. So play around with this, it's really cool. Let's do a four. Okay, so our first guess is four, our second guess is one, and you'll see that the four over here is lit up. The big picture here is that we're taking this input, projecting it multiple times, doing some sort of filter over each mini grid in this original input, and doing a bunch of these array filtering mechanisms, putting that into this layer here, which is essentially our feature vector for this image, and then putting that through a neural network in order to get our prediction. Okay, so we don't actually have to code all that; we just have to have this understanding of how it works. Because the beauty of machine learning is that there are machine learning libraries where we can actually implement our models. So what that means is that we have this model that we want to implement, and it might look something like this, but we can replace it with something that has already encoded all the different mathematics concepts behind each of these layers. And we can simply say, okay, it's a sequential model with two fully connected layers of 16 units using relu activations, and then we have this output. And that is the beauty of TensorFlow. So TensorFlow is an open source library that helps us develop and train machine learning models. 
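In Keras, that "sequential model with two fully connected layers of 16 units" boils down to a few lines; the input size of 8 features and the single sigmoid output are assumptions, picked to match a binary task like the diabetes example.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(8,)),                 # 8 input features (assumed)
    layers.Dense(16, activation="relu"),      # first fully connected layer
    layers.Dense(16, activation="relu"),      # second fully connected layer
    layers.Dense(1, activation="sigmoid"),    # one output node for a binary label
])
model.summary()
```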
And in this next example, let's actually take a look at how we can use TensorFlow and train a convolutional neural network on different images of food in order to produce predictions of whether or not a food is a hot dog. Now that we've learned about the basics of machine learning, neural networks and CNNs, let's actually look at an example and see how we can use TensorFlow in order to build a model, train a data set and use this model to actually evaluate outputs. So I'm going to go to colab dot research dot google.com. And I'm going to start a new notebook. And this notebook I'm going to title hot dog versus not hot dog example. Because today, what we're going to be classifying is pictures of food. And we're going to be labeling them as hot dogs, or not hot dogs. So let's dive into this. All right, the first thing that we want to do is import a few different libraries. So we can import NumPy as NP import pandas as PD import matplotlib.pyplot as plt. Let's import random, we can import import TensorFlow as TF. And from TensorFlow.keras, we're going to import data sets, layers and models. And then finally, we're going to actually get our data from TensorFlow underscore data sets. And we're going to import that as TFDS. So basically, these are some data analysis tools that we have. PLT is a plotting tool. This is just the random library. So if you need a random number generator or something, TensorFlow, that should be Keras. This is more TensorFlow stuff. And then this is finally the data set that we'll be looking at. So go ahead and run that cell, you can run it by clicking this play button, or you can use shift enter, and that will run a cell. So in this video today, we're going to be building a hot dog versus not hot dog classification model using TensorFlow to distinguish hot dogs from not hot dogs in food images. Alright, so now I'm going to add a citation. And you don't have to worry about this. But this is just so that we can give credit to the people who actually produce this, this data set that is. Alright, so let's talk about the data really quickly. So TensorFlow already has the food 101 data set. So we'll actually use that. And you can learn more about that by clicking this link here. Now, in this data set, the hot dog label is label number 55. So let's take a look at this data set itself. Okay, so the first thing that we need to do is actually import this data. And we can do that by using TFDS. So that, again, is our TensorFlow data sets over here. So if we do this dot load, and then we use the string food 101, then that actually TensorFlow will load the food 101 data set. And we're going to shuffle the files. And because we want this as a supervised data set, we're going to set this to true. And we'll just also include info in that as well. And here we this will actually return a tuple with the data set as well as the data set info because we have chosen to include that over here. Alright, so we're going to run this cell. And this cell will actually take a little while to run. So we'll just sit here and wait. And you can pause the video until that cell finishes running. Alright, so finally, our data set is loaded. So the first thing we're going to do is actually split up the data set into train and validation sets. So here, I'm going to get a train data set and a validation data set. And this is because this data set automatically comes with train and validation data sets split up. So I'm going to grab those. And then let's actually show some examples. 
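Pieced together, that setup might look roughly like the cell below; the argument names follow the public tensorflow_datasets API, but treat the variable names as placeholders rather than a verbatim copy of the notebook.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import random

import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import tensorflow_datasets as tfds

# Load Food-101 as (image, label) pairs, plus the data set metadata.
dataset, ds_info = tfds.load("food101",
                             shuffle_files=True,
                             as_supervised=True,
                             with_info=True)

train_ds = dataset["train"]         # the splits this data set ships with
valid_ds = dataset["validation"]
```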
So what I'm going to do is use tfds.show_examples, and I'm going to pass in the train data set and then the data set info, which contains all the different information. If I hit enter there, we get pizza, chocolate cake, bruschetta, waffles, etc., and somewhere in there is hot dog. And actually, the label for hot dog is number 55; I found that, I think, by just running this a bunch of times and seeing where hot dogs came up. One thing that you might notice here is that these images are all different sizes, right? They're not all square. So the first thing we want to do is resize them. And the second thing you might notice is that we have a bunch of different labels, but what we're interested in is hot dog versus not hot dog. So what I'm actually going to do is take an image, let's use this chocolate cake image over here, resize it so that it's 128 by 128 pixels, and then cast the label such that it either equals a hot dog, so that's one, or it doesn't, so that's zero. Okay, so the maximum side length that I want here is going to be 128, and the hot dog class is equal to 55. So what I'm going to do is take this training data set and use a map function. What this map is doing is saying, all right, whatever function is in this map, we're going to apply it to every single item in our data set and transform the data set that way. And this lambda is basically our new function; lambda means, okay, what are the inputs: you have an image and a label in each example. Now, with that, we're going to return a new tuple, our new image and our new label. For the new image, we're going to use TensorFlow image resizing, passing in the original image and then the size that we want, which is max side length by max side length. So basically, we are resizing the original image into a square, 128 by 128 pixels, and that's going to be the first thing in our tuple. Now, for the label, what we want is whether or not the label that gets passed in equals the hot dog class. So this becomes our new data set: we're basically resizing, and then our label is just true or false, one or zero, depending on whether or not it's equal to the hot dog class. Now, one other thing is that TensorFlow will complain about the type of data that's held in here, so one thing that we're going to have to do is tf.cast, casting this output into int32, and the same thing down here: we're going to cast that into int32 as well. All right, did we close all of our parentheses? Let me just double check. Okay, yeah, it seems like we did. Then we're going to do this exact same thing on the validation data set: again, mapping the image into something that is a square, and casting the label into whether or not it equals the hot dog class. So let's run that cell, and now we have our training and validation data sets. Let's just verify that it works by showing these examples. All right, now everything looks like it's a square. 
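A sketch of that resizing and relabeling step, continuing from the variables in the previous sketch; it is written as a named function instead of a lambda, but it does the same thing.

```python
MAX_SIDE_LEN = 128
HOT_DOG_CLASS = 55   # the Food-101 label index for hot dogs

def to_hot_dog_example(image, label):
    # Resize every image to a 128x128 square, and relabel: 1 = hot dog, 0 = not.
    image = tf.cast(tf.image.resize(image, (MAX_SIDE_LEN, MAX_SIDE_LEN)), tf.int32)
    label = tf.cast(label == HOT_DOG_CLASS, tf.int32)
    return image, label

train_ds = train_ds.map(to_hot_dog_example)
valid_ds = valid_ds.map(to_hot_dog_example)

tfds.show_examples(train_ds, ds_info)   # sanity check: squares with 0/1 labels
```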
And all of these say apple pie, even though it's not really apple pie. And that's just because in this data set info, it's still 01. But this the zero just means that it is not a hot dog. As you can see, all of these are not hot dogs. Zero is just apple pie is just the default label for zero. But what zero means in our context is that it's not a hot dog. So if we actually refresh this enough times, we'll get maybe one example with a hot dog. Let's see. Yeah, okay, so I guess the default label for one is baby back ribs. So in our specific example, like, apple pie means not a hot dog. And then baby back ribs means hot dog. Okay, so essentially, you'll just see that anything that's not a hot dog is zero and something that is a hot dog has a label hot dog is a one. So now we have our data set in these square images, and they're all the same shape. And then we also have our zero and one label. Okay, so one thing that we're going to have to do is there's actually only 750 of each food in. Okay, so something that we're gonna actually have to do is rebalance our data set. So in our training data set, there's actually only 750 hot dogs in like the entire data set, there's actually 750 of each different category. But the limiting factor here is the hot dogs. So let's try to reshape this so that we don't have a bajillion not hot dogs, and then only a tiny little bit of hot dogs to train on. So here, the training hot dog size, and the validation hot dog size, I'm going to set this equal to 750 and 250. I just know that from like the documentation of the data set. Now, if I get train hot dogs, what I can do is take this training data set, put a filter on it. So whatever, I don't really care about the image. But I'm basically filtering this by the label being equal to one. Now, we can also train the not hot dogs. And this is equal to again, this image with the label, and the label now being zero. So this is going to significantly outweigh this one. So what I'm actually going to do is I'm going to up sample this by just repeating this three times. So I'm going to take that 750 and duplicate that a few times to construct a new data set. And I will actually do the exact same thing for valid validation. So our valid hot dogs will be taking the validation data set and running the exact same operation on it. Okay, cool. Now we split our data set into hot dogs and then not hot dogs. But the issue is that the not hot dogs data set like outnumbers the hot dogs data set by a lot, even though we repeated this hot dog data set. So how do we actually sample from each one and get a balanced data set. So that's what we're going to do here, we're going to take we're going to create this new training data set by using this function that we need to get. So in TensorFlow dot data dot data set, we can call the sample from data sets function. And here, we're going to pass in all the data sets that we want to sample from. So for training, that would be train hot dogs and train, not hot dogs. And the weights for each one. So the weights here, we want 50% from here and 50% from here. So I'm going to do 0.5 and 0.5. Okay. And then finally, we want to tell this that once we've reached an empty data set to stop, so there's this option stop on empty data set, oops. And we're going to set this equal to true. Okay. Okay, so now to get this training data set into something that we can actually pass into our neural net. What we're going to do is we're going to cache, then batch, then prefetch. And I'll explain that in a second. 
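A sketch of that rebalancing step: keep only the hot dogs, repeat them a few times, keep the not hot dogs, and interleave the two with equal probability (the cache, batch, and prefetch calls she just mentioned get typed out next).

```python
train_hot_dogs     = train_ds.filter(lambda image, label: label == 1).repeat(3)
train_not_hot_dogs = train_ds.filter(lambda image, label: label == 0)
valid_hot_dogs     = valid_ds.filter(lambda image, label: label == 1).repeat(3)
valid_not_hot_dogs = valid_ds.filter(lambda image, label: label == 0)

# Draw 50/50 from the two datasets until the smaller one runs out.
train_ds = tf.data.Dataset.sample_from_datasets(
    [train_hot_dogs, train_not_hot_dogs],
    weights=[0.5, 0.5], stop_on_empty_dataset=True)
valid_ds = tf.data.Dataset.sample_from_datasets(
    [valid_hot_dogs, valid_not_hot_dogs],
    weights=[0.5, 0.5], stop_on_empty_dataset=True)
```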
So let's just type that out first. First, we're going to cache this. Then we're going to batch it by some batch size, which I will set up here to be 16. And then we are going to prefetch using the parameter AUTOTUNE. Now I'm also going to just paste this for validation. Oops, valid; this should be the valid data set. All of that still applies, and instead of the train hot dogs and not hot dogs, this is valid. Okay. So what caching means is that we're going to cache the data set somewhere, meaning we save it either in memory or in local storage, and that saves operations such as file opening and data reading from being executed at the beginning of each cycle of training the neural net. We batch our data set because, instead of passing just one image at a time into the neural net, we can pass in a bunch; in this specific case, our batch size is 16, so we're passing a whole batch of images into the neural net, and it trains using all 16 images in the batch rather than one image at a time. Finally, this prefetch over here is simply to save more time: it overlaps the preprocessing of the data with the model execution of the training step. So basically, while the model is executing training step s, the input pipeline is already reading the data for the next step. This will just save us a little bit more time. Okay, so now our data pipeline is ready, so let's run this. Alright, now just for the sake of proving that this data is in the format that we want, what I'm going to do is iterate through just the first item in this training data set. I can do that by calling take one on the training data set, and so we can get the image batch and the label batch. Let's print out the image batch, and let's print out the label batch. And I need an "in" here for my for loop. Okay, so you'll notice that this image batch is kind of useless to look at, because it's just a bunch of tensors. There are actually 16 images in this tensor, and the reason why I can confidently say that is because when I go down to the label batch, somewhere down here, it actually gives us a shape-16 vector of the different labels in zero and one form. So here zero, again, means not a hot dog, and one means it is a hot dog. And look at how well balanced this is: it's approximately 50/50 in this output, right? So that means we have a decently well balanced data set. And with that, we can actually start the neural network. Oops, let's use a text cell for that: we can start the neural net implementation. Alright, so I'm first going to seed this so that certain results are reproducible. We want a sequential model, so I'm going to say models.Sequential. And now this is where we're going to start building our CNN. The first thing that I'm going to do is rescale the images so that we divide by 255 for each pixel. What that does is, instead of the scale of zero to 255 for all of our colors, we now get zero to one, where one means full intensity and zero means none for each RGB channel. Now we're ready to add some convolutional layers. The next thing that I want to add is layers.Conv2D, and it's going to first ask for how many different filters we want. 
So let's just put 128 for that. For our kernel size, let's make it three by three; that's the size of the filter that we're moving across the image. And then let's use a relu activation function, just because it's a classic. And our input shape here is equal to max side length by max side length by three, because we have three color channels. Alright, then we want to add a max pooling layer, and we'll just make that two by two; that's pretty standard. Again, we'll add another convolution layer, and here this will just be 64 filters, still using relu, and we don't have to pass in the input shape anymore. Now, let's add another max pooling layer; actually, I can just paste that. And I'm going to go with another convolution layer. In order to get from the convolutions to our fully connected layers, we have to add a flattening layer, so here I'm going to flatten that. And let's add a dense layer, which is just a fully connected layer, Dense with 128 units, meaning each input goes to every single node. And let's make the activation for this relu again. Finally, for the output: because it's binary, because it's either zero or one, I only need one node at the very end that will tell us whether it is zero or one. Okay, so this is the general gist of what our neural network will look like. Let's create this, and let's get started on training. Our learning rate will be 0.0001, and let's compile this model. A very classic optimizer to use is Adam, and that's what we're going to use here; you can think of it as a tool that helps us adjust the different weights, just like in the diagram that we showed earlier, to go down that gradient towards the minimal loss. And our loss here will be something known as binary cross entropy, so we will use losses.BinaryCrossentropy. The reason why we use binary cross entropy is because we have only a single output and we're trying to do binary classification; whenever it's binary classification, we use binary cross entropy. And here we have this from_logits option. We actually have to set from_logits to true, because our final layer does not automatically project the output onto zero to one, which we would normally do using a sigmoid function or something like that. So down here, we have to set from_logits to true to let our loss function know that the output is not already projected onto zero to one. And finally, the metric that we want to use to assess this might just be accuracy. Okay, so let's compile the model. Oops. Oh, okay, so I'm missing an s over here; that should be metrics equals accuracy. Okay, cool. So now my model is compiled. I'm just going to set epochs equal to 50, and we'll collect the history from here. We're essentially going to fit this model to the training data set, the validation data will equal the valid data set, epochs will equal this epochs parameter that we pass in, and then we're going to set verbose to one so that we can actually see things get printed out. So let's run this. All right, this will take a little bit of time, but you see that the accuracy starts off not so great; it started at like 0.3 or something. Okay, now it's at 0.5. The thing is, we expect just a random model to be at an accuracy of 0.5, because we have equal parts hot dogs and not hot dogs in the data set. 
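Putting the pipeline and model cells from this stretch together, the sketch below shows roughly how they might read; the 64 filters in the third convolution are an assumption, since the narration doesn't give a count, and the input shape is declared up front with an Input layer instead of on the first Conv2D.

```python
BATCH_SIZE = 16
train_ds = train_ds.cache().batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
valid_ds = valid_ds.cache().batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)

tf.random.set_seed(1)   # seed for reproducibility (the exact seed is arbitrary)

model = models.Sequential([
    layers.Input(shape=(MAX_SIDE_LEN, MAX_SIDE_LEN, 3)),
    layers.Rescaling(1.0 / 255),                   # map pixel values 0-255 down to 0-1
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),  # filter count assumed
    layers.Flatten(),                              # from feature maps to a flat vector
    layers.Dense(128, activation="relu"),
    layers.Dense(1),                               # raw logit, hence from_logits=True below
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=["accuracy"])

epochs = 50
history = model.fit(train_ds, validation_data=valid_ds,
                    epochs=epochs, verbose=1)
```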
So initially we do expect the accuracy to be around 50%, and our goal is to see whether we can improve on that. Okay, so now our model is done training, so let's take a look at the results. Here, each row is an epoch; an epoch just means one iteration through the entire training data set. We see that the loss decreases each time as we train our model, which means we're getting closer and closer to matching our predicted labels with the actual labels. You can also see that reflected in the accuracy: we start at around 50%, which is expected because our data set is half hot dogs and half not hot dogs, so initially the predictions are essentially random. Our accuracy starts at 50%, then it increases, and it actually does fairly well; after a while it reaches an accuracy of 100%. Now, let's see if our validation data tells a different story. Remember, the validation data is not passed into training; it's data we set aside to evaluate how the model is performing along the way. If I come over here, we see that our validation loss starts to decrease but then goes back up. We see the same thing with the validation accuracy: it climbs to maybe about 73%, then starts to decrease again and stays roughly steady, while the validation loss climbs to around 2.93, even though our training loss is extremely small and our training accuracy is 100%. So what is going on here? Chances are we are overfitting our model. Essentially, that means we've passed this data through the model so many times that the model has memorized each piece of data. When it memorizes each piece of data, it can predict all of the training labels 100% correctly, but when it sees new data, it can't really generalize, and that's why you see the validation accuracy drop and the validation loss rise. Alright, so let's go back into this model and see if we can make some changes to make it more robust. One thing we can do is add something called a dropout layer. After each max pooling layer, I'm going to insert a Dropout layer and set its rate to 0.25. What this says is that, randomly during training, we're going to turn off 25% of the connections between this layer and the next one. By turning off these connections, you're essentially training the model to be robust to a little bit of randomness: it might not see the exact same features every time, but it should still produce the same output. So I'm going to go ahead and add this to some of the layers, including one more right before we flatten, and those will be my dropout layers.
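To make the placement concrete, here is a sketch of the same model with Dropout layers inserted after each pooling stage and before the flatten, as described above; the layer sizes are still approximate and MAX_SIDE_LEN is the assumed image size from the earlier sketch:

from tensorflow.keras import layers, models

MAX_SIDE_LEN = 128  # assumed image size, as before

model = models.Sequential([
    layers.Rescaling(1.0 / 255, input_shape=(MAX_SIDE_LEN, MAX_SIDE_LEN, 3)),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),   # randomly zero out 25% of the activations during training
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.Dropout(0.25),   # one more before flattening
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(1),
])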
One more thing I want to add to this model before we run it is some data augmentation. I'm going to insert some code up here and call it data_augmentation. Here I'll use tf.keras.Sequential; this will be our data augmentation layer, which means that when an image is passed into the model, we perform some operations on it first. The first thing I'll pass in is a random flip layer that flips the image horizontally, and then I'll pass in a random rotation layer with a factor of 0.2, which randomly rotates within the range of negative 20% of 2π to positive 20% of 2π. So basically, after our image gets rescaled, we can call model.add with this data augmentation layer, which will perform these operations on our input data. Alright, let's take a look at what this data augmentation actually does to our data. I'm going to extract the first image from our original training data set, just that first image, and then show it. Oh no, that's not what I wanted; I wanted to take the first one, so I need that take call there. Okay, and then let's show the image. Alright, cool, let's run this again and get a more square image. Okay, cool, so we have this ramen over here. Now I want to show you what data augmentation does and how this image actually gets flipped and rotated. First I need to cast this into a batch so that we can put it through our data augmentation layer, which means I need to expand the dimensions: essentially, I want a container where the only thing inside is my image. To do that I use expand_dims, passing in the image and expanding along the very first dimension; that's what the zero means. I'll also cast it to float32. One more thing: the image currently holds values from zero to 255, but when we show it, the plotting expects values from zero to one, so right here I'll divide by 255. Okay, and now we can finally show what the figures look like with data augmentation. I'll create a figure with figsize (10, 10) or so, and I'm going to plot nine augmented versions. I create an augmented image by calling my data augmentation pipeline on this image, and then plot it. Let's define the axes for the subplot, make it three by three, and show the result on the figure. Remember that the augmented image is a container that holds our image, so I have to index into it to get the image itself. And I'm just going to turn off the axes. Now if I run this, let's see what happens. Okay, so you can see that we get different rotations of our food, and maybe some flips here. Essentially, it scrambles our data a little bit, which is exactly what we need in order to build robustness, so that we don't feed in the exact same input data every single time. So I'm just going to add that as a layer right here and run this. Essentially, what I've done is added this data augmentation to my input data, and I've also created these dropout layers that let us train with certain nodes and connections missing at any given training step.
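Here is a sketch of that augmentation pipeline and the three-by-three preview grid, assuming image holds the single training image pulled out above as an unnormalized tensor with values from 0 to 255:

import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.keras import layers

data_augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.2),  # rotate within +/- 20% of a full turn
])

# Wrap the single image in a batch of one and scale it to [0, 1] for plotting.
image_batch = tf.expand_dims(tf.cast(image, tf.float32) / 255.0, 0)

plt.figure(figsize=(10, 10))
for i in range(9):
    augmented = data_augmentation(image_batch)  # a fresh random flip/rotation each call
    plt.subplot(3, 3, i + 1)
    plt.imshow(augmented[0])  # index into the batch of one to get the image back
    plt.axis("off")
plt.show()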
Alright, one change I'm going to make from this original model is to move this dropout down here. As far as I know, when you're building your model, there are general architectures that work well, but there's no single right answer for where you should place these layers, how many nodes they should have, what the stride length should be, or which activation function to use. From what I understand, these questions are answered largely by trial and error: a lot of the time, people simply try a bunch of combinations based on what has worked in the past and see what produces the best outputs for them. Alright, so one more thing I want to add to this is a kernel regularizer. What that does is add a penalty on large kernel weights, so the weights of a given filter are discouraged from growing too large. I do that by passing in this kernel_regularizer parameter and setting it to a regularizer from tensorflow.keras.regularizers; let's use L2, which penalizes bigger weights more heavily, and I'll use 0.01 as my hyperparameter here. I'm also going to add that down here. Alright, so I added this kernel regularizer to each of these convolution layers except the top one. Just keep in mind that the kernel regularization is there so that the weights of the filters we're training in the convolutions stay small rather than blowing up. And for the purposes of this project, I'm also going to decrease the number of nodes in each of these layers, just so that it trains a little bit faster. These are all parameters that you can play around with in your free time: this rate, where you insert the dropouts, different types of regularizers, different activation functions, filter sizes, and so on. So let's run this cell, compile the model, and finally train the model down here. Okay, so now it's a waiting game, and we will just wait. Okay, our model is finally done training, so let's take a quick look at the results. Here we go from a training loss of 1.6 all the way down to around 0.57 or 0.56, and we see the training accuracy go from 0.5 to around 70%. So 50% to 70%, and remember, this is on the training set. Now if we look at the validation set, which is data our model hasn't seen yet, it goes from a loss of around 1.2 and 50% accuracy to around 0.6 validation loss and 70% accuracy. So we have shown that we are effectively training a model. I will say that while creating this video, the best accuracy I could get was around 75%. So if you go back, play with some hyperparameters, the layout of the model, different activations, regularizers, and so on, and you reach an accuracy better than 75%, please do share it with everybody; I would love to learn from you. What this means is that our model can achieve something around 70% accuracy classifying hot dogs versus not hot dogs in these images. Ideally, we would also have a separate test data set to try this model on, but unfortunately our data set did not come with one and I did not create one, so that's okay; we're going to use our validation data set to demonstrate that this works.
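For reference, the kernel regularizer mentioned above attaches to a convolution layer roughly like this; it's a sketch rather than the exact layer from the video, with 0.01 matching the value mentioned and 32 filters standing in for the reduced filter count:

from tensorflow.keras import layers, regularizers

conv = layers.Conv2D(
    32, (3, 3),
    activation="relu",
    kernel_regularizer=regularizers.l2(0.01),  # adds 0.01 * sum(weights**2) to the loss
)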
So I'm going to paste here the figure code that we used to draw our data augmentation images, but instead of augmented images, I'm going to use image batches from our validation data set, so an image batch and a label batch. From our validation data set, take the first item; remember, it's already a batch of 16. The images we want to use are this image batch, and the labels we want to use are this label batch. There's probably a better way to do this, but I couldn't get it working, so this will suffice for now. Let's run that. Okay, and then here, in this range of nine, instead of an augmented image, I'm just going to plot the image itself so we can see what it looks like; we take the images and get the i-th image from that, and let's print all of these. From me looking at this, clearly this is a hot dog. I don't think that's a hot dog. Can't really tell what that is. Hummus, hot dog, garlic bread, hot dog, hot dog, oysters, and it looks like sausage and waffle. And now if we look at our labels, one means that our model thinks it's a hot dog and zero means it doesn't think it's a hot dog. So for the first row we get yes, no, no, so hot dog, not a hot dog, not a hot dog. Then we get yes, no, yes, so hot dog, not a hot dog, hot dog. And then yes, no, no, so hot dog, not a hot dog, not a hot dog. So on these nine images, our model was able to classify all of them accurately. We would literally be able to pass this image in, ask it whether it's a hot dog or not, and it would say yes, it is, which I think is pretty awesome. That concludes our introductory course on convolutional neural nets. Thank you all for being here with me today, learning about the basics of machine learning, neural networks, how to train them, and finally convolutional neural networks with our hot dog or not hot dog example. I hope you learned a lot. And of course, post comments; let's help each other learn so we can all get better at ML together. Don't forget to subscribe to my channel, Kylie Ying; I will be releasing a course on artificial intelligence later this year, so stay tuned. Transcribed by https://otter.ai
Info
Channel: freeCodeCamp.org
Views: 72,551
Id: nVhau51w6dM
Length: 87min 41sec (5261 seconds)
Published: Mon Jul 03 2023