Learn TensorFlow and Deep Learning fundamentals with Python (code-first introduction) Part 2/2

Captions
Hello and welcome to part two of this fairly long video — it was originally going to be 14 hours long (14, not 40), but YouTube's video limit is 12 hours, so I've separated it. If you're watching this, you should have already watched part one; if not, click the link to part one and go watch that first. Part two continues right where part one of this Learn TensorFlow and Deep Learning code-first video series left off. All the links you need, such as the course GitHub repo and all the materials, will be down below, and if you want to sign up for the full version of the course — which has about 20-plus hours more material than what you've already covered — there'll be a link to that in the description below as well. Without any further ado, happy coding.

Now, we've seen that the neural network we've built can model straight lines with somewhat-better-than-guessing accuracy; however, when it comes to non-straight lines, it's not doing too well. We also mentioned the missing piece — we're like treasure hunters here — and the missing piece is non-linearity. I cannot emphasize enough how important this concept is for neural networks. Before we write any code, I want to pose a little question, something to think about over the next few videos while we start to learn about non-linearity: what could you draw if you had an unlimited amount of straight (in other words, linear) and non-straight (non-linear) lines? What kind of patterns could you draw? Let's look at our data again. We have some linear data — you might think, is this possible to model with straight lines? I think so. And then we have some non-linear data — not possible to model with straight lines. Now, you could make the argument, "Well, Daniel, what if you just draw really small straight lines and go in between the gaps here?" — that is one option, but let's just pretend that we need straight and non-straight lines to draw the patterns that we need. So keep thinking about this question.

Again, before we write any code, we're going to have a bit of a play around. I introduced the TensorFlow Playground in a previous video, but we haven't looked at it together. Going to playground.tensorflow.org, we can set up our data here — what I might do is zoom in nice and far. Okay, beautiful. We have a few things going on here. We have data, and this data looks very similar to the circle data we're working with, except it uses orange and blue dots instead of red and blue dots. We have some features, x1 and x2 — we could pretend these are our two input features. Right now this neural network has two hidden layers; what we might do is reduce it down to one hidden layer with one hidden neuron — a very simple neural network — and have a play around. What are these other settings? We've seen the learning rate before (I believe Adam's default is 0.001), but activation we haven't really played with, so let's switch it to linear — we've seen what linear data looks like. We've also got a regularization rate and a problem type of classification. Let's leave these settings as they are for now, press play, and see what happens. The test loss starts to... it's just evening out at about 0.5. What does that remind us of? Do you remember the model we built previously that only ended up with 50% accuracy? I think that's what's going on here. To test this out, how about we recreate this neural network? We've already got the data, we can create the hidden layer, we can set the learning rate, and we might see how we can set the activation too. All right, let's go back and write some code.

Just as before, we set the random seed with tf.random.set_seed(42), and then, number one, create the model: model_4 = tf.keras.Sequential, then tf.keras.layers.Dense(1), and the activation can be... what did we set the activation to in the Playground? Linear. How might we get access to that? One shortcut is to just pass "linear" as a string, or we could use tf.keras.activations.linear. So now we've got a parameter we haven't used before, activation — though we have actually seen it; if we go back a few slides, there's the activation parameter. We've seen how we can add layers to try to improve our model, and how we can increase the number of hidden units, but right now those aren't really working for us — so we're up to changing the activation functions. That's the hyperparameter we're working with now. Next, we compile our model, because we want to recreate what we've done in the TensorFlow Playground and see if we can replicate those results. (This is a very good exercise to go through yourself: play around with the TensorFlow Playground and recreate what you see there in code.) The loss is binary cross-entropy, because we're working with a binary problem. There are actually a few different ways to write the losses: we've seen tf.keras.losses.BinaryCrossentropy(), and let me introduce you to writing things as strings — "binary_crossentropy" — just in case you see it like that somewhere else. The optimizer we can also set as the string "Adam", which would be the same as writing it out as tf.keras.optimizers.Adam. The benefit of writing it out in full is that we can set the learning rate — remember, lr is short for learning rate — and the metrics can be ["accuracy"]. So we've got the same thing so far. Come back to our neural network playground: features (we haven't passed our model data yet, but we have our blue and red dots), one hidden layer with one neuron, linear activation, learning rate 0.001.

Now, number three, fit the model. We're going to start assigning the result to a history variable whenever we fit models from now on, because this will come up later — I just want to start building that habit. Let's do 100 epochs; the Playground one ran for over a thousand, but we'll stick with a hundred so we don't have an incredibly large output in our notebook. Shift-enter... what kind of results are we getting? Wow — looks like our model is performing worse than guessing right now. Is that aligned with the Playground? Okay, the TensorFlow Playground is getting quite similar results; it's basically still guessing as well. If you were trying to separate the blue and orange dots by tossing a coin a thousand times, you'd get heads about 500 times — you'd get about these results. Let's remind ourselves of what our data looks like, because whenever our model's predictions aren't working very well, we can evaluate the model with evaluation metrics, or we can adhere to our motto of visualize, visualize, visualize. So we scatter the zeroth axis of X against the first axis of X, color the points with y, and choose a color map of plt.cm.RdYlBu (red-yellow-blue) — just reminding ourselves what our data looks like. All right, wonderful. Now we've got model_4, a trained model, albeit one that doesn't look like it's performing very well. Let's check out its predictions anyway — check the decision boundary for our latest model. So this is why we
created a handy function before: our beautiful plot_decision_boundary function. We pass it our trained model (this can be model_4), our features X, and y. What does it look like? Where's our model's decision boundary? Oh my goodness — it's all over the shop. If blue is the blue class and red is the red class — I mean, red's right up here, and yellow is the crossover (because of how we set our color map up) — our model is basically going, "you know what, anything in this yellow region could be blue or red", which is basically why our model's accuracy is below guessing. Now, what can we do? It looks like our model is still only predicting straight lines. So what we might try — in the next video, or if you want to jump ahead — is to reset the TensorFlow Playground and play around with the activation here: what happens if you set it to a different value? See what happens, and we'll go through it in the next video.

In the last video we had a play around with the TensorFlow Playground, and we saw that a linear activation function, one hidden layer with one hidden neuron, and a learning rate of 0.001 didn't result in a very good separation of the orange and blue dots — similar to the latest neural network we built, model_4, which just copied those settings. Now, I hope you've thought about the question from before: what could you draw if you had an unlimited amount of straight (linear) and non-straight (non-linear) lines? Remember, the data we're trying to model for classification is non-linear, so it's not possible to model with straight lines alone. If we come back here — we've spoken about non-linear, and right now our activation is linear. How about we change this activation function to something that is not linear? It could be any of these three options, even though we haven't yet seen what they are. Let's just try the top one, ReLU, and see what happens if we run this. Hmm, still not getting
a very good value. I wonder if we just keep it training for longer? Well, we're almost at a thousand epochs now and it's still not improving. Before we explore the other options here, let's see how we might use the ReLU activation in our own neural network. So let's come back to our notebook and, as more practice, see if we can replicate this neural network with TensorFlow code. First of all, we'll put in a note: let's try to build our first neural network with a non-linear activation function. When I say non-linear, I really just mean anything that is not linear. We could choose linear, but it didn't perform very well, so let's pick one of the options that are not linear — we'll start with ReLU, which is very common, as you'll see in your deep learning journey. Set the random seed: tf.random.set_seed(42). Number one, create a model with a non-linear (so, anything but linear) activation: model_5 = tf.keras.Sequential, then tf.keras.layers.Dense — we'll keep it the same as before — and set the activation. You'll see that by default it's None, so this is where we set it to be non-linear: we could pass "relu" as a string, or use tf.keras.activations.relu. Wonderful. Two: compile the model, exactly as before. model_5.compile with loss — what problem are we working on? — tf.keras.losses.BinaryCrossentropy, and the optimizer we've been using, the Adam optimizer, tf.keras.optimizers.Adam, setting the lr to 0.001 to match the TensorFlow Playground, and then metrics=["accuracy"]. Number three, fit the model — this is exciting, our first model with a non-linear activation. And, since we said we were going to start developing the habit of saving the history: history = model_5.fit(X, y, epochs=100). Let's see what happens. Oh, what did we mess up here? tf.keras.optimizers — a typo, as usual. All right, wonderful, we keep going. We're using a non-linear activation function — remember, in this case non-linear just means anything but linear — and again our model ends up performing basically no better than guessing. Hmm, it's still not learning.

Let's come back to our keynote, to the slide where we looked at improving a model. So far we've tried adding layers, we've tried increasing the number of hidden units, and we've tried changing the activation function as well as the optimization function. Far out — we've tried a fair few things, but we've only tried changing the activation function on its own; we haven't done it in conjunction with the other two. So how about we try that? What if we increase the number of neurons and layers and change the activation function? I've got an idea: we can try that in the TensorFlow Playground first. Before the next video, go back to the TensorFlow Playground and try increasing the number of hidden layers and the number of neurons — keep the activation, the learning rate, and everything else the same. Increase the number of hidden layers to whatever you want (I think there's a maximum of six — that's all right) and the number of hidden neurons to whatever you want, then run it and see what happens. And if you really wanted to — whether your neural network works or not — reproduce it in TensorFlow code. I'll see you in the next video, where we're going to do everything we just talked about.

Welcome back. How'd you go? Did you manage to get the TensorFlow Playground to find the patterns and distinguish orange from blue dots? Did you get the test loss and the training loss to decrease? Did you increase the number of hidden layers or the number of hidden units? I hope you did, but if not, that's what we're going to go through in this video. Now, we've discussed that to improve our model we've tried adding layers, increasing the number of hidden units, adding a non-linear activation function (because the data we're working with is non-linear), and even changing the optimization function — but we haven't done all of these in conjunction with each other. So let's have a play around with the TensorFlow Playground, adjust those parameters (or hyperparameters), and see if we can get it to distinguish the patterns between the orange and blue dots. I'm going to add another hidden layer and increase this one to, let's say, four neurons — yeah, that's a good number — and do the same for the other layer. I'm keeping everything else the same: I've just increased the number of hidden layers and neurons; the learning rate is the same, and the activation is the same non-linear one, ReLU. Let's press play and see what happens. Oh, whoa, what have we got here? That's a good sign — the test loss is going down, and it keeps going down. Ha ha, wow. We're nearly at a thousand epochs and it's still dropping; let's wait here for a second — is it going to keep going? Oh, look at this: it's starting to be able to distinguish orange dots from blue dots. Once it hits 2,000 epochs I think we'll stop there — you could leave yours running for longer and see what happens — but how cool is that? Just by increasing the number of hidden layers, adding a few more hidden neurons, and changing to a non-linear activation function, we get much better results.

Again, some great practice: let's replicate this neural network in TensorFlow code. Come back to our notebook — time to replicate the multi-layer neural network from the TensorFlow Playground in code. We'll set the random seed with tf.random.set_seed(42). What do we need for our model? We had two hidden layers with four hidden neurons each, then an output, with a ReLU activation function and a learning rate of 0.001. All right, that seems pretty straightforward. Create the model — what number are we up to? I believe we're up to model_6 = tf.keras.Sequential. Beautiful. Now we need two hidden layers: tf.keras.layers.Dense with how many hidden units? Four, with activation="relu". Wonderful. Then we'll create another one: tf.keras.layers.Dense — oh, forgot the "s" there — four hidden units, activation="relu". Wonderful. Step two, compile the model: model_6.compile, exactly the same as we've done before (we're getting lots of practice here). For the loss we'll use the string notation for this model, "binary_crossentropy"; optimizer=tf.keras.optimizers.Adam with lr=0.001, which is Adam's default learning rate, so we don't actually have to put that there; and finally, metrics=["accuracy"]. Let's fit the model, saving our results to the history variable: model_6.fit on X and y, with the same number of epochs as before. All right, let's see what happens. Do we get any improvements? Oh — looks like our model is performing even worse than guessing, below 50%. What if we evaluate the model? Far out, what's the difference here? model_6.evaluate(X, y)... hmm, why is our model performing far worse than the Playground? That Playground model trained for 2,000 epochs — maybe we need to train ours for longer. What if we come up here and make it 250 epochs? Let's try that. Oh, whoa — our metrics are jumping all over the place, so it definitely seems like our model is still basically guessing. You know what, I think I've got it. You can't see it in this neural network playground, but I don't think it makes sense that we're dealing with two features and a binary
classification problem, and yet our last layer has four hidden neurons instead of one — one for binary classification. So let's see what happens if we add tf.keras.layers.Dense(1), because we want our output to be one thing or the other, right? Red dot or blue dot, not four different options. Let's see what happens here... accuracy still 50 percent. Wow — even after close to 250 epochs, we're still getting 50%. Hmm. What if we evaluate that? What's going on? What should we do when we're not sure what our model is doing based on the evaluation metrics? We should visualize, visualize, visualize. So how do our model's predictions look? Remember, we've got our handy plot_decision_boundary function — we pass in model_6, X, and y. What kind of decision boundary is our model creating? Oh my goodness. All right, so it's starting to realize that red might be towards the outside, but it still looks like it's operating with straight lines. What gives? Our model looks like the exact one in the TensorFlow Playground, too. Ideally, our yellow line would go in between the red and blue circles.

All right, let's model this circle once and for all. We're going to build one more model — I promise. Well, actually, we're going to build plenty more models throughout the course, but for this circle, we've done it enough times; it's time to reveal the missing piece. Here it is. If we come back to our keynote: we've looked at improving our model, and we've altered the activation functions in the hidden layers, but we haven't changed the activation function in the output layer — we've set up ReLU, ReLU here. We looked at this right at the start — I mean, we could have come back to it to begin with, but I wanted to go through the process of figuring things out — so let's go back to the architecture of a classification model, the typical architecture, that is. If we have a look, what are we dealing with? We're dealing with binary classification. Hidden layers — well, we've got two at the moment. Neurons per hidden layer — generally 10 to 100, but we've seen on TensorFlow Playground that four is enough for this type of dataset, so we'll stick with that. Output layer shape is one — we've set that up. Hidden activation is usually ReLU — okay, we've set that up too. Output activation: sigmoid. Ah. So our demo model here also has an activation on its output layer, but the current neural network we're working with — come back to model_6 — doesn't have any activation there, and remember, for a Dense layer the activation is None by default. Hmm. So what should we do? If in doubt, we could refer to our little table here (sigmoid), or we could search something like "what activation function to use for binary classification", or maybe "what activation function for the output layer". So: for regression, linear; for the other case, "sigmoid works too, but softmax works better" — okay, we could dig through this information. Another result: here we go — "the output layer contains a single neuron in order to make predictions; it uses the sigmoid activation function" — this is binary classification, that's what we're after. All right, that must be the missing piece that the TensorFlow Playground doesn't show us: it doesn't show us the output layer. But that's all right — we've got TensorFlow code. So I want to issue you another challenge: if you want to model this circle once and for all, create a model like model_6, but for the output layer, add in a sigmoid activation function. I'll let you do your research and give that a go; otherwise, we're going to model this circle once and for all by introducing the output layer activation function in the next video.

Welcome back. Last video we discussed that we're probably missing an activation function on our output layer. However, I hope you didn't take my word for it — I hope you tried to write the code yourself. But if not, let's do it. Let's model this circle once and for all — I'm getting sick of seeing these straight yellow lines. So let's set the random seed as usual; you can probably tell that my coding hands are eager to write this neural network and get this circle modeled. Step one, create a model, just as we have before. I believe we're up to model_7 — fingers crossed this is lucky model seven, you know. We're going to go tf.keras.Sequential, then tf.keras.layers.Dense — the exact same hidden layers we've been using, with activation="relu". Wonderful. Come back: tf.keras.layers.Dense(4) as well, activation="relu". And here: tf.keras.layers.Dense(1) — we want one unit for the output layer because we're dealing with binary classification, one thing or another — and this is where we introduce the magical piece. We could do tf.keras.activations.sigmoid, or, to keep in line with the string notation we've been using, we could just pass "sigmoid". Again, where does this come from? We could ask Google, or check out the typical architecture of a classification model: for binary classification, the hidden activation is usually ReLU, and the output activation is sigmoid. When we deal with multi-class problems, we'll have to deal with softmax — keep that in mind going forward. Now, our model architecture is looking great; let's compile it. model_7.compile: loss=tf.keras.losses... no, we're using string notation, come on Daniel — loss="binary_crossentropy". Wonderful. The optimizer can be tf.keras.optimizers.Adam; we'll set the learning rate to 0.001 — again, that's the default, so we don't necessarily need to, but it keeps us true to our TensorFlow Playground setup. And metrics=["accuracy"]. Wonderful. Now let's fit the model — I'm eagerly awaiting whether the results for this one are better than our previous model.

This is the fun part of neural networks and deep learning: running so many different modeling experiments. epochs=100... let's watch the training. I was about to set verbose=0, but let's watch it — if this neural network is going to work, I want to see those metrics going where they should go. Accuracy — are we going up? Yes! Oh my gosh, look at that. Before we added that output activation function we were getting about 50% accuracy, and now we're borderline 99%. But again, let's not trust just the metrics. Number four: evaluate our model. So, model lucky-number-seven: evaluate on X and y — are we getting the same? Wow, we are: loss below 0.3, and accuracy is 99%. How does that line up with our TensorFlow Playground? The Playground's loss is almost 10 times lower, but that one was fit for 2,000 epochs — maybe ours would approach that if we kept training. But again, let's not just trust the metrics; let's visualize our incredible metrics. We created this beautiful function a few videos ago because we want to use it multiple times: plot_decision_boundary with model number seven, X, and y. Are you ready? Three, two, one... oh my goodness, how much better is that! It looks like our model has basically perfectly found the decision boundary between the red and blue dots, except for maybe a couple of points — oh, that one got caught, and that one got caught there — which is why we're not getting a perfect result. But we're very close: 99% accuracy between two evenly spread classes (or two evenly spread labels) is pretty darn good. But I have a question for you — I'm going to put it down here with the question emoji, a little challenge. Question: what's wrong with the predictions we've made? Are we really evaluating our model correctly? If we're looking at our evaluation metric and our
plotting of predictions, are we really evaluating our model correctly? Here's a little hint: what data did the model learn on, and what data did we predict on? Have a think about that — you probably know the answer already, and if not, that's perfectly fine. But before we answer it, I want to emphasize what we've just covered, which takes us back to the question from before. If we go to our non-linearity slide — I posed this question a couple of videos ago: what could you draw if you had an unlimited amount of straight (linear) and non-straight (non-linear) lines? Then we looked at our linear data and our non-linear data — so, the combination of straight lines and non-straight lines. Remember, we haven't even discussed what ReLU is or what sigmoid is; we just know that they're not linear. Linear is linear, and these three are not — that's all we've covered for now. But this is key — I'm going to even write it down, with a key emoji (the "key" is on purpose). If you want to take away anything from the few videos we've just gone through, it's this: the combination of linear (straight-line) and non-linear (non-straight-line) functions is one of the key fundamentals of neural networks. Back to our question — think of it like this: if I gave you an unlimited amount of straight lines and non-straight (non-linear) lines, you could essentially draw any pattern that you wanted. And that is essentially what our neural networks are doing. This dataset is relatively easy, but imagine working on a whole bunch of other datasets, such as building a neural network to understand what's in a picture — there's almost an unlimited amount of things to look at. If we search for food images — say we want to build a neural network to identify different patterns in pictures of food — look how many different patterns there are. We'd need a whole bunch of non-straight and straight lines. That's the essence of what a neural network is doing when it looks at different examples of data: it's drawing patterns through the data with straight and non-straight lines. If this doesn't fully make sense yet — you might even be thinking, "hey Daniel, I've never actually seen a linear function or a non-linear function before" — well, you kind of have: we've been using them the whole time; they're what power the layers we've just built. With that being said, in the next video let's take a look at applying these linear and non-linear functions that we've been using in our neural networks, but on their own. I'll see you there.

In the previous video we modeled our non-linear classification data once and for all — I mean, look at that, that's a beautiful sight, isn't it? The decision boundary is basically splitting the classes perfectly. We've also discussed the concept of linear (straight) and non-linear (non-straight) lines, or functions, but we haven't really built up an intuition about them — we've just gotten familiar with the names of some of them: ReLU and sigmoid. So let's start building that intuition. Right here: now that we've discussed the concept of linear and non-linear functions (lines), let's see them in action. To do so, how about we create a toy tensor, similar to the data we pass into our models? Because we're dealing with TensorFlow, all of the data we use gets encoded into tensors: we pass a tensor to a neural network, it figures out patterns, and it outputs another tensor. So let's go A = tf.cast(tf.range(-10, 10), tf.float32) — I want a nice simple tensor from negative 10 to positive 10, as float32. Let's have a look: just negative 10 all the way up to 9 (because the range includes zero, it's 20 elements long). Now let's visualize our toy tensor with plt.plot. Wonderful — a nice straight line. What would that be called? What's a straight line? A linear line, right. Now, what activation function did we just try in our output layer? Sigmoid. Okay, so how might we apply this sigmoid activation function directly to tensor A? I've got an idea — why don't we look it up: "tensorflow sigmoid activation function". We have tf.keras.activations.sigmoid — okay, wonderful. The sigmoid activation function: sigmoid(x) = 1 / (1 + exp(-x)). And if we search for the sigmoid function itself, we'll get some fancy math notation, I'm guessing — yeah, there we go: sigmoid(z) = 1 / (1 + e^(-z)), and we get a curved line. See how that's non-linear? All right, enough looking at pictures — let's try to replicate this. We could just use tf.keras.activations.sigmoid, but how about we start off by replicating it ourselves? What I'd do in a scenario like this, where I'm trying to replicate something: let's start by replicating sigmoid. So let's go def sigmoid, taking an input x. It's relatively simple — it's only 1 divided by (1 plus the exponent of negative x). To get the exponent, let's look that up: can we do it in TensorFlow? "tensorflow exponent" — tf.math.exp, wonderful. Does it have an alias? Oh, tf.exp. So we return 1 / (1 + tf.exp(-x)) — is that our sigmoid function? Really, that's all it is? Let's try it out: use the sigmoid function on our toy tensor. We created A up here before — remember, right now it's linear. Now if we call sigmoid(A)... all right, we get a whole bunch of what looks like gibberish numbers, but let's plot them — that's a much better way to look at them. Plot our toy tensor transformed by sigmoid: plt.plot(sigmoid(A)) —
what do you think this will look like give you a second to think about it let's check ho ho that's what's up now notice here that the values are between zero and one hmm that might be something we look at later on but the most important point here is that this line was originally straight and has now been modified to be non-straight now where's the intuition there if we come back remember how we couldn't draw a curved line around our data but as soon as we added this sigmoid activation function to our neural network now that we've seen what it does to a straight line it was like we gave our neural network a tool to go hey you've been trying to draw patterns with just straight lines before but now you've got this non-straight line so use that as best you can to find better patterns and as it turns out it was able to find better patterns so let's not stop there let's keep going what was the other activation function we've used so far we come up here we've used relu activation all right so let's see how we can replicate this because that looks pretty cool i wonder if we can do the same for relu let's open another tab tensorflow relu function and what do we get tf keras activations relu all right well it looks like we can use very similar notation to our sigmoid function but does it have a definition of what relu is oh it's just the max between x and zero so does that mean it's going to make all negative numbers zero hmm or we could even just go what does relu do in a neural network the activation function is responsible for transforming the summed weighted input from the node into the activation of the node or the output for that input we could dive into that relu is the max function of 0 and x what if we go to images is there a function it equals 0 for x less than 0 or x for x equal or greater than zero okay why don't we just replicate that max of 0 or x okay let's give that a go let's recreate the relu function see what happens oh and one thing before
we do notice what this line looks like how does it look compared to our original toy tensor that we created above so we go def relu and we want to return how do we get the maximum in tensorflow is it maximum or just max tf maximum wonderful we want to return the maximum between 0 and x that's all we've done for relu if we come here is there a wikipedia page wikipedia is usually pretty good there we go rectifier neural networks this is all we've written relu equals max of 0 and x that's it that's all we've done okay now let's pass our toy tensor to our custom relu function relu a and what's going to happen what do you think is going to happen well let's check it out all right how is this different to our original tensor just by looking at it this one's a bit easier to look at than the sigmoid but see how these are all zero it seems like all relu has done is turn the negative numbers to zero and we've done that with the maximum function by just going hey look at the input you're getting and give me the maximum of 0 or the number itself so is 0 larger than negative 1 yes it is so we set it to zero all right now better still let's see how this looks so plot the relu modified tensor i'm gonna go plt.plot relu a let's see ha ha now what is that if we come back well let's just bring our straight line tensor down here so this started out straight and now you could still argue this is straight but it's got a kink in it so now we have a bunch of tools we've given our neural network hey here's this curvy line and here's this bendy line now you've got these two tools let's start drawing patterns in our data i mean i could draw some pretty cool shapes if i had this curvy line and this bendy line now let's not stop there we've got one more activation function that we've tried remember we're now in our neural network playground the first thing we tried was linear and then we tried
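the relu replication sketched the same way, with numpy's maximum standing in for tf.maximum:

```python
import numpy as np

def relu(x):
    # relu keeps positive values as-is and clips every negative value to zero
    return np.maximum(0, x)

a = np.arange(-10, 10, dtype=np.float32)
print(relu(a))  # all negatives become 0, positives pass through unchanged
# plt.plot(relu(a)) shows the straight line with a "kink" at zero
```

that kink is the whole trick, it's still made of straight pieces but it's no longer one straight line.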
nonlinear so for completeness let's try the linear activation function we might just do what we did before tensorflow linear activation function tf keras activations linear linear activation function pass through what does it do arguments x the input tensor returns the input unmodified are you serious we just put a tensor into the linear activation function and it returns the same tensor did that even need a function oh well we said we're working for completeness so we might as well try it out we go tf keras activation it's not even fun to replicate this one oh has no module activation it's activations that's what we need we need the s come in here shift and enter wow are you serious that's all it does just our exact same tensor well for completeness does the linear activation function change anything so plt.plot tf keras activations dot linear a we'll see oh i'm making the same error hold on this can't be real life let's go does a even change a equals tf keras activations dot linear a all of the elements are still the same i guess the linear activation function lives up to its documentation returns the input unmodified but this is a fundamental concept we've just covered now let's come back to our slide i've just prettified all the code we've just written but the important thing is that we've written this code and we've seen it happen it makes sense that the model didn't really learn anything using only linear activation functions because the linear activation function doesn't even change our input data in any way so it's just basically passing the same input data through the entire neural network and no wonder the outputs are basically as good as guessing because it hasn't changed a single thing whereas with our non-linear functions such as sigmoid for the output layer we give our neural network this tool here of using this
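the "returns the input unmodified" behaviour is just the identity function, which a tiny numpy sketch makes obvious:

```python
import numpy as np

def linear(x):
    # the linear activation is an identity function: output equals input
    return x

a = np.arange(-10, 10, dtype=np.float32)
print(np.array_equal(linear(a), a))  # True, nothing about the tensor changed
```

a stack of layers using only this activation can never bend a straight line, which is why the earlier model's predictions were no better than guessing.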
curved line and the same thing here for the relu function when we give our model non-linear functions it's able to deduce patterns in non-linear data that's a fairly important concept now we've only covered two activation functions here but the premise of what we've just seen the concept of non-linearity is the main takeaway from this series of videos neural networks use a combination of linear activations and non-linear activations to find patterns in data now if you want a resource for learning more about activation functions there's the machine learning cheat sheet activation functions page i'll leave this in the resources section but ml cheat sheet dot read the docs dot io actually has a whole bunch of different stuff here are some of the most popular and useful activation functions we've seen linear and relu leaky relu is a different form of relu and sigmoid i'll let you go through here maybe some of your extra curriculum could be to reproduce these in tensorflow code however we're going to push on and in the next video we're going to see how we can evaluate and improve our classification model this one that we've built here so with that being said i'll see you in the next video in the last few videos we tackled the important concept of non-linearity and we learned that the combination of linear straight lines and non-linear non-straight lines functions is one of the key fundamentals of neural networks in other words how they find patterns in data we even rebuilt our own non-linear functions the sigmoid activation function and the relu activation function but now it's time to move on to evaluating and improving our classification model alrighty so do you recall that in a previous video i posed a question of what's wrong with the predictions that we've made so far we scroll back up again jumping all over the place here but i did pose a
question before what's wrong with the predictions we've made are we really evaluating our model correctly hint what data did the model learn on and what data did we predict on so far in our toy example we've been training and making predictions on the same data set but why is that wrong we'll come down here what data set should our model learn on and what data set should we evaluate our model on now if you answered that question with we should train our models on the training data set and test our models on the test data set you'd be 100 percent correct but at the moment we don't have a training or a test data set so let's remind ourselves of the three data sets possibly the most important concept in machine learning and i know i say that about a lot of the concepts we've talked about but this is probably the number one thing to do with data so the training set is the course materials as if you were studying a university course the validation set is a practice exam and the test set is the final exam and the goal here is for our machine learning model or deep learning model to generalize in other words the ability for our model to perform well on data it hasn't seen before so what do you think we should do to properly set up our machine learning model training and testing let's write ourselves a little note here so far we've been training and testing on the same data set however in machine learning this is basically a sin so let's create a training and test set all right so how many examples do we have let's check we can get the length of x that'll tell us beautiful it's a thousand because we used the make circles function all those videos ago to create our data now because our data is in random order we could randomly split it using
scikit-learn's train test split this one here or we could create a train and test data set by indexing i'll let you choose how you do yours but i'm going to create mine using indexing so split into train and test sets i'm going to set x train and y train we're going to do an 80 20 split so 80 percent of our samples is going to be training data and 20 percent is going to be testing data so the first 800 samples of x and y will be training samples and then we'll do the same for x test and y test however these will be the last 200 samples so from index 800 onwards wonderful now let's check the shape of x train and x test and then y train and y test what have we got at the output okay so there's 800 examples in x train and it's of shape 2 that's excellent and 200 examples in x test and then y train has 800 labels and y test has 200 labels beautiful so now we've got a training and test set how about we recreate a model fit it on the training data and then evaluate it on the testing data how we should have been doing things right from the start so first of all we'll set the random seed tf random set seed 42 and then number one is create the model now again we're retyping the same code because we're making the same model as model seven but we're getting a lot of practice writing tensorflow model code which is very important because we are doing a tensorflow deep learning course so if we go over here we could find model 7 dot summary what's this going to tell us okay so we had two hidden layers with four hidden neurons each so let's recreate that layers dense four and what was our activation function it was the non-linear relu come back here then we'll create another layer tf
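the 80/20 split by indexing, sketched with numpy stand-ins for x and y (make_circles produces 1000 two-dimensional points with binary labels, so the shapes below mirror the video's):

```python
import numpy as np

# stand-ins for the make_circles data: 1000 samples, 2 features, binary labels
X = np.random.rand(1000, 2)
y = np.random.randint(0, 2, size=1000)

# 80/20 split by indexing: first 800 samples train, last 200 test
X_train, y_train = X[:800], y[:800]
X_test, y_test = X[800:], y[800:]

print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
# (800, 2) (200, 2) (800,) (200,)
```

note this only works because the data is already in random order, for ordered data you'd shuffle first or use scikit-learn's train_test_split as mentioned.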
keras layers dense four activation equals relu and then what was our output activation if you need to refer to the slide which has the activation for the output layer of a binary classification model it was sigmoid so now let's compile the model we want model 8 dot compile we want to set the loss function to binary cross entropy because we're working with a binary problem and then we want to set the optimizer equal to tf keras optimizers dot adam however we're going to make a change here we're going to change the learning rate so adam's default learning rate is 0.001 but i have a feeling that if we increase it to 0.01 our model might be able to discover the patterns in our data faster than it did previously so with model 7 we fit for 100 epochs we come back up here but it looks like our model was a little bit slow out of the gates it really didn't start learning much it's still in the 50s here and then it's not until about halfway that it starts to really increase up past the 60s in about five epochs and then the 70s in about another 10 or so epochs and then it gets really close to 100 percent accuracy so do you remember how we discussed what the learning rate is it's okay if you don't we mentioned it briefly in a previous video i'll set the metrics here but let's take a step back the optimizer tells our model how it should improve or how it should update the internal patterns it's learned the loss function says how wrong those patterns are and then the optimizer says hey you should improve them in this way and the learning rate is how much our model should improve those patterns so if we set the learning rate to a lower value say adam's default of 0.001 this might be the equivalent of saying hey every epoch improve your weights by 0.001 so if we change it to 0.01 we've increased it by
ten times so basically every epoch we've given our model the potential to improve its weights by ten times as much now that's not exactly how it works but that's how i intuitively understand the learning rate and in practice you'd be surprised how well that simple definition of the higher the learning rate the more our model will update actually works and by the way adam's default learning rate of 0.001 and these other default parameters here are actually very good for the majority of problems that you'll work on of course as you start to get more and more into the deep learning world you can start to tune these to your problems but you'll find that a lot of the default values in tensorflow have been experimentally found to be very very good so now let's fit the model model 8 dot fit we're going to fit on the training data oh what have we been getting into the habit of the last few videos that's right setting the history variable x train y train epochs 25 so i've increased the learning rate by ten times but decreased the amount of epochs our model is going to look at or in other words the amount of times the model is going to go through the training data by four times so let's see if it can still get just as good results with only 25 epochs versus 100 what have we done wrong here oh we need to set this as model eight all right where did we get to oh do you notice what's happened here straight away so we have epoch 1 out of 25 we start at 54 percent and then we're into the 70s after only eight epochs and then we're into the 90s after only 15
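the intuition that a bigger learning rate means bigger weight updates per epoch can be seen on a toy problem (this is not the course's model, just plain gradient descent on loss(w) = w², whose gradient is 2w):

```python
def descend(lr, steps=100, w=1.0):
    # repeatedly apply the gradient descent update rule: w -= lr * gradient
    for _ in range(steps):
        w -= lr * 2 * w  # each step's change is scaled by the learning rate
    return w ** 2  # final loss after `steps` updates

# same number of steps, ten-times-larger learning rate, much lower final loss
print(descend(0.001), descend(0.01))
```

this mirrors what happened with model 8: the higher learning rate let it reach a low loss in far fewer epochs, though as noted in the video a learning rate that's too high can also overshoot.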
i mean if we go back up to our previous model we didn't reach the 90s until epoch 77 wow okay so if we come down here look at that by the end we're borderline 98 percent accuracy after only 25 epochs now the real test here of course is to evaluate the model on the test data set so let's go model 8 dot evaluate now that we have a test data set x test y test how does it perform well on the test data set in other words data our model has never seen before it gets an accuracy of 100 percent that means every single test value it got correct out of 200 it predicted them all correctly that is amazing now how can we really evaluate this by plotting the decision boundaries we've done this for our complete data set before but now let's do it for both the training data set and the testing data set plot the decision boundaries for the training and test sets plt dot figure fig size equals 12 6 beautiful plt dot subplot 1 2 1 plt dot title let's go train data so here we're just saying hey we want to create a subplot i believe it's rows columns so we want one row two columns and the first one is going to be the training plot plot decision boundary and then we're going to pass it model 8 our trained model x is going to be x train our training data and y is going to be y train our training labels and then we'll set up the second subplot one row two columns the first one is training the second one is going to be test and then again plot decision boundary model 8 x equals x test y equals y test beautiful and let's show our plot doing binary classification oh would you look at that so our model's decision boundary the yellow line goes through the training data it misses a few examples here i think it might have missed that red dot and this blue dot so that's why our model doesn't get 100 percent results on the
training data set but that's okay we want our model to generalize to data it hasn't seen before so when we look at the test data set it gets not about 100 it gets exactly 100 percent accuracy excellent and that was after only 25 epochs so the main takeaway is that all we did was take model 7 the model that was working and increase the learning rate by ten times now again this is one of those hyperparameter things that won't always work adam's default learning rate is actually very good for the majority of problems you're working on the only reason i set this to be a little bit higher is because i had an inkling that our model might learn a bit faster because it was already performing so well on our data set so in practice it may take a little bit of tweaking to find the learning rate but we'll see in an upcoming video how we can design a function to find the ideal learning rate value for us in the meantime since we've been saving our model's training history to this history variable in the next video let's see how we might visualize that training history you may have noticed in a few of the previous videos whenever we called the fit function and created our different models we've been getting into the habit of setting history equals model 8 dot fit or model whatever number dot fit now i think we have seen this before plotting the history variable but we haven't discussed it very recently so let's go through that now let's see how we can plot the loss curves also referred to as training curves but probably most often referred to as loss curves let's go here now the reason why we're doing this if we come back to our keynote let's find our machine learning explorer's motto what is it it is visualize visualize visualize and we've had a little bit of experience visualizing our data and our models and our predictions but not so much training so that's what we're
going to cover here and of course it's a good idea to visualize these as often as possible so let's see what the history variable is but first to understand it how about we look up the doc string of tensorflow fit function what does this give us here we go tfkeras.model so this is the model class which uh our sequential model is built off and then when we call fit off this so if we look for fit and does it have here once the model is created you can config the model with losses and metrics with model.compile we've seen that in practice train the model with model.fit that's not what we want we want the fit function i want the fit function to discuss what it returns here we go fit so we can actually pass a fair few things to fit if we look here if we go x y batch size a whole bunch of things we'll see a fair few of these throughout the course but if you want to skip ahead and read them you definitely can but i want to see what it returns where do we go here returns here we go a history object history.history attribute is a record of training loss values and metric values at successive epochs as well as the validation loss values and validation metrics if applicable beautiful so that's why we've been setting up this history variable now history the documentation just said that history.history so what happens if we go history.history what does that give us ah accuracy okay so it looks like there's about 25 values there so i think it tracks at every epoch which is beautiful so really what history tracks for us is this output here and it's good to look at these in numerical form but we can also look at them in visual form so let's um convert the history object into a data frame make it nice and tabular so we can structure it up pd dot data frame and we want to go history dot history let's see what it looks like beautiful so we can see our model's loss started at around 0.7 and decreased right down to about 0.14 and the accuracy started at about half 50 percent and then 
increased right up to 97 percent and that was on the training data set so now how about we look at the loss curves let's just plot this so plot the loss curves pd dot data frame history dot history and then we're going to do a simple plot i believe by default it's going to be a line plot so this is going to be model 8's training curves or we'll call them loss curves because that's what we've been calling them and let's have a look wonderful now that is probably the ideal loss curve scenario we want to see for a binary classification problem or actually most classification problems because the accuracy metric is going up and the loss is going down so let's write a note in here note the loss function because remember what is the loss function it's how wrong our model is so for many problems the loss function going down means the model is improving the predictions it's making are getting closer to the ground truth labels now you might be wondering okay i understand that it's good to see the loss curve going down but what other value can i get from looking at plots like this well in future videos we're going to see how we can compare multiple different models and check out their loss curves so the value in this is say we ran 10 experiments at the same time and plotted all of our models' loss curves together and say model 8 had a learning rate of 0.01 and model 10 had a learning rate of 0.001 and we noticed model 8's loss decreases far quicker than model 10's but after about 100 epochs model 10 starts to catch up that's where we'd be able to use this visual knowledge to guide our future experiments so just keep in mind that whenever we set the history variable we can inspect our model's training curves by plotting them like this and we'll see in the future another way to do this with tensorboard but for now let's just leave it there and in the next
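the history-to-data-frame step sketched with a hand-made stand-in for history.history (the real dict comes back from model.fit, the keys and values below are assumptions that mirror the metrics used in the video):

```python
import pandas as pd

# stand-in for history.history as returned by model.fit: one value per epoch
history_dict = {
    "loss": [0.70, 0.45, 0.25, 0.14],
    "accuracy": [0.50, 0.75, 0.92, 0.97],
}

df = pd.DataFrame(history_dict)
print(df)
# df.plot() would draw the loss curves: loss trending down, accuracy trending up
```

tabular form is handy for reading exact numbers per epoch, the plot is what reveals the overall trend at a glance.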
video we're going to check out how we can use loss curves to find the best learning rate so i'll see you there in the previous lectures we've seen how much the learning rate hyperparameter can influence our model's training so wouldn't it be great if we had a method that we could use to find the ideal learning rate i mean a value which when our model started training meant that its loss decreased as fast as possible because remember the loss is a measurement of how wrong our model is so if we want to decrease the loss as much as possible what value could we set the learning rate to be hmm well to do this we're going to have to visualize our loss decreasing and potentially decrease our learning rate during training now have we visualized the loss decreasing i think we have there we go okay so we visualize the loss decreasing now how might we decrease the learning rate during training because so far we've only hard set the learning rate so using something like lr equals 0.01 but i haven't actually introduced you the answer to this question that i just asked so don't worry if you're not sure but that's what we're going to do in this video it's finding the best learning rate and to do so i'm going to introduce a new concept which is called to find the ideal learning rate in other words it's uh the learning rate where the loss decreases the most during training because remember that's the loss metric we want that to decrease during training we're going to use the following steps so the first one is a learning rate callback now callback is the new concept that we're going to talk through this lecture or we're going to code it out actually now a callback if you're wondering what it is is you can think of a callback as an extra piece of functionality you can add to your model while it's training so when our model trained before it went through the data and found patterns but it'd be great if we could execute some other kind of functionality while this training has taken 
place that is where a callback comes into play so that's the first one a learning rate callback we'll see that in a second and we're going to need another model we could use the same one as above but we're practicing building models here and finally we're going to need a modified loss curves plot so very similar to this loss curve we've got here but we're going to have to modify it because what we're going to set up is a learning rate callback that starts at a certain learning rate value and gradually increases that learning rate during training and then we'll make another plot of the loss versus the learning rate to find the learning rate value where the loss decreases the most now if that didn't make sense remember our motto if in doubt code it out so let's get started we're going to create a new model so set the random seed tf random set seed beautiful now we're going to create a model it's just going to be the same as model 8 remember how i said we're going to have heaps of practice coding our models well we're almost up to 10 models so far in this one little section so model nine equals tf keras sequential and that's really the only way to get better at programming or any kind of machine learning or data science just keep writing more code keep working on different things there's no secret here we could go up and see what model 8 was or you could just follow along with what we're doing oh we forgot the layers it's got two dense layers with four hidden units with the relu non-linear activation beautiful and then we have an output layer with one hidden unit and we're using the sigmoid activation beautiful so now that we've created the model we're going to compile the model just the same as before except now with model 9 dot compile loss we're working with binary so binary cross entropy and then we're going to put in here the
optimizer let's use adam we'll use the text based version of adam metrics equals accuracy beautiful and now here's the step that's going to be different so we're familiar with these steps here but we're going to introduce as i said the learning rate callback there are many different types of callbacks but for this one we're using a learning rate callback and a callback works during model training so to get it to run during model training it has to exist before model training so before we call model 9 dot fit our callback has to exist so let's create that create a learning rate callback we're going to call it lr scheduler because as you'll see in a second if we go tf keras callbacks there's a callback called learning rate scheduler beautiful and how do we find out the docstring of this we can press command shift space it came up automatically so this is learning rate scheduler at the beginning of every epoch this callback gets the updated learning rate value from schedule so schedule is this parameter here a function provided at init with the current epoch and current learning rate and applies the updated learning rate on the optimizer okay so this is a way of saying every epoch if we put in some functionality here to change the learning rate this callback if it's running during the fit function is going to give our optimizer in our case adam the updated learning rate so let's see what we might do what we're going to do is lambda epoch oh the lambda needs an epoch and then we're going to go 1e negative 4 so this is just 10 to the power of negative 4 and then times 10 to the power of epoch divided by 20
wonderful so essentially what this is saying is for the learning rate scheduler every epoch to traverse a set of learning rate values starting from 1e negative 4 and increasing by 10 to the power of the epoch divided by 20 every epoch so let's see what that looks like there we go fit the model so this time we're going to pass the lr scheduler callback oh and we're getting into the habit of saving history so history 9 equals model 9 dot fit now let's pass it our training data and training labels epochs we'll run it for 100 epochs and then callbacks now callbacks come as a list so we're going to pass in lr scheduler you can pass multiple callbacks here say you wanted callback two callback three i'm not sure why i keep saying it like it's two words it's actually just one word but that's all right we don't have callbacks two and three we only have one so let's just see what's going on here run this beautiful now our training appears to be working as normal we're very familiar with this we've seen this before but i want you to have a think about it what might be different now that we've passed it our learning rate scheduler what's it going to do every epoch it's okay if you're not sure because remember our motto if in doubt code it out now what we can do is check out the history remember we saved it to history nine so we'll turn it into a data frame so that we can plot it history nine dot history and then we're going to go plot fig size 10 7 my favorite two numbers in poker and also just a handy size for this kind of window x label equals epochs beautiful let's see what this looks like oh okay so this looks a little bit different to the loss curves that we plotted before it's the exact same sort of plot that we created up here for our model 8 loss curve so you see here history dot history now this is history nine dot history plot we've just given the x label epochs so
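the schedule that lambda implements can be sketched on its own in plain python, this is the same function shape you'd hand to tf.keras.callbacks.LearningRateScheduler:

```python
def lr_schedule(epoch):
    # start at 1e-4 and multiply by another factor of 10 every 20 epochs
    return 1e-4 * 10 ** (epoch / 20)

for epoch in (0, 20, 40, 60, 80):
    print(epoch, lr_schedule(epoch))
# epoch 0 -> 1e-4, epoch 20 -> 1e-3, epoch 40 -> 1e-2, and so on
```

so over 100 epochs the callback sweeps the learning rate across five orders of magnitude, which is exactly what makes the loss-versus-learning-rate plot later in the video possible.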
if we look at this axis the y axis our learning rate starts very low basically zero and then as the epochs go on and on and on it starts to increase okay so that's essentially what this code up here is doing so it starts at a low number one e 4 and every epoch it increases by 10 to the power of epoch divided by 20. so that's why towards the end we get this exponential curve it starts to increase really fast but what happens here with our accuracy our accuracy seems to go up slightly but then goes down and the loss goes down fairly significantly here and then it stays low but then goes back up hmm so looking at this what did we want what do we want before we wanted the learning rate where our model's loss decreases the fastest so potentially this value here whatever it is the learning rate at say let's say 45 epochs maybe this is where our learning our loss seems to be decreasing the fastest i got an idea let's plot the learning rate values during training versus the loss so how might we do that let's go plot because we can't really see what's going on here let's get accuracy out of there and let's just compare learning rate to loss so plot the learning rate versus the loss so our lr's is what does it start at it's started at 1e negative 4 times 10 to the power of now we're going to have to do we need a range so tf range 100 because that was how many epochs we did divided by 20. 
how does that look that should give us there we go shape 100 beautiful so if we go len lrs lrs is short for learning rates there we go we have a hundred different values of learning rate so see how it starts here at 1e-4 and then it slowly increases as we go along all we've done with this line of code is replicate the same thing we passed to the learning rate scheduler except we had to substitute in a range of 100 for epoch because we ran 100 epochs that's all we've done so these are the different learning rates that our model tried out now what can we do how about we create a plot plt dot figure fig size equals 10 7 wonderful now we're going to do a semi-log x plot this just means we want log scaling on the x axis you'll see what i mean by that in a second so semilogx lrs so we're just passing it this tensor that we created may have to be numpy i'm not entirely sure we'll find out if in doubt run the code now we're going to go history we're going to get the loss so on the x axis we want lrs and on the y axis we want the loss values from our model history beautiful now we're just going to decorate our plot the x label can be learning rate and the y label can be loss and finally let's give it a title learning rate versus loss beautiful what does this look like okay now before we even discuss this i want you to pause the video for like 10 seconds and have a look and think about where our ideal learning rate would be totally okay if you're not sure but remember what we're trying to do here is pick a learning rate value where the loss decreases the fastest or the most so what part of this graph is the loss decreasing the most have a think about that and press play when you're ready to go how'd you go well i'll tell you the methodology to
figure out the ideal value of the learning rate or at least the ideal value to begin training our model with the rule of thumb here is to take the learning rate value where the loss is still decreasing so maybe here but not quite flattened out like it looks here and it's usually about 10 times smaller than the bottom of the curve so in our case our ideal learning rate would be somewhere in this section here it ends up being between 0.01 so 10 to the power of negative 2 this value and 0.02 about this value here now i actually did prepare something earlier that highlights this a bit better finding the ideal learning rate here's the code that we ran it's just our plotting code boom this is what i was talking about so this is the lowest point on the curve somewhere down here but remember the ideal learning rate is somewhere about 10 times smaller than that so we have to go back here so it's somewhere in there now if we said that the ideal learning rate is 0.01 or 10 to the power of negative 2 what was the learning rate we set above where our model got really good performance where's model 8 wow we set our learning rate to be 0.01 how phenomenal is that you could call it a lucky guess but there's another little heuristic we can use here we can either find the best learning rate through this methodology or we could use the default learning rate or just take a guess because examples of other typical learning rate values are 10 to the power of 0 10 to the power of negative 1 10 to the power of negative 2 10 to the power of negative 3 and 1e-4 if we have a look at these what do these have in common they're all powers of 10 now you could have a learning rate that's 0.03 or you could have a learning rate that's 0.025
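A NumPy sketch of both steps, recreating the 100 learning rate values and applying the 10x-smaller rule of thumb to a made-up loss curve (the loss values below are invented purely to mimic the plotted shape):

```python
import numpy as np

# Recreate the 100 learning rates the scheduler stepped through
# (a NumPy stand-in for the video's 1e-4 * 10**(tf.range(100)/20)).
lrs = 1e-4 * (10 ** (np.arange(100) / 20))
print(len(lrs), lrs[0])  # 100 values, starting at 0.0001

# Made-up loss curve: decreasing until step 60, then blowing up.
loss = np.concatenate([np.linspace(0.7, 0.1, 60),
                       np.linspace(0.1, 2.0, 40)])

lowest_point_lr = lrs[np.argmin(loss)]  # lr at the bottom of the curve
ideal_lr = lowest_point_lr / 10         # rule of thumb: ~10x smaller

print(lowest_point_lr, ideal_lr)
```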
realistically there's a whole range of different learning rates you could use but you might say daniel you've given me these values yet you said the learning rate can be almost anything the reason i'm highlighting this is because whenever you use a pre-built optimizer say tf keras optimizers adam as we've discussed before the default parameters are generally pretty good as in most of the time they'll work pretty well but for when they don't well you've got some other learning rates that you can try here you could just try hard coding these typically you probably won't ever use 1 or above you'll start below 1 so starting from here and keep going down or you can use this methodology to find the ideal learning rate so up here by using a learning rate scheduler have it increase during training and then plot the log learning rate versus your model's loss and find the learning rate where the loss curve decreases the fastest or in other words just like this graphic here so see here loss decreasing very fast the ideal learning rate is going to be somewhere between the lowest point in the curve and about 10 times smaller than that point so have a practice with that potentially you could train another model on some other data that we've worked with and find the ideal learning rate but now that we've found the ideal learning rate let's fit another model actually you can probably try that before we go there so pick a learning rate in this little section here create a new model and fit it to our data and i'll meet you in the next video in the last video we saw how we might find our model's ideal learning rate by using the learning rate scheduler callback starting with a fairly low learning rate slowly increasing it every epoch and then plotting the different learning rate values versus our model's loss we also discussed that the ideal learning rate is somewhere between the lowest point on the
curve and if we jump back a region where the loss is still decreasing so how about we try building a model with a learning rate of if we look at this point on the curve the loss is still decreasing sharply so that's the value we actually set before if we come back up to model 8 there we go so this is the learning rate we used before 0.01 which is the same as 10 to the power of negative 2 but we can also see that if we jump up one little notch here because this is a log scale remember the loss is still fairly sharply decreasing here so that would be a value of 0.02 so how about we try building a model with that learning rate and see if it can achieve similar results to model 8 in fewer epochs because remember what does the learning rate code for it codes for how fast our model should try to update its patterns the higher the value the more our model is going to update its internal patterns every epoch so let's see it in action let's try using a higher ideal learning rate with the same model as before so we're going to set the random seed tf random dot set seed 42 beautiful and then we're going to create the model we're up to model 10 now how good is that double digits sequential wonderful now you could probably just jump ahead here if you really wanted to we're going to have two dense layers with four hidden units and a non-linear activation relu and then we'll do the same for the other layer tf keras layers dense with four hidden units and activation equals relu or rel u depending on how you want to pronounce it and then the final layer just the same as model 8 dense one activation equals sigmoid and now what do we have to do next this is where we're going to compile the model with the ideal learning rate so the value we've just picked off the curve so this one here 0.02 and then we'll go and put this here
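What "a higher learning rate means bigger updates every epoch" looks like on a toy problem (gradient descent on a simple quadratic, not the actual neural network):

```python
# Minimize f(w) = (w - 3)**2, whose gradient is 2 * (w - 3),
# with two different learning rates for the same number of steps.
def train(lr, epochs=20, w=0.0):
    for _ in range(epochs):
        w -= lr * 2 * (w - 3)  # bigger lr -> bigger update each step
    return w

print(train(lr=0.01))  # still well short of the optimum at 3
print(train(lr=0.02))  # noticeably closer in the same number of steps
```

The doubled learning rate covers more ground per step here, which is the same effect being exploited when model 10 matches model 8 in fewer epochs.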
the learning rate we used before in model 8 we come here we're going to go model 10 dot compile the loss is going to be what loss are we using binary cross entropy and then we're going to go optimizer equals tf keras optimizers adam we have to use the class version of adam here so we can set lr equals 0.02 beautiful because we said we're picking this learning rate value here let's come back down metrics equals we'll just keep everything else the same except we're increasing our learning rate by 0.01 and now let's fit the model for 20 epochs because model 8 we fit for 25 epochs so 5 less than before let's see what kind of results it achieves with fewer epochs model 10 dot fit actually i might save this as history 10 that's probably better history 10 equals model 10 dot fit x train y train and epochs equals 20 let's see what happens and come down how did it go all right loss equals 0.0878 and accuracy is 98.24 so what were our model 8 results let's go back so let's save this 0.9824 and if we come back where's model 8 are we going too far model 8 there we go 25 epochs and it's 97.4 ah so 98.24 model 10 has performed slightly better than model 8 in fewer epochs so fewer chances to look at the data all because we increased the learning rate slightly it gave our model 10 a chance to learn patterns in the data faster because it was updating with larger steps now of course as with every hyperparameter with our deep learning models that might not always happen but that's just another example of how powerful tuning the learning rate for your models can be so how about we evaluate it evaluate model 10 on the test data set actually we could have just done this to begin with model 10 dot evaluate x test y test i wonder how they perform differently on the test data 99 beautiful and then we go evaluate model 8 on the test data how did this perform model 8 dot evaluate x test y test hmm so our model 10 gets a lower loss value on the test data set than uh
than our model 8 but model 8 gets a higher accuracy value now that depends on which one of these metrics you want to optimize for so again remember the metrics you get from the training data set aren't always as important as the metrics you get from the testing data set so this is something you'd want to investigate further depending on what your needs are so potentially our model 10 just because it learned faster doesn't mean that its eventual performance on a test data set or unseen data will turn out to be better so this is where it takes a little bit of trial and error to figure out which model is ideal for your use case now how about we finalize this model 10 with the ideal learning rate and just see how the predictions look so i'm going to go plot the decision boundaries for the training and test sets so plt figure fig size equals 12 6 wonderful and then we'll get a subplot going that's 1 2 1 so one row two columns and this is section one and then we'll do the training data first we'll set up the title there then we'll bring in our trusty function plot decision boundary and that takes in model 10 it'll also take in x train first we need an underscore there and then it will also take in y train and then we want to subplot again 1 2 2 so again one row two columns and this is the second section we'll see what that looks like in a second plt dot title is going to be test and we want to bring in our fancy function from above model 10 x equals x test and then y equals y test beautiful and then plt dot show binary classification wonderful so again our model 10 using an ideal learning rate that we picked off the loss curve gets basically perfect predictions on the training data set and the test data set and see this is what i was talking about before with the subplot function we want one row two columns and this first plot train is the first element here and this test plot is the
second element in the subplot so with that being said we've explored a few ways to evaluate our classification models we've visualized them but there are a few more classification evaluation methods that we should really be looking at so let's have a look at those in the upcoming videos so far we've seen a few visually rich ways to evaluate our classification models let's have a look at a few more evaluation methods that we can use now these are some of the most common evaluation metrics that you should have in your machine learning and deep learning toolbox to evaluate your classification models so let's have a look here we've got a key tp equals true positive not toilet paper tn equals true negative fp equals false positive and fn equals false negative so let's have a look we've got metric name accuracy we've seen that one there's the formula there true positives plus true negatives so all of the true predictions divided by the total number of predictions if you wanted to do the accuracy metric in code you can use tf.keras.metrics accuracy or if you want to use scikit-learn it has an accuracy score function in there now when should you use accuracy well it's a default metric for many classification problems however it's not the best for imbalanced classes so for example if you had 10 000 examples of one class and only 10 examples of another class and you got a classifier to score 99.999 or something like that it could just predict that everything is all one class and get that sort of result so in that case you probably want to look into using metrics like precision now there's a formula for precision here true positives over the total positive predictions including false positives there if you wanted to use it in code you can use precision higher precision leads to fewer false positives let's go to google and go what is a false positive oh coronavirus testing that's a topic of the moment what is a false positive there we go a
false positive is when someone who does not have coronavirus tests positive for it so you can see where a false positive may have implications well it's not a good thing because if someone tested positive for coronavirus that could go on to have a whole bunch of adverse effects that you didn't want so if you're training a machine learning model or a deep learning model and you wanted it to predict fewer false positives in other words predicting that someone had coronavirus when they actually didn't you'd probably want to optimize for the precision metric now there's also recall which is true positives over true positives plus false negatives if you wanted to use it in code you could use one of these two functions here now higher recall leads to fewer false negatives so if a false positive is someone being predicted as having coronavirus when they don't actually have it what do you think a false negative is and what would be the implications there so the false negative in this case would be if i did a coronavirus test and i actually had coronavirus but my test came back and said that i didn't have it so you could imagine again that would have consequences as well if i had a test that said that i didn't have coronavirus and i go about my life and do whatever i want to do and then i start giving it to other people well then that's not an ideal scenario is it so for problems where false negatives are not good for your use case you want to train your deep learning models to have higher recall however you might be thinking why don't we just increase precision and recall there's often a trade-off between the two and i'll show you here what is the precision recall trade-off beautiful let's understand precision recall where is it have we got a trade-off curve there we go this is probably it unfortunately you can't have both precision and recall high if you increase precision it will reduce recall and vice versa this is
called the precision recall trade-off so keep in mind that in an ideal case your model would have high precision and high recall but usually when you try to improve one the other goes down so say we wanted higher precision you would have lower recall and the inverse is also true another option is to try and improve the f1 score which is like a combination of precision and recall the f1 score is one of the scores that i like it's usually a good overall metric for your classification models but again me just reading these evaluation metrics to you out loud probably doesn't make as much sense as when you start to code them up and explore them yourself so just keep that in mind we're just naming these different metrics here it's not until you start to use them in practice that you'll really understand when to use which and then finally another great one is a confusion matrix so this is particularly helpful when you're dealing with anything from binary to multi-class classification you can create your own custom function or scikit-learn has a built-in confusion matrix function so when to use when comparing predictions to truth labels to see where the model gets most confused it can be hard though to use with very large numbers of classes as we'll see in a future project so keep these in mind again this is just a slide in the upcoming video we're going to have a practice implementing a confusion matrix but for these ones here the accuracy we've already done during our model training so for precision recall and f1 score your homework for this video is to dive into the tensorflow or scikit-learn documentation for each of these three and then i'll see you in the next video where we'll start applying some of these metrics or evaluation methods to the problems we've been working on in the last video we introduced some more classification evaluation methods now
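The formulas on that slide can be sketched in plain Python on a toy set of labels (tf.keras.metrics and sklearn.metrics compute the same quantities; the labels below are invented for illustration):

```python
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # true negatives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)  # higher precision -> fewer false positives
recall = tp / (tp + fn)     # higher recall -> fewer false negatives
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(accuracy, precision, recall, f1)  # 0.75 0.75 0.75 0.75
```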
again these are some of the most common and the ones you definitely want to keep in your toolbox so make sure you screenshot this or take a note of these metrics because they're probably going to be the ones you most commonly see with classification deep learning models we didn't write much code last video so let's make up for that we'll go back to our notebook here and all right another heading here more classification evaluation methods and so alongside visualizing our model's results as much as possible there are a handful of other classification evaluation methods and metrics you should be familiar with and because this is not a markdown cell it's going to keep going that's all right we'll just put a few dot points here we'll just replicate that slide that we had we had accuracy which is probably the most common and then precision a higher precision leads to fewer false positives and then we had recall a higher recall leads to fewer false negatives however there is the precision recall trade-off which is an important concept to be aware of there's the f1 score which is a combination of precision and recall and usually a good overall classification metric and then of course there's the confusion matrix which is another visual way of looking at things and then finally this is not tensorflow specific but another way you can see everything is the classification report from scikit-learn was that on the slide before i don't think so we can put this one here okay classification report scikit-learn the good news is a lot of these classification metrics all have a very similar principle they take in some true values and they compare them to our model's predictions that is the crux of whenever we're evaluating a model what were the true values the model should have predicted and what were the values that it did predict let's compare the two so i'll just put this little link in here for the rest the previous slide that we looked at shows the
code examples that you can use so let's start off with accuracy i mean our model has already used it because we passed in accuracy up here as the metric let's write some code to make that look a little bit better so we can go here we've got check the accuracy of our model and we've got loss accuracy equals model 10 dot evaluate and we're going to evaluate it on the test data set and then we're going to print we'll make this a little bit prettier so model loss on the test set that sounds a bit better doesn't it and because it's an f string we can pass that in beautiful and then model accuracy on the test set wonderful and because accuracy is set up here let's times that by a hundred and we'll shorten it to two decimal places will this work i hope so there we go okay so see what we did there we just evaluated on the test data set and found the loss because that's what our model is going to return with the evaluate function so the loss is there and then the model accuracy comes out in decimal notation we've just adjusted it here to be more visually appealing so what should we work on next how about a confusion matrix so what we'll do is end this video here and come back in the next video to see how we can make a confusion matrix with our model last video we left off with the question how about a confusion matrix we checked out the accuracy again if you want a little bit of homework you can check out our model's precision recall and f1 score before we create a confusion matrix let's see what the anatomy of a confusion matrix is so this is what we're going to be working towards creating so it's a matrix so it's got rows and columns and on the y axis of a confusion matrix are usually the truth labels so this is what ideally our model should have predicted but on the x axis is what our model actually did predict so inherently what happens when you lay
it out like this is that on the diagonal are the correct predictions in other words the true positives and the true negatives so you might be wondering well what's a positive and what's a negative well in the case of binary classification a true positive is when the model predicts one when the truth is one and a true negative is when the model predicts zero when the truth is zero so knowing this you might have guessed that the ones outside of this diagonal are the false positives and the false negatives in other words a false positive is when the model predicts one when the truth is zero and a false negative is when the model predicts zero when the truth is one so again this is just seeing the name of things let's code this up and see what it looks like let's go back to the previous slide and see where the code is at confusion matrix what is the code custom function or sklearn dot metrics confusion matrix all right let's not reinvent the wheel to begin with let's bring in scikit-learn's confusion matrix so scikit-learn confusion matrix metrics dot confusion matrix beautiful okay is there an example all right so example from sklearn dot metrics import confusion matrix we need true labels we need some predictions and then we just pass them to the function wow okay let's try it out so create a confusion matrix so from sklearn.metrics import confusion matrix wonderful then we need to make some predictions how do we do that with our trained model so we'll save it under y preds equals model 10 dot how do we make predictions with a trained model if you guessed predict you'd be correct so we'll predict on the test data set and then we can create our confusion matrix how good is that i love scikit-learn i love tensorflow some beautiful code libraries out there so that's all it is we've got the y test which are our test labels and our model's predictions let's see what the confusion matrix looks like oh no what's happening no module named sklearn oh typo
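A minimal runnable version of that scikit-learn pattern on toy labels (with a real model the sigmoid outputs would first need rounding, as covered next):

```python
from sklearn.metrics import confusion_matrix

# Toy binary labels: rows of the result are truth, columns are predictions.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[2 1]   <- 2 true negatives, 1 false positive
#  [1 2]]  <- 1 false negative, 2 true positives
```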
standard sklearn dot metrics import confusion matrix this should work oh no classification metrics can't handle a mix of binary and continuous targets hmm what is happening here well what do we do when we face value errors like this we have to inspect what we're trying to predict on so what does y test look like maybe we view the first 10 values okay so that's what y test looks like what do our preds look like do they look the same ah i see where the trouble is look this is why we're getting a value error classification metrics can't handle a mix of binary and continuous targets so our test values are in binary form zero or one whereas our predictions are in continuous form they're not zero or one so what should we do here what do we have to do to compare our test array and predictions array well we're going to have to convert our predictions array into zeros and ones but what even are these values here well let's write this down it looks like our predictions array has come out in prediction probability form so this is the standard output we'll make that in bold the standard output from the sigmoid or softmax as we'll see later on activation functions so what we're going to have to do is convert them since they're in prediction probability form this is a value that the model has output the closer the value is to one the more the model thinks it's a one label and the closer the value is to zero the more the model thinks it's a zero label so out of the knowledge that we've learned with tensorflow so far is there a function we can use to round these values there was a little hint there to their closest integer value so for example if one of these was let's see if this actually outputs 0.9852 i'm not going to type out the whole thing because that's going to take way too long what does this actually look like there we go so is that closer to zero or one now you might be wondering of course it's closer to one so this one would go
to 1 right and then the same would be for this one and the same would be for this one and then we might have to go on for a while before we find a 0 you might be wondering what's the cutoff well the cutoff is 0.5 so anything higher than 0.5 will go to 1 and anything lower than 0.5 will go to 0 however this is a value you can tune but for simplicity's sake we're not going to do that for now we're going to just convert our prediction probabilities to binary format and view the first 10 now again i want you to try and figure this out yourself how can we round these to zero or one using tensorflow if you guessed we use tf round or perhaps you already knew that you're like daniel come on i'm about 50 videos into using tensorflow i know this stuff already we go boom what does this look like ah that's what we want that's looking more like what our test labels are so remember whenever we're comparing things one of the biggest issues you'll run into is your tensors or your data types being of the wrong format so all it is is thinking about how you can get them into the right format so this is what you'll often have to do with a classification problem the output of your model will come in prediction probability form and you'll have to convert it to human readable form in other words integer form now let's see what our confusion matrix looks like so let's create a confusion matrix confusion matrix y test and then we want to go tf round y preds and we're off hmm okay so we've got values down the diagonal if we go back to the anatomy of our confusion matrix let's see all right so we've got the same values here but this isn't as pretty this is just an array what we might do in the next video is prettify our confusion matrix to look something like this we know what the y axis is we know what each row is we know what each column is we know what the x axis is so we can look at this one but it's not the ideal type of confusion matrix i mean
if you imagine you were trying to share that with a colleague what is even going on here so next video we'll see how we can prettify our confusion matrix in the last video we made our first confusion matrix however we said that it's nowhere near as pretty as the confusion matrix we have here so how about we write this down how about we prettify our confusion matrix because there needs to be more beauty in the world excuse me i'm getting a little bit poetic here but code can be poetic too so the function we're going to use or the code we're going to write i'm going to put a little note down here the confusion matrix code we're about to write is a remix of scikit-learn's plot confusion matrix function now we've got this up here so if you look into this we can click on the source code and we can follow through with that fairly extensive source code scikit-learn is beautifully documented and you can go through that i tried out this plot confusion matrix function and i found it only works with estimators so a scikit-learn model is referred to as an estimator whereas we want to use tensorflow so we want to adapt it to our tensorflow code so you'll often run into this right you'll want some sort of functionality but it exists somewhere else and to get it working for your use case you kind of have to tailor it so that's what we're going to do here so just follow along and we're going to make a pretty confusion matrix with this here and with our tensorflow model so let's set up a fig size just so we can use this again now because we're going to be writing a fairly extensive amount of code here what's our principle whenever we're writing something if you get stuck or you're not sure what's going on or my explanation isn't as great as it could be always pause and rewrite it yourself and see what happens that's what i do whenever i don't understand something so we want to go we can just create our
confusion matrix just like we've done up here so nothing new so far y preds beautiful you might have noticed i've imported itertools we'll see where that comes into play in a second we're also going to create a little normalize feature because if we come back here this is how we're going to get percentages here right so for this label 98 percent of them are correct and 2 percent are incorrect this confusion matrix works out well because we have 100 examples it might be different depending on the number of samples you have for each label but let's not get distracted daniel let's code so we're going to go cm as type float now because remember our confusion matrix so far is just an array a very non-pretty array that is we can just divide it by the sum to normalize it axis equals one and then we're going to go here np newaxis so this will normalize our confusion matrix again if we wanted to see what that looks like we could go cm norm wonderful there we go and now let's go in here we want to set up the number of classes so we can do this by getting the shape of our confusion matrix again to see what something does dot shape this will be helpful for if we had multiple classes so right now we're only dealing with binary classification but what if we had 10 classes we may see that later on spoiler alert so let's prettify it i've got my dyslexia kicking in there got my f and t's mixed up so we're going to create a figure and an ax using plt.subplots and we're going to set our fig size to equal fig size we hard coded this up above you don't have to do that but i just decided to and then we're going to create a matrix plot so cax equals ax dot matshow so this is a matrix plot we'll go here matplotlib matshow what does this do display an array as a matrix in a new figure window beautiful again we're remixing some code from scikit-learn here so matshow what do we want to pass it we want to pass it our confusion matrix right so cm and then we'll set the color map to
equal plt.cm.blues a lot of cms here don't confuse our confusion matrix cm with plt dot cm blues the color map wonderful and then we'll go fig dot colorbar cax wonderful and finally we can create classes we need to set up a bool so if we do have multi-class we want it to do something and if we only have binary classes which is what we're working with now we want it to do something else so that's why we're creating a conditional if classes labels equal classes else labels equal np arange cm dot shape zero wonderful so if we have a list of classes if it exists we'll set the labels to equal classes but if it doesn't which is our current scenario we'll set the labels to be just a range of our confusion matrix shape on the zeroth axis which is just two so it'll be zero to one wonderful and now let's label the axes ax dot set because we're going to be writing a bunch of labels here we're just going to use the shortcut ax dot set for matplotlib confusion matrix that's the title and remember we're prettifying it so it looks like this that's our title this is our x label this is our y label so these are the ones that we need x label is predicted label and then y label equals true label wonderful we're also going to set the x ticks to np arange n classes and then the y ticks are going to be the same thing np arange n classes that's the number of little dashes that we want to have on our figure we'll see what that looks like in a second we can also set the x tick labels to be our labels variable which in the case of binary classification is just going to be the range if we don't have a list of class names the labels are just going to be the class integer values and same thing for the y tick labels equals labels we need to set the threshold for different colors this will make sense in a second so the threshold equals cm dot max so the max value in our confusion matrix plus the minimum and then we want to divide that by two so that's the threshold so this is
going to give our confusion matrix different shades of squares depending on how many values are in there so in a typical confusion matrix you want the diagonal to be really dark where the correct predictions are and all the other squares where there aren't many prediction values to be light this will make a lot more sense when we visualize our pretty confusion matrix so plot we're going to plot some text on each cell so we can go for i j in itertools so itertools is going to iterate through whatever we pass it so we want range cm shape zero and same thing for range cm shape one i'm gonna go boom plt dot text now this is where we're gonna set the text for each square so j i this is giving ourselves coordinates and then we can set up an f string for our confusion matrix for i j coordinates i j index that is we want that and then it's also going to be and we want in brackets cm norm again at the same i j index times 100 we want one decimal point here so point one f inside the squiggly brackets fix up the f string and then a percentage sign are our brackets correct here fingers crossed there's going to be an error here somewhere isn't there let's traverse back through where have we got an extra because we want this to be the same color okay wonderful yep that should be like that oh our f string has ended too early that's what we want wonderful and now we can have a little comma after that we want horizontal alignment equals center and then we want to go color equals white this is going to be the color of the text if our confusion matrix at a particular index is greater than the threshold else we want the text to be black wonderful and the size can be 15
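The cell-labelling loop described here can be sketched in a self-contained way like this (using a small hand-made confusion matrix instead of our model's, so it runs on its own — the values are made up for illustration):

```python
import itertools
import numpy as np
import matplotlib.pyplot as plt

# A small example confusion matrix (rows = true labels, columns = predictions)
cm = np.array([[95, 5],
               [10, 90]])
cm_norm = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis]  # normalize per row

fig, ax = plt.subplots()
cax = ax.matshow(cm, cmap=plt.cm.Blues)
fig.colorbar(cax)

# Color threshold: cells darker than this get white text, lighter cells black
threshold = (cm.max() + cm.min()) / 2.

# Write the count and the row-normalized percentage into every cell
for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
    plt.text(j, i, f"{cm[i, j]} ({cm_norm[i, j]*100:.1f}%)",
             horizontalalignment="center",
             color="white" if cm[i, j] > threshold else "black",
             size=15)
```

Note that `itertools.product(range(2), range(2))` yields every (i, j) coordinate pair, which is why a single loop covers all four cells.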
there's a fair bit going on here but again if you're not sure what's happening in any of these lines remember break it apart we've got a lot going on here it took me a while to remix this function here so i didn't just pull this out of the hat let's see what it looks like anyway see where our errors appear wonderful text has no property horizontal alignment oh i have no n there we go whoo so we've basically just replicated our pretty confusion matrix see what i meant with the color threshold here we want our squares with lots of values in them to be darker and the squares with not many values in them to be lighter you can't really see it with this confusion matrix but we will see it later on the higher the values the darker the shade of the square will get maybe if we go confusion matrix images there we go this is a good example so this is without normalization so we see here all these squares are light and all these squares are darker because they have more values so with that being said we probably could increase the text size here so that it's a bit more visual how about we do that what do we want to do we want to set the x axis labels to the bottom so we'll get that to the bottom see those zeros and ones we want to get those down there so we can go ax dot xaxis to access the x-axis attribute then set label position and then we're going to just type in bottom then we go xaxis dot tick bottom we want to have the ticks down there as well correct and then we want to adjust the label size we can do that by going ax dot yaxis dot label dot set size 20 and then we're going to go ax dot xaxis dot label dot set size 20 and then we can do the same for the title ax dot title dot set size 20
now of course beautiful that's looking way better how good is this we've just prettified our basic confusion matrix if we come back up here albeit it took a fair bit of code we had this array here and now it looks like this now of course we could functionalize this maybe that's a little task for you you could functionalize this to work with any y test values and y pred values so maybe we'll look at doing that later on or maybe you could try that now but as you can see this is a really great and visual way that we can quickly show someone how our model's performing and see how it does on different classes where it gets confused now our model's doing pretty well right now on our data because as you can see the diagonal is very dark but going forward it'll make more sense once we work with a multi-class classification problem and make a bigger confusion matrix to see where our model messes up when comparing say if we had 10 different classes so that's where we'll finish this video we'll get rid of these two cells here and then in the next video we're going to start tackling a larger example more specifically multi-class classification so go back through see if you can turn this into a function maybe you want to turn it into a version of plot confusion matrix so you could do def plot confusion matrix blah blah blah and see if you can pass it something like true labels and predicted labels and have it come out with something like this so give that a shot and i'll see you in the next video alrighty welcome back i'm super excited for this series of videos because now we're going to start let me write this down actually working with a larger example in other words or more specifically multi-class classification so we'll just make this little heading here actually it's probably worth making it a size one heading so we'll turn that into markdown oh we've spelt multi-class classification wrong that's all right now what we're going to do in this series of videos is so far we've
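If you get stuck on the functionalizing exercise, one possible shape for such a function is sketched below (try it yourself first — the name `make_confusion_matrix` and its parameters are my own choices, and it assumes scikit-learn's `confusion_matrix` as used earlier in the course):

```python
import itertools
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

def make_confusion_matrix(y_true, y_pred, classes=None, figsize=(10, 10), text_size=15):
    """Plots a labelled, color-thresholded confusion matrix of y_true vs y_pred."""
    cm = confusion_matrix(y_true, y_pred)
    cm_norm = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis]  # per-row %
    n_classes = cm.shape[0]

    fig, ax = plt.subplots(figsize=figsize)
    cax = ax.matshow(cm, cmap=plt.cm.Blues)
    fig.colorbar(cax)

    # Use class names if given, otherwise fall back to integer labels
    labels = classes if classes else np.arange(n_classes)
    ax.set(title="Confusion Matrix",
           xlabel="Predicted label",
           ylabel="True label",
           xticks=np.arange(n_classes),
           yticks=np.arange(n_classes),
           xticklabels=labels,
           yticklabels=labels)
    ax.xaxis.set_label_position("bottom")
    ax.xaxis.tick_bottom()

    # Color threshold for white vs black cell text
    threshold = (cm.max() + cm.min()) / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, f"{cm[i, j]} ({cm_norm[i, j]*100:.1f}%)",
                 horizontalalignment="center",
                 color="white" if cm[i, j] > threshold else "black",
                 size=text_size)
    return fig
```

Calling it with something like `make_confusion_matrix(y_test, y_preds, classes=["class 0", "class 1"])` should reproduce the prettified plot built in this video.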
created our own data set now we created a blue and red circle data set which is a fairly simple data set and then we went through a whole bunch of different steps to model it we learned how we can improve our models we learned about non-linearity which is important if we want to model data that is non-linear so it has different shapes we learned how we could evaluate and improve our classification models we learned how to plot the loss curves find the best learning rate and a whole bunch more classification evaluation methods so now to really drive all of these concepts home let's start with a new problem a multi-class classification problem so let's say you were a fashion company and you wanted to build a neural network to predict whether a piece of clothing was a shoe a shirt or a jacket so in that case you have three different options so let's write that down when you have more than two classes as an option it's known as multi-class classification so two classes on their own is binary classification but more than two is multi-class classification so that means if you have three different classes it's multi-class classification it also means if you have a hundred different classes it's multi-class classification now the good news is with a few tweaks everything we've worked on so far in our binary classification problem we can apply to a multi-class classification problem we could just look up here to see the steps that we've gone through or we could just jump back into our presentation and remind ourselves of the steps in modeling with tensorflow i love that little animation look at this colorful little picture so we need step one let's get data ready turn it into tensors okay step two is build or pick a model we've built a lot of models so far so we're pretty familiar with this yeah fit the model to the data and make a prediction okay evaluate the model improve through experimentation and save and reload your trained model wonderful let's start with
step one let's get the data ready so i kind of hinted at what kind of data set we're going to be using before we're going to pretend that we're a fashion company and we want to build a neural network to classify different images of clothing so to practice multi-class classification we're going to build a neural network to classify images of different items of clothing now the beautiful thing about this is that we can use the fashion mnist dataset which is built into the tensorflow.keras datasets module so let's have a look if we go tensorflow fashion mnist beautiful fashion mnist tensorflow datasets so the tensorflow datasets module is up here it has a whole bunch of different built-in data sets that you can use to practice on before your own problems so they're great to use to sort of get familiar with how you can build a neural network with tensorflow and get it working before you adjust it to your own data set that's a really important concept a lot of the time in machine learning and deep learning you'll work on problems that have existing outlines and then slowly adjust them to whatever you're working on so we've got a description here fashion mnist is a data set of zalando's article images consisting of a training set of 60 000 examples so it's going to be our biggest data set yet and a test set of 10 000 examples each example is a 28 by 28 grayscale image associated with a label from 10 classes you could go through that documentation there or you could just follow along with the code here to get started with the tensorflow data sets we'll re-import tensorflow as tf and we'll also get the tensorflow.keras.datasets module we're going to import oh that needs to be data sets we're going to import fashion mnist now the good thing about the tensorflow data sets is that all of the data sets or at least the ones i've worked with so far which is a fair few have already been split into training and test sets so we can go the data has already
been sorted into training and test sets for us so to import it we can use tuples this is going to be train data train labels and then another tuple for the test data and test labels so we'll set that up as equals fashion underscore mnist which is this module we just imported here and then we'll use the load data method so of course if you wanted to see this in the documentation you could come back here and the split says that it's already split into test and train 10 000 in test and 60 000 in train and then there should be an example somewhere of how to import it i guess not that's right it's somewhere here it may be in the overview section let's get back to focusing on writing code so we'll shift and enter that and it'll download it'll be pretty quick because it's a relatively small data set and it's stored on google storage now to check out an example we can go show the first training example and go print f training sample and i'm going to put it on a new line just so it's nice and neat we'll go train data and then we'll just get the zeroth index from that and then i'll get a new line after that we need to finish that f string there we go and we're going to do the same thing for the training label we'll just do the same index here of course we could have done this randomly but i'm going to just get one out of there because that's one of our first steps right whenever we're downloading data we want to become one with the data we want to visualize visualize visualize okay so what have we got here training sample we've got some sort of array of numbers beautiful and the numbers are varying from it looks like from zero up to about 255 okay and then the training label is a nine all right so this array of numbers or this matrix or this tensor of numbers represents training label number nine so if we go here we look up zalando research fashion mnist these are the images that we're going to be working with so there's 10 different classes of image we've got shoes we've got dresses
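The download-and-inspect steps just described can be sketched as:

```python
import tensorflow as tf
from tensorflow.keras.datasets import fashion_mnist

# The data comes pre-split into training and test sets for us
(train_data, train_labels), (test_data, test_labels) = fashion_mnist.load_data()

# Become one with the data: show the first training example and its label
print(f"Training sample:\n{train_data[0]}\n")
print(f"Training label: {train_labels[0]}")
```

Running this should print a 28x28 grid of pixel values between 0 and 255, and a training label of 9.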
we've got shirts come down we can get the data from there okay here's the labels so the labels each training and test example is assigned to one of the following labels so number nine this is an ankle boot wonderful so this first sample is an ankle boot now of course we could keep looking at numbers what else do we want to investigate whenever we're working with a new data set we want to look at the shape we also want to look at what it looks like so let's do that let's go check the shape of a single example so let's go train data we need to get the first sample dot shape and let's go train labels zero dot shape wonderful so we've got a 28 by 28 tensor here and the train labels is just a scalar so it has no shape there now how about we get visual plot a single sample and come in here so to plot it we can import matplotlib dot pyplot as plt we go plt dot imshow because we're working with an image and then we can just pass it a single example we'll go zero and because its training label is nine what should it look like it should look like an ankle boot let's see if this works oh okay so fairly pixelated here but you can kind of see the outline of the boot there and how about we try another one wonderful so that looks like a sweater and then check out the sample's label so if we go train labels and pass it the same index here what label does this get all right it gets a two and if we come back here oh it's a pullover okay so i said this was a sweater kind of looks like a sweater to me but again depending on what data set you're working with it will have different labels so we've gotten pretty familiar with our data we're probably going to have to set up a little list to index on our training labels so let's start doing that in the next video and then we'll slowly keep getting more and more familiar with the data before we start to build a multi-class classification neural network let's keep going getting familiar with our data that we're going to be working
with for the multi-class classification problem so far it looks like our labels are in numerical form and while this is going to be fine for our neural network we probably want them in human readable form so what we're going to do is create a small list of the class names that we found on the dataset's github page and that way we can index onto that list so that instead of just having a train label as 2 this actually reads as pullover so right down here create a small list so we can index onto our training labels so they're human readable again this is all part of becoming one with the data that we're working with understanding what kind of problem that we're working on so we go here all i'm going to do is just copy these in the same order so it'll be t-shirt top trouser pullover dress so let's just write these down might speed this little section up wonderful now we've got a list of class names that reflects the actual human readable name and we've got their labels in integer form which will be great for our neural network so let's have a look how many classes are we dealing with 10 so again anything over 2 is classified as multi-class classification so now we've got our class label names let's plot another example so we go here plot an example image and its label and go plt dot imshow and which number or which index should we pick this time how about we go 17 and then we're going to set the color map to be plt dot cm for color map to be binary because this is going to be grayscale we'll set the title plt.title now this is where we can index onto our class names list with train labels and the same index here or actually we can probably just create an index of choice variable boom so that we can just update that index of choice again you could probably just set this up to be just a random number generator if we really wanted to but this is how we're going to plot different examples wonderful so there's a t-shirt top we can do it again 20
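Putting the class-names list and the plotting cell together, a self-contained version (reloading the data so it runs on its own) might look like:

```python
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import fashion_mnist

(train_data, train_labels), (test_data, test_labels) = fashion_mnist.load_data()

# Class names from the Fashion MNIST GitHub page, in label order (0-9)
class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

# Plot an example image and its human-readable label
index_of_choice = 17
plt.imshow(train_data[index_of_choice], cmap=plt.cm.binary)  # binary = grayscale
plt.title(class_names[train_labels[index_of_choice]])
```

Changing `index_of_choice` and re-running the cell plots a different example each time.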
there's a dress 10 t-shirt top 100 2000 there's a bag and there's a coat wonderful now again we could keep going through these and visualize different examples so that we know the data that we're working with but what's next how about rather than just running this cell dozens of times we set up just some code to plot a bunch of random examples so that we can get familiar with them now with a data set like this you could probably just look at what it looks like here and start to understand okay this is pretty straightforward what it is it's ten different types of pictures and they're all grayscale but if you had a larger data set with 100 different categories and your images weren't grayscale they had a whole bunch of other details in them you probably want to have a look at multiple or hundreds if not thousands of samples before you start building a model and when i say that number again that's an arbitrary number it's just however many samples it takes for you to start to feel personally familiar with the data that you're working with that's the most important part so let's write some code to plot multiple random images of fashion mnist import random plt.figure and right now we're working with an image dataset but the same would go for any type of multi-class classification data set that you're working with visualize visualize visualize as many samples as you can so we'll probably loop through four samples at a time we'll set up an ax with plt dot subplot actually we need a subplot don't we because we want to plot multiple different things we'll give it two rows two columns and the index can be i plus one because we're looping through we'll select a random index using random.choice and we'll set up a range for the length of how many training samples we have which is 60 000.
so all this is going to do is just pick a random number in the length of our training data so it'll be from zero to fifty nine thousand nine hundred and ninety nine and then plt dot imshow we're going to show the training data sample that appears at the random index and the cmap can equal plt.cm dot binary so it comes out in grayscale and then the title what do i want the title to be we want class names and then again we'll index on our training labels where the random index occurs and we don't want any of the ticks because we don't need them we just want to see the images so hopefully this works we should be able to run this cell and visualize okay shirt bag sneaker trouser wonderful t-shirt shirt dress bag so you can imagine if we went through this this is a bit easier than what we just did here so this is what i like to do whenever i'm working with a fairly large data set never underestimate the power of randomness because i like to look at samples randomly and just keep going through the data sets start to build an image of what these look like in my own mind before i start to build a neural network that's going to distinguish patterns in them now i want you to have a careful look at the data that we're working with what kind of shape does it take on are there many straight lines are there any curved lines how does that relate to what we've learned with linearity and non-linearity so do you think with this type of data that we're working with are we going to need a neural network that uses just linear like straight lines or will we need a neural network that's going to have some kind of non-linearity in it so that's something to think about but have a play around with this little section here become one with the data run it about 25 more times so you've seen at least 100 different samples and then next video we come back to our modeling we've got our data ready luckily it's already in tensors for us because we've downloaded it from the keras data set
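The random-samples loop described here could be sketched like so (a self-contained version that reloads the data; the 2x2 grid size is the one used in the video):

```python
import random
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import fashion_mnist

(train_data, train_labels), _ = fashion_mnist.load_data()
class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

# Plot 4 random images from the training set in a 2x2 grid
plt.figure(figsize=(7, 7))
for i in range(4):
    ax = plt.subplot(2, 2, i + 1)
    rand_index = random.choice(range(len(train_data)))  # 0 to 59,999
    plt.imshow(train_data[rand_index], cmap=plt.cm.binary)
    plt.title(class_names[train_labels[rand_index]])
    plt.axis(False)  # hide the ticks, we only want to see the images
```

Each re-run of the cell shows four new random samples, which makes "run it about 25 more times" painless.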
module however often times your data won't be ready turned into tensors so we'll tick this step off now we're up to building or picking a pre-trained model to suit our problem so we'll get onto that in the next video okay so we've started to become familiar with the multi-class data that we're working with now let's build a model so building a multi-class classification model now i said earlier well we've already built a fair few binary classification models and i did mention earlier that for the multi-class classification model we're only going to have to tweak a few things to get it to work with our multi-class classification data so if we go back to the keynote we look at the typical architecture of a classification model so we've got multi-class classification over here which is what we're working on now so the input layer shape same as binary classification depends on the number of features you have number of hidden layers again same as binary classification neurons per hidden layer it's the same we've got another difference here it's the output layer shape so for binary classification it's one one class or the other multi-class classification has one per class for the hidden activation we can use the same in binary classification we used the non-linear function relu and i asked you in the last video to think about whether or not we might need non-linear activation functions the output activation okay that's different too for multi-class we're going to need the softmax function rather than sigmoid we also need a different loss function so for binary classification we use binary cross-entropy but for multi-class classification we use categorical cross-entropy in tensorflow and then finally the optimizer can stay the same so we can use our trusty adam all right let's write some of these down if we go here for our multi-class classification model we can use a similar architecture to our binary classifiers however we're going to have to tweak a few things namely first of all is
the input shape so what is our input shape what shape is our data this is what we explored before so the train data sample zero dot shape okay so our input shape is 28 by 28 so let's write that down 28 by 28 the shape of one image wonderful and we're also going to have to modify the output shape what was the output shape if we come back to our typical architecture of a classification model the output layer shape is one per class so how many classes did we have again length class names beautiful so the output layer shape is 10 one per class of clothing that's what we want and what else was there ah the loss function loss function equals has to be tf.keras.losses dot categorical cross entropy instead of binary cross entropy i'll put that there and then i think that's all we'll have to do beautiful because we can keep the other things the same ah the output activation that's what we need to write down output layer activation equals softmax not sigmoid all right so this being known how about we put together our first multi-class classification neural network model so we're going to go through the exact same steps that we've been through before so set random seed tf random dot set seed we'll give it 42 wonderful let's just try to replicate the exact same model we've been using in previous videos so we'll go create the model i think we're up to model 11 now tf.keras.Sequential wonderful and then what do we have we had two dense hidden layers tf.keras.layers.Dense with four hidden units and a relu activation so we'll just make two of those tf.keras.layers.Dense four and a relu activation wonderful non-linear because this is the answer to the question i posed before if you guessed that we're going to need non-linear activation functions for our data you'd be correct because it is composed of straight lines and non-straight lines so that's why we need non-linear activation functions for our output layer oh this is going to be different isn't it we can't use the exact
same one how many output shapes do we need we need 10 wonderful we can do that 10 and we're going to change our activation from sigmoid to softmax so we can do it like this softmax or we can also do it like this tf.keras.activations dot softmax now i'm not sure if it's a capital s let's look it up tensorflow softmax activation there we go so i believe we might just be able to put it like that let's see if autocomplete helps us out softmax there we go wonderful now what else do we have to do we have to compile the model as always compile the model we're up to model 11 dot compile loss equals what do we have to change here we need to change it to categorical cross entropy tf.keras.losses dot categorical cross entropy wonderful we can do that and then we can go optimizer we don't have to change that we can do our trusty adam optimizer we'll leave all the default parameters there and finally let's set up metrics which is set as accuracy our trusty accuracy beautiful so let's fit the model and we're going to save this as non-norm history and you might be asking why is that well i will reveal all in an upcoming video now we can fit directly on the train data and the train labels and we'll go just for 10 epochs so 10 passes through all of the data and we're going to introduce a new parameter here to the fit function which is validation data so this is where you can put in we don't have a validation data set but we already do have a test data set so what we can do here is at the same time the model is fitting and trying to find patterns in the training data so the relationships between the training data and the training labels it can evaluate how well those patterns are doing on the validation data but in our case we don't have a dedicated validation set so we'll just use our test data here so this data will remain unseen and that way we can evaluate how good the patterns our model is learning in the training data are when we use them on unseen data so let's see
how this goes oh no what have we got wrong here value error shapes ah the classic shape error hmm what have we got shapes 31 32 and 32 28 10 are incompatible where are we getting these shape errors from ah four so what's 28 plus 4 that's 32 you know what i'm gonna have to introduce a new layer here and this is a layer that you're going to often need so it's worth exploring it a little bit we need to flatten our data and what does flatten mean i'm going to give it here input what's our input shape of our data input shape is 28 by 28 so the input shape we're going to pass it here as 28 28 this is telling our neural network that hey we're passing you some images that are 28 by 28 and you might be wondering what does flatten do well let's explore it we'll get a new cell here and go flatten model equals we're just going to create a model with a single layer tf.keras.Sequential and tf.keras.layers.Flatten of course if you want to skip ahead and just look up the documentation rather than listen to me talk about what the flatten layer does you can one hundred percent do that but i like exploring things by writing code because i'm not sure about you but i tend to read documentation like three different times and still don't understand it when i write the code i start to understand it let's check the shape of this what is happening here ah none 784 where did 784 come from i'll give you a little hint what's 28 times 28 784 all right well now that we've seen the flatten layer in action let's look it up in the documentation so tensorflow flatten layer flatten layer flattens the input okay does not affect the batch size do we have a demonstration here okay so this is the original shape 1 times 10 times 64.
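The flatten-model experiment described here can be sketched like this (using `tf.keras.Input` to declare the input shape, which serves the same purpose as passing `input_shape` to the first layer as done in the video):

```python
import tensorflow as tf

# A model with a single Flatten layer to see what it does to our image shape
flatten_model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),  # one 28x28 image per sample
    tf.keras.layers.Flatten()
])

# Pass a batch of one dummy "image" through and check the output shape
dummy_images = tf.ones((1, 28, 28))
print(flatten_model(dummy_images).shape)  # 28 * 28 = 784, one long vector
```

So the Flatten layer turns each 28x28 matrix into a single 784-element vector, which is the one-long-vector format the following Dense layers expect.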
and if we use flatten on that it's going to turn it into 1 times 10 is 10 10 times 64 is 640 ah i see so instead of being a 28 by 28 array it flattens it all so that it's now of shape none 784 because what you'll often find is a neural network likes everything to be in one long vector and then we pass that through these other layers here and we get to the outcome that we like so our data needs to be flattened and we'll go here from 28 by 28 to none 784 so if you ever run into a shape error in your neural networks and you find that you haven't flattened your data into one long vector it could be because you're not using a flatten layer as the very first layer going into your neural network some layers can flatten your data automatically but typically you'll need to tell your neural network that hey here's the data here's my input shape i want you to compress that flatten that into one long vector and then pass it through your other layers so now that we've flattened our data let's see if it gets rid of this value error of the shapes that are incompatible and again as i said the shape error is one of the most common errors you'll come across oh no what have we got here value error shapes 32 1 and 10 are incompatible hmm you know what i think it's our loss function you might be thinking daniel that loss function is the exact same that we have in the multi-class classification in the architecture of a classification model what's going on well there are two types of loss function now one is for if your data is one hot encoded so if we go what do our training labels look like train labels zero so our training labels are in the form of integers so maybe we'll have a look at the first ten there we go so nine zero zero three zero two seven two five five let's have a look at the documentation for categorical cross entropy i'll just copy and paste that in there computes the cross entropy loss between the labels and predictions use this cross entropy loss
function when there are two or more label classes we expect the labels to be provided in a one hot representation if you want to provide the labels as integers please use sparse categorical cross entropy loss so that is where we're getting our value error from now the shape error is coming because the loss function categorical cross entropy now i get confused with this all the time but this loss function expects our labels in one hot representation but if we change it to sparse categorical cross entropy it should work so let's try that sparse categorical cross entropy let's see what happens oh yes look at that our neural network is running beautiful so two little tidbits to take away from this is that our binary classification model that we've used before this model 11 is very similar to all the other models we've been building throughout this entire section and can work for multi-class classification data with a couple of tweaks namely defining what the input shape of our data is changing the output layer activation function as well as how many classes we're after and updating the loss function to reflect the problem that we're working with and also the style that our labels are in so i wonder if we can one hot encode them tf dot one hot train labels depth equals 10 what happens if we do that there we go oh i wonder if that'd work categorical cross entropy change that and then we just go tf one hot train labels depth equals ten and then we can do the same we're gonna have to do the same for the validation data this is us experimenting on the fly here depth equals 10.
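Putting all the tweaks together, the working multi-class model might look like the sketch below (this mirrors the video's model_11; the `tf.keras.Input` line stands in for the `input_shape` argument passed to the Flatten layer, and fitting takes a minute or so):

```python
import tensorflow as tf
from tensorflow.keras.datasets import fashion_mnist

(train_data, train_labels), (test_data, test_labels) = fashion_mnist.load_data()

tf.random.set_seed(42)

# Multi-class model: flatten the 28x28 images, two small hidden layers,
# and 10 softmax outputs (one per class of clothing)
model_11 = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax")
])

# Labels are integers (not one-hot), so use SparseCategoricalCrossentropy
model_11.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                 optimizer=tf.keras.optimizers.Adam(),
                 metrics=["accuracy"])

# validation_data lets the model report val_loss/val_accuracy on unseen data
non_norm_history = model_11.fit(train_data, train_labels, epochs=10,
                                validation_data=(test_data, test_labels))
```

Because the pixel values are still unscaled (0-255), don't expect great accuracy from this run yet — that's exactly the point the non_norm (non-normalized) name is foreshadowing.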
and we need an extra bracket here don't we boom now the neural network starts to run as well wonderful so let's put a little note down here if your labels are one hot encoded use categorical cross entropy and if your labels are in integer form use sparse categorical cross entropy this one has tripped me up a whole bunch of different times just takes a little bit of practice so again typo there if you get any shape errors with your models the three things you have to look at are input shape output shape and the loss function that you're using so they're the three main value errors or shape errors that you're going to come across of course it could be more but they're the three that i most often run into so how exciting is that we've already built our first multi-class classification model let's continue on with where we were in the next video in the last video you built your first multi-class classification model so you should be very proud of yourself give yourself a little pat on the back but there was one thing we kind of forgot to talk about we did code it but we didn't really explain it is this validation data parameter so you might have noticed a change in the output of our model training log here if you see loss accuracy val loss val accuracy you might have sort of figured out what the val loss and the val accuracy is if not that's fine the loss here with no prefix is the loss on the training data so how wrong the model is while trying to figure out the patterns between the training data and the training labels the accuracy here with no prefix is the model's accuracy on the training data but the val loss and the val accuracy here is the model's loss on the validation data and the accuracy on the validation data now this is important because this is the data that a model has never seen before so it trains on the training data and then it validates itself to see how good its patterns are on the validation data so whenever you pass the validation data parameter with
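The one-hot versus integer-label distinction just noted can be demonstrated in isolation with `tf.one_hot`:

```python
import tensorflow as tf

# Integer labels (like our train_labels) ...
labels = tf.constant([9, 0, 0, 3, 0, 2, 7, 2, 5, 5])

# ... one-hot encoded to depth 10 (one column per class)
one_hot_labels = tf.one_hot(labels, depth=10)
print(one_hot_labels.shape)  # (10, 10)
print(one_hot_labels[0])     # index 9 is 1.0, every other index is 0.0
```

Labels in this one-hot form pair with `CategoricalCrossentropy`, while the raw integer form pairs with `SparseCategoricalCrossentropy`.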
some kind of data, you're going to get these extra outputs. And remember — a model's results on the training dataset don't necessarily reflect how it's going to perform in the real world. You really want it to perform well on data it hasn't seen before, to get an idea of how it'll do in, say, your application. Right now our model is getting an accuracy score of about 35 percent. That's better than guessing, because we're working with 10 classes: if we take 100% accuracy divided by 10 for 10 different classes, a model that was just guessing would get about 10% accuracy. So okay, we're getting about three and a half times that — but let's see if we can improve it. First, let's get a model summary. I want to highlight something else before we try to improve this accuracy: the input and output shapes of our model, which is a very, very important point. Check the model summary: model_11.summary(). Here we can see that the Flatten layer takes our 28 by 28 images and flattens them into a 784-long vector, it passes through the two dense layers — dense layer 1 here, dense layer 2 there — and then it gets output into a size of 10, for 10 different classes. Now, do you recall back when we were preprocessing data? We haven't done much data preprocessing with this problem, because the data we got from the Keras datasets module — the Fashion MNIST dataset — is already numerical. However, we spoke in a previous video about the concept of scaling, or normalization. If you can't remember it, that's okay — we're going to go back through it. So we come back to our keynote. I deliberately left this arrow here so that you would think about, hmm, what's Daniel missing out on there? Well, number one is turn all data into numbers — luckily we've already done that. Number two is make sure all the tensors are the right shape — we've already been through that
too, so tick, tick. Number three is scale features. Hmm, what does this mean? Normalize or standardize — neural networks tend to prefer normalization. So let's remind ourselves of what that is. Better yet, let's remind ourselves of our training data. Check the min and max values of the training data: train_data.min() and train_data.max(). Alrighty — 0 and 255. Now, if we said in that slide that neural networks prefer normalization, what is normalization? Well, it's also referred to as scaling. Let's write this down: neural networks prefer their data to be scaled (also referred to as normalized, depending on what circle you're from), which means they like the numbers they're trying to find patterns in to be between 0 and 1. However, right now our data is between 0 and 255. So how might we get our training data — and we'll have to do the same for our validation data — between 0 and 1? We can do that by dividing all of the data by the maximum value. So write this down: we can get our training and testing data between 0 and 1 by dividing by the maximum. This is referred to as scaling or normalization — again, you'll sometimes find different names for the same thing, but if I use the word scaling or normalization, I'm referring to getting our dataset between 0 and 1. So let's go here: train_data = train_data / 255.0, as a float, because 255 is the max value, and then the same for the test data: test_data = test_data / 255.0. Actually, so that we're not overwriting our original variables, we'll call them train_data_norm and test_data_norm. Wonderful. Then we can check the min and max values of the scaled training data: train_data_norm.min() and train_data_norm.max(). Boom — what do you think they'll be? Zero and one. Beautiful. Now that our data is between zero and one, let's see what happens when we model it. We're going to change nothing — absolutely nothing — from model_11 except
for the data that we're using. The only thing that will change is we'll use train_data_norm and test_data_norm, and everything else will stay the same — so you can go ahead and try to replicate that before I do it in the video. I want you to think: now that our data has been normalized — one of the things we can tune with our neural networks to improve performance — what do you think will happen with different data and the exact same model? If you're not sure, let's find out. Now that our data is normalized, let's build a model to find patterns in it. We'll set the random seed, tf.random.set_seed(42), and create a model the same as model_11 — we'll call this one model_12. A dozen models, how good is that? tf.keras.Sequential — we're going to know how to do this off by heart. We need a Flatten layer to get our data from 28 by 28: Flatten(input_shape=(28, 28)), telling our model, hey, I'm passing you images that are 28 by 28. Then we'll create our two hidden layers with non-linear (ReLU) activations, tf.keras.layers.Dense, so that our model can find non-straight-line patterns in our data. The output layer needs a shape of 10, because we have 10 different classes, and an activation of softmax, because we're dealing with multi-class classification. Then we can compile the model: model_12.compile, setting the loss function to tf.keras.losses.SparseCategoricalCrossentropy — why? Because our labels are in integer form (I say "integer" differently every single time). If our labels were one-hot encoded, we'd just get rid of the "Sparse". The optimizer is going to be tf.keras.activations — our trusty friend Adam — beautiful, and the metric is going to be accuracy, which is a great baseline metric for all classification problems. Then finally we can fit the model. We want norm_history — we saved our previous model's training history to non_norm_history, I
said I'd come back to this variable — but this time we're going to use norm_history = model_12.fit() on train_data_norm, with train_labels the same as before, epochs=10, wonderful, and the validation data is (test_data_norm, test_labels). Look at us building multi-class classification neural networks like a boss. Alright — do you reckon this will work? What do you think is going to happen? Before we run it: we've normalized our data, and that's the only thing we've changed. Have a think about it, and I'm going to run the code in three, two, one. Hopefully no errors... ah, of course — "activations". Did I spell that wrong? Oh no, this has to be "optimizers". Oops, honest mistake. Again, what have we got wrong here? "Expected float32, passed parameter y, got type string"? Hmm — test_data_norm, test_labels... this is an error I haven't seen before. What are we missing here? You know what, I believe it's just missing brackets. Wow, yeah. So those are the types of errors you're going to run into — simple little things like that. That was kind of a lucky guess, but you can sometimes spend hours troubleshooting them, so don't worry if you get stuck on something like that, because even I, who have built a fair few of these models, still run into simple syntax errors like that. But look what's happening here: 10 epochs, run just as before — what has changed between our two models? The data: we've normalized it, that's it. The val_accuracy has shot up from — what was it before? — 35 percent to 70-80 percent, so it has more than doubled; almost 2.5 times as good. And all we did was normalize our data. That's something to keep in mind, so let me write this down as a key note — we'll get our key emoji out. Note: neural networks tend to prefer data in numerical form, as well as scaled/normalized — so, numbers between 0 and 1.
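The whole experiment above — same architecture as model_11, just normalized inputs and validation_data passed to fit — can be sketched end to end. This is a minimal runnable version using small random arrays in place of the Fashion MNIST data (and assuming the 4-unit hidden layers from the earlier models), so it trains in a second or two:

```python
import numpy as np
import tensorflow as tf

tf.random.set_seed(42)

# Small random stand-ins for Fashion MNIST (28x28 images, 10 classes),
# already scaled to the 0-1 range like train_data_norm / test_data_norm
train_data_norm = np.random.rand(200, 28, 28).astype("float32")
train_labels = np.random.randint(0, 10, size=200)
test_data_norm = np.random.rand(50, 28, 28).astype("float32")
test_labels = np.random.randint(0, 10, size=50)

# Same architecture shape as the notebook's model_11 / model_12
model_12 = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),                        # 28x28 -> 784-long vector
    tf.keras.layers.Dense(4, activation="relu"),      # hidden layer 1
    tf.keras.layers.Dense(4, activation="relu"),      # hidden layer 2
    tf.keras.layers.Dense(10, activation="softmax"),  # one unit per class
])

model_12.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),  # integer labels
    optimizer=tf.keras.optimizers.Adam(),
    metrics=["accuracy"],
)

# Passing validation_data is what adds val_loss / val_accuracy to the log
norm_history = model_12.fit(train_data_norm, train_labels, epochs=2,
                            validation_data=(test_data_norm, test_labels),
                            verbose=0)
print(sorted(norm_history.history.keys()))
```

On random labels the accuracy won't mean anything — the point is the shape of the workflow: create, compile, fit, with normalized data in and validation metrics out.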
Beautiful. So just by normalizing our data we got a fairly dramatic increase in performance — how cool is that? That's something I want you to keep in mind. We've saved our model training histories — norm_history, as well as non_norm_history up here (a bit of a tongue twister to say) — so if you want, you can have a go at plotting the loss curves from those history variables and seeing how they compare. We've trained two neural networks with the exact same architecture, except one was trained on non-normalized data. And we said neural networks tend to prefer data in numerical form — it definitely needs to be in numerical form — but they also prefer it scaled/normalized; in other words, the numbers between 0 and 1. So let's compare the loss curves of each model — the loss curves of normalized versus non-normalized data. We'll import pandas, then plot the non-normalized data loss curves: pd.DataFrame(non_norm_history.history).plot(), give that a title of "Non-normalized data", beautiful, and then we can plot the normalized data loss curves: pd.DataFrame(norm_history.history).plot(title="Normalized data"). Wonderful — let's see what this looks like. Wow. From these two plots we can see how much quicker the model with normalized data improved versus the model with non-normalized data. Have a look at this: for the model with non-normalized data, the loss decreased and then kind of flattened out, but with normalized data, our model's loss dropped — it even started at a lower value than what the other model finished with — and then it kept decreasing as training went on. And the accuracy — who knows, maybe if we trained this for longer it would keep improving, the same with the other one actually, but that one would probably take a fairly long time to find
the same level of results that our normalized data found. Another key point to remember when comparing different models — so we'll put down here, where is our key emoji — when comparing results, write down this note: the same model with even slightly different data (this is the same dataset — all we've done is turn it from non-normalized to normalized) can produce dramatically — let's get really dramatic here — dramatically different results. So when you're comparing models, make sure you're comparing them on the same criteria: e.g. same architecture but different data (the comparison we're making here), or same data but different architecture. Something to keep in mind when comparing the results of different models: keep your comparisons to as few changing variables as possible. What I mean is, don't change like 10 things and then compare one model to the other — change one thing and then compare the results, so you know which change is making the difference in performance. With that in mind, let's come back to getting our data ready for our neural networks: we need to turn it all into numbers (we already started with that, since Fashion MNIST comes as numbers); we have to make sure all of our tensors are in the right shape (we did that using the Flatten layer, plus the input shape and output shape of our neural network); and number three, scale your features — normalize or standardize them; remember, neural networks tend to prefer normalization. With this known, how about we try tweaking another one of our neural network hyperparameters? Let's see how we might find the ideal learning rate, just as we did in a previous video, and see what happens to our neural network's training. We've seen how even slightly
changing the data we input to our neural network can produce dramatically different results. How about we try finding the ideal learning rate and see if that changes anything? Just as we've done before, we'll set the random seed. The ideal learning rate is the learning rate value where the loss decreases the most. To find it, we're going to set up the exact same model as before — we're up to model_13 now: tf.keras.Sequential, with a Flatten layer, tf.keras.layers.Flatten, wonderful, and input_shape=(28, 28) — of course that will change depending on the data you're working with. Then tf.keras.layers.Dense — we're spelling everything wrong today — with activation="relu". Again, not sure what's going on with the code editor. I want to keep our code nice and neat, right, so if someone else had to read it they're not going, "holy gosh, what did you even write here, Daniel?" And don't forget: when you're writing code, that someone else could be you, just later. Then layers.Dense again — we need a layer with 10 output units and an activation function of softmax, because we're dealing with multi-class classification. We can compile the model: model_13.compile, everything staying the same as above — the loss function tf.keras.losses.SparseCategoricalCrossentropy, the optimizer our trusty friend tf.keras.optimizers.Adam, wonderful, and for metrics we're going to use accuracy. Boom. Now we're going to fit the model — oh, what do we have to do first? I've got an extra step; I'm too used to going create, compile, fit. We need to create the learning rate callback: lr_scheduler = tf.keras.callbacks — do you remember how to create a callback? — LearningRateScheduler, and we're going to go lambda epoch — let's start at, what can we start at? 1e-3.
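The callback being set up here can be sketched on its own. The schedule starts at 1e-3 and raises the learning rate a little each epoch; the values below come straight from evaluating the same lambda:

```python
import tensorflow as tf

# Start the learning rate at 1e-3 and raise it slightly every epoch:
# lr(epoch) = 1e-3 * 10**(epoch / 20)
lr_scheduler = tf.keras.callbacks.LearningRateScheduler(
    lambda epoch: 1e-3 * 10 ** (epoch / 20)
)

# The first few learning rates the schedule will hand the optimizer
lrs = [1e-3 * 10 ** (epoch / 20) for epoch in range(5)]
print([round(lr, 5) for lr in lrs])
# [0.001, 0.00112, 0.00126, 0.00141, 0.00158]
```

Passing callbacks=[lr_scheduler] to model.fit(...) then tries a slightly larger learning rate each epoch, which is what produces the learning-rate-versus-loss curve plotted shortly.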
Let's start at 1e-3 for now, because our model's already performing pretty well, so we'll start there: 1e-3 * 10 ** (epoch / 20). Remember, this just means start at that value and slowly increase the learning rate every epoch by 10 to the power of epoch divided by 20. To fit with the learning rate callback we go: find_lr_history — we'll create another history variable — model_13.fit, using train_data_norm (we'll fit on the normalized data) and train_labels, epochs=40 this time, the validation data of course (test_data_norm, test_labels), and finally callbacks=[lr_scheduler]. Wonderful — this should work, fingers crossed. There we go. Okay, this is going to take about three seconds per epoch, so what's that, times 40 — 120 seconds — so I'll speed this up and come back once our model has finished finding the ideal learning rate. Alrighty, looks like our model has finished. Let's see what's happened here. Okay, it ends up with a not-as-great validation accuracy as it once had, but we get a pretty good range of learning rate values. Let's not just look at this training output, though — how did we do it before? We plotted the learning rate curve, so let's do that again. Plot the learning rate versus the loss: we want to import numpy as np and import matplotlib — we've already got these in our notebook, but if you were continuing on from here, we'll re-import them just for completeness. lrs = — what did we do before? — 1e-3 * 10 ** (np.arange(40) / 20), because we used 40 epochs, divided by 20. Actually, we don't even need numpy; let's just use TensorFlow, sticking in the spirit of using TensorFlow. And we're going to plot with plt.semilogx, because we want our learning rate on a log scale: lrs against find_lr_history.history — our history variable from before — and we only want the loss component from the
history. Then we should make our plot pretty: we'll add an x-label of "Learning Rate", a y-label of "Loss", and the title can be "Finding the ideal learning rate". Let's see what this looks like. Alrighty — we can see that when it started at 10 to the power of negative 3, the loss decreased fairly sharply, then it kind of plateaued all the way up to 10 to the negative 2, then the loss sharply increases as it gets closer to 10 to the negative 1. Now, if we come back to our keynote — where is our "finding the ideal learning rate" slide — there we go: the ideal learning rate is where the loss is decreasing sharply; find the lowest point on the curve and then go back a little bit. So let's do that. The lowest point on the curve is about here; if we go back a little to where it's still sharply decreasing, I would say 10 to the negative 3 is probably our ideal learning rate. Which happens to be — what optimizer are we using? We're using Adam. So, as I said, this is proof — look at that, 0.001 — that the default parameters for a lot of the different optimizers and other functions in TensorFlow are pretty darn good. It looks like for our problem in particular, the ideal learning rate is just the default value for Adam. With that being said, for completeness, let's refit a model with the ideal learning rate. We'll go here: set the random seed, tf.random.set_seed(42) — I don't want that symbol; getting ahead of myself pressing the shift key. All the keys are chef keys, because what are we doing? We're cooking up neural networks, that's what we're doing. tf.keras.Sequential, wonderful. Now we come over here: tf.keras.layers.Flatten — we have to flatten our data from 28 by 28 arrays (or tensors) into a vector of 784, in other words 28 times 28 — then tf.keras.layers.Dense, exactly the same as before, activation="relu". We'll do another hidden
layer, layers.Dense(4, activation="relu"), wonderful, and then we can come down here for the output layer. What does our output layer need? How many classes do we have? We need a unit for every class, and we also need an activation function that is not sigmoid — Daniel, come on — softmax, thank you for catching me on that one. And we're going to compile the model: model.compile — what's our loss function? tf.keras.losses — we're dealing with integer labels, so we need SparseCategoricalCrossentropy. When would we use plain CategoricalCrossentropy? If our labels were in one-hot encoded form. Now, our optimizer — we actually don't have to change anything here, but we will, just because we spent all that time finding the ideal learning rate, so we might as well put it in there: lr=0.001. Again, the default learning rate for Adam is actually 0.001; we can check that by looking at the docstring. Wonderful. Then we set metrics=["accuracy"], and we can fit the model. Let's save this to history_14, so we're keeping track of what we're doing: model_14.fit(train_data, train_labels) — epochs, this time how about we go 20... oh no, we want the normalized data here, because remember, our model performs much better on normalized data. So: the normalized data, and the ideal learning rate (which we actually didn't have to change for Adam), but we'll fit for longer this time, because we've worked out that our model is doing pretty well — so maybe this time the thing we tweak is how long it looks at the data for. validation_data=(test_data_norm, test_labels) — we need to do this as a tuple. Wonderful, let's fit; it shouldn't take too long. "Invalid syntax" — of course. Oh, did you catch that? I didn't catch that. Wonderful. So I'm going to let this fit, and this should turn out to be a fairly well-trained model with close to the ideal learning rate, performing pretty
well. So we've got a couple more options: based on what we've done in previous lectures, we can evaluate and improve our classification model. We've done a bit of improving, so it's probably time — once this finishes fitting — to start evaluating with some of the techniques we've used before. I'll let this run through, and in the next video we'll run some more evaluation methods on our multi-class classification model. Welcome back. Now that we've got a model trained with close to the ideal learning rate and performing pretty well, let's check back in with our workflow and see what we should do next. We've got: build or pick a pretrained model — we've kind of done that; we've definitely got our data ready, turned into tensors; we've fit the model to the data; we haven't quite made any predictions yet, but making predictions is kind of synonymous with evaluating the model, because what happens during evaluation? We compare what the model should have predicted with what it actually did predict. And we've already improved through experimentation as well. This is another key point to highlight: although this looks like a linear, step-by-step way to do things, it's just a rough overall guideline to the steps in modeling with TensorFlow — you can always jump back and forth between different steps and suit them to whatever problem you're working on. With that said, let's make a little heading here — "Evaluating our multi-class classification model" — and turn it into markdown. Let's put in a couple of things that we could do — actually, let's make this sound a little better: to evaluate our multi-class classification model we could... what are some things we could do? Hmm. Evaluate its performance using other classification metrics, such as a confusion matrix. Assess some of its predictions through visualizations — that's always a fun one. We could improve its results by training it for
longer or changing the architecture — now, this isn't really evaluating, but it's another step we could take. And then, of course, referring back to our steps of modeling in TensorFlow, we could save our trained model so we can use it later — so we'll put that there: save and export it for use in an application. Wonderful. In terms of sticking with evaluation — oh, and I've made a typo, "assess" should have a double s on the end — let's go through the top two. So, first things first: create a confusion matrix. We've actually got some code up here from the binary classification problem — evaluating and improving our classification model — but let's find the confusion matrix code we had before; it should be somewhere up here. Far out, we've written a lot of code. Here we go. I'm back again — you've reached the point of the part two video where I'm going to tell you about a special code you can use to sign up to the full course you're watching: 20-plus hours more content on zerotomastery.io, for 15% off. If you go to the website, go to the academy page where you sign up, and enter the code "tflow" — a shortened version of TensorFlow: t, f, l, o, w, or in capitals — you'll get 15 percent off. But keep this a secret between you and me — you can share it with your friends, but don't leave a comment on this video, because we want to keep it a surprise for people who've made it this far. That's enough from me; I'll probably only see you one more time, at the end of this video, to say goodbye. Enjoy. So, this is our remix of scikit-learn's plot_confusion_matrix. What we might do, instead of typing it all out again — I might just break my rule about writing out code — is copy everything in this cell and come down to "Evaluating our multi-class classification model". This is why it's
handy to make headings — it really lays out your work so you can jump between different sections: create a confusion matrix. And I have an idea: why don't we functionalize this? I kind of issued that as a challenge in a previous video; it'd be cool if we could go through it together. from sklearn.metrics import confusion_matrix, just in case we don't have it — we want this cell to run standalone. Now, to functionalize this, what should we call it? Let's call it make_confusion_matrix. Because the whole premise of an evaluation function — or a confusion matrix in general — is comparing the test labels with predictions, at a bare minimum our make_confusion_matrix should take in test labels. But I believe scikit-learn calls them — is it y_true? Yeah, let's make it y_true so we stick with their convention — and y_pred. So: y_true, y_pred, and then classes, which we'll set equal to None, because if we want a list of class names we can pass it in. What else should we set? We've got the figsize, which we hard-coded — let's make it a parameter with a default of (10, 10). Wonderful. What else? We could do the label size — um, why don't we just do text_size, and set it to 15 by default. Okay, those should be some good parameters. We'll have to tab all of this across so it's part of the function, and now what do we have to update? The confusion_matrix call is now going to take y_true, and our preds are going to be y_pred. Wonderful. And figsize=figsize works there, so we can get rid of the hard-coded value. Classes — we don't need to hard-code that anymore: set the labels to be the classes, so if classes exists, the labels are the class names; otherwise they're just a range of how big our confusion matrix is. Wonderful. Anything else to change? We could change these to text_size — that way, I know I set
it as 15 at the top, but we'll just make all the text a similar size; why don't we do that? That should work: text_size, and we can adjust it with size=text_size. Okay, now all the text should be the exact same size. Beautiful. So now we've got a function, make_confusion_matrix, which takes in a bunch of true labels, a bunch of predictions, and the classes — that's what's important. Let's run this to make sure all the syntax is correct, and remind ourselves of what our class names are. Okay, beautiful. But what don't we have? We have the function make_confusion_matrix — and we could really have put a docstring here to tell us what the function does, but I'll leave that for you to explore — and we have some true labels (our test labels), but we don't have any predictions from our model yet. So let's make some predictions with our model. We're going to create y_probs, for prediction probabilities. You might be wondering what that means — we kind of covered it in a previous lecture, but we'll reiterate here. To make predictions we just go model_14.predict, and we want to predict on the test data. I'll put a little note here: "probs" is short for prediction probabilities — because of the activation function on our output layer, the outputs of our model's predictions are going to be prediction probabilities. View the first five predictions: let's have a look at y_probs... oh, what do we get here? Why are these like that? Has it already rounded them for us? That is interesting. Did we set our model to — let's come back up here — train labels, test labels, SparseCategoricalCrossentropy, yes... hmm, that means it's outputting what label it thinks it should be? I'm going to pause the video here, inspect this, and come back to see what's going on. Now, that took me a while, but I figured out what went wrong. I said y_probs is short for prediction probabilities, but as we see here, we get whole
numbers. Now, you might have spotted what went wrong: our model predicted on the wrong data. This is a very important point to highlight — we want our model to predict on the same kind of data it was trained on. What's the key difference here? What did we do before with test_data and test_data_norm? That's the hint as to what we got wrong. Here's our test data and our test data normalized — what did we change? Let's look at the first example: test_data[0] versus test_data_norm[0]. We normalized it, remember? Our test_data samples still have values between 0 and 255, whereas our test_data_norm values are between 0 and 1. That's why we get strange outputs when we ask the model to predict on non-normalized test data — but if we pass in test_data_norm, the variable we created before, we get the outputs we should get. That's a very key point, so I'm going to write a little note here, even as a reminder to myself — let's get a key emoji in. Note: remember to make predictions on the same kind of data your model was trained on; e.g. if your model was trained on normalized data, you'll want to make predictions on normalized data. Beautiful. So what are prediction probabilities? These are different numbers — let's get the first one, y_probs[0]. The highest number here indicates the index — we'll bring our class names back up — the index our model thinks is the most likely label. I believe it's this last one, maybe. We can use argmax — let's use TensorFlow, tf.argmax (arg maximum) — which gives us the index where the maximum probability occurs. All of these values are how likely sample zero is to be T-shirt/top, or Trouser, or Pullover, and so on. The first one here, T-shirt/top, is a fairly low number: e to the negative 11.
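The step being built toward here — collapsing each row of prediction probabilities into a single class index with argmax — can be sketched with made-up probabilities (the class_names list is the standard Fashion MNIST ordering; the probabilities here are random, not model_14's):

```python
import numpy as np
import tensorflow as tf

class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

# Made-up prediction probabilities for 4 samples: softmax rows sum to 1
y_probs = tf.nn.softmax(tf.random.uniform((4, 10)), axis=1).numpy()

# For one sample: the index of the highest probability is the predicted class
print(class_names[tf.argmax(y_probs[0]).numpy()])

# For the whole array at once: argmax along axis 1 gives integer labels,
# the same format as the test labels
y_preds = y_probs.argmax(axis=1)
print(y_preds.shape)  # (4,)
```

Because the output comes from softmax, each row sums to 1 and the argmax index is directly comparable with the integer test labels.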
This one is even lower, so it's definitely not Trouser — at least in the model's eyes. It seems the highest value is this one here, which should turn out to be about 0.8 — the last one — so for sample zero, our model is predicting Ankle boot. Let's have a look: index nine — if we take tf.argmax(y_probs[0]) and index into our class_names list with it, we should get Ankle boot. Wonderful. Okay, so now let's turn our prediction probabilities array into integers. We can do that by going — convert all of the prediction probabilities into integers — y_preds = y_probs.argmax(axis=1), and then we'll view the first 10 prediction labels: y_preds. Boom — now our predictions are in the same format as our test labels. So what can we do now? Well, we can compare the two. I'll leave that as a challenge for you, but in the next video we're going to use our make_confusion_matrix function to compare them. And how about you try another metric — maybe look into creating an accuracy score as well; see if you can reproduce the accuracy score we got from model_14.evaluate() by comparing these two. Wonderful. Now that we've got our model's predictions in the same form as the true labels, let's create a confusion matrix to evaluate our model's predictions. But first, we'll create a boring confusion matrix using scikit-learn, just to demonstrate how valuable our make_confusion_matrix function is: confusion_matrix(y_true=test_labels, y_pred=y_preds). Let's see this. It's kind of hard to tell what's going on here, but with a confusion matrix we know that down the diagonal we should have the highest numbers — so it looks like our model's performing pretty well across all the classes, since the highest numbers are down the diagonal. Now let's remind ourselves of what an ideal
confusion matrix looks like: the correct predictions — the true positives and true negatives — are down the diagonal. But right now this is pretty hard to interpret; if we sent it to someone, they'd probably go, "what the hell is going on here?" This is where our pretty confusion matrix comes in. Let's make a prettier confusion matrix: make_confusion_matrix with y_true=test_labels (same as before), y_pred=y_preds, and classes=class_names — our list of class names from the Fashion MNIST dataset. The figsize was hard-coded as (10, 10), but let's make it (15, 15), just to have a little practice, and text_size=10. Let's see what happens. Look at that — I think I'm going to have to zoom out, because that's a big dog. Well, maybe we change it back to (10, 10) and see what happens... the text might be too big now — yes, it is — so (15, 15) was a good size; let's go back to that. Alrighty, this one looks much better than before. This is where you can really see the power of a confusion matrix — see how visual this is? You could send this to someone and they'd start to intuitively figure it out. So the true label here — let's explore this for the T-shirt/top class. Again, the darker the square, the more predictions the model got right for this class. However, if we come over here, this square is pretty dark too — I mean, 16 of the predictions for T-shirt/top were Shirt. Ah, okay — well, does it make sense that our model is getting confused between Shirt and T-shirt/top? I mean, how would you tell those apart, you know? And where's another one — what else is our model getting confused by? Pullover. Okay, again, the darkest square is down the diagonal, that's good, but it's also getting confused a lot with the Coat class: it predicted Coat when it should have been Pullover 188 different times. Now, are they similar? We'll have to explore that in a second, but you can
kind of imagine pullovers looking similar to coats and what else is getting confused on ankle boot and sneaker okay well at least for ankle boots they're a shoe at least so a sneaker is a shoe so that kind of makes sense this is the kind of thing that you'll you'll be doing with your own problems i see the shirt class gets confused a lot with t-shirt top pullover that is to explore where your model is making errors and what you can do with this information is start to improve your models so maybe the t-shirt slash top class should actually be incorporated with the shirt class or maybe we need some more data of actual just shirts so that our model really learns to differentiate between t-shirts and tops but anyway investigate this confusion matrix a little bit more it's nice and pretty in the next video let's go back through what did we say we're going to do to evaluate our model let's get visual a little bit more assess some of its predictions through visualizations so the confusion matrix is one way to visually explore it but it's something else to actually look at a picture and look at the label that our model predicted versus the true label so let's write some code to do that in the next video we're going to start this video off with a little key so let's go here okay note so what's our motto or one of our mottos is visualize visualize visualize so often when working with images and other forms of visual data it's a good idea to visualize as much as possible to develop a further understanding of the data and the inputs and outputs of your models so i've also discussed the power of randomness when exploring your data so how about we create a fun little function for let's write this down so we don't forget how about we create a fun little function for hmm what should we do we should plot a random image we should make a prediction on said image and we should label the plot with the truth label and the predicted label yeah this is a great great way to evaluate our
model so let's do that first we'll need random from python because we want to plot a random image and now we'll create a function def plot random image i often create a lot of these little helper functions if i want to do something over and over again i make sure to functionalize it so that i can use it multiple times so we'll pass it our model where right now i think we're up to model 14 we'll pass it a list of images we want to inspect what the true labels are and the class names beautiful so we'll make a doc string here so it's a little bit complete picks uh what does it do picks a random image plots it and labels it with a prediction that will do and true label wonderful so we need to choose a random number so set up random integer we'll go i equals random dot rand int between zero and len images oh we need a comma there does that make sense so i is just going to be a random number between 0 and the length of images and images is going to be the images we wanted to look at wonderful and now let's create predictions and targets so the target image is going to be images i or just index on our random number and then the pred probs is going to be model dot predict so it's just going to take our model here dot predict on target image dot reshape 1 28 28 so we'll make sure it's in the right shape for our model because we're only predicting on one image here so our model right now is trained on images in 28 28 size but we're telling our model hey we're only passing you one image at a time this time then we're going to do a pred label equals classes pred probs and we want to get the arg max wonderful and then the true label is just classes true labels i beautiful what else do we need oh yeah we need to plot the image so plt dot imshow we want target image to be the plot and we'll set it to a binary color map so it's just black and white beautiful and now let's do a little something a little bit fancy here all right we're going to change the
color of the titles so change the color of the titles depending if i could type correctly depending on if the prediction is right or wrong so how might we do that let's just create a boolean so if pred label equals true label that makes sense so if the pred label is the same as the true label so the prediction our model has made is the same as the true label yes we want to set the color green because green means good and else the color equals red wonderful and then let's add x label information and we want prediction slash true label let's go plt dot x label and how can we do this and we just do it with the dot format pred what's the pred pred and then we want to change this to let's put the confidence in there why not two point f i always get confused when i'm typing out this little section of code there we go and what else do we want we want the true so the true is just going to be like that and then dot format on here we could have done this as an f string or could we because we have to have a calculation yeah we probably could have oh well that's right this will do for now we want 100 times for the confidence we want tf reduce max so in other words find the maximum value in pred probs and then we'll just finish off with true label and then i believe color goes down here color equals color we've definitely got something wrong in this function but so set the color to green or red based on if prediction is right or wrong so does this make sense plot random image passes a model picks a random integer picks a target image from images then uses the model to make a prediction on the target image we have to reshape it because it's only one image the pred label is going to be classes pred probs argmax so the index of pred probs which has the highest prediction probability the true label is going to be the true labels indexed on i yep that's correct plot the image target image cmap binary yada yada we could keep going through that but let's just run the
code see if an error comes up oh i know we're just defining the function there we haven't actually run it yet let's do this check out a random image as well as its prediction oh yeah plot random image model equals our current model model 14 images equals test data because that's the test images we want to work with test or true labels sorry equals test labels and then classes is class names how does it look oh nice okay there we go so the prediction is coat 100 so that's a very high prediction probability but the true is shirt does that look like a shirt to you kind of let's look at some more ankle boot 100 true sandal ah is that an ankle boot i mean that's a fairly pixelated image that's kind of hard come on surely our model got something right prediction coat true is actually shirt well that's a bit see that's that one's on the edge to me as long as our function's correct there we go there's one that's right t-shirt top 100 true wonderful oh what did we also forget here this should be norm ah come on daniel remember always make predictions on the same type of data that your model has trained on i'm going to write a note here always make predictions on the same kind of data your model was trained on and by the same kind i mean pre-processed in the same way all right ankle boot yes that's what we want green show me some more green ah damn it t-shirt top it's a shirt see this is what i mean i'm not sure who created this data set but to me t-shirt and top is kind of the same thing as shirt maybe it's not maybe it's not trouser yes that's what we want shirt predicted but the true is a coat kind of so as you see here you could keep going through this for a lot we could have really just set this up to plot more than one at a time so we don't just have to keep sitting here and re-running it and maybe that's a little bit of an extension for you is to see how you could functionalize this to plot say four images just like we did before when we're exploring our data so going through this
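Pulling together the function built up over the last few videos, here's a runnable sketch of plot_random_image. It isn't the course's code verbatim: np.max stands in for tf.reduce_max so the sketch runs without TensorFlow, and random.randint is given len(images) - 1 because Python's randint is inclusive on both ends:

```python
import random
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

def plot_random_image(model, images, true_labels, classes):
    """Picks a random image, plots it and labels it with a prediction and truth label."""
    # Set up random integer (randint is inclusive on both ends, hence the -1)
    i = random.randint(0, len(images) - 1)

    # Create predictions and targets
    target_image = images[i]
    # Reshape to (1, 28, 28): the model was trained on 28x28 images and
    # expects a batch, so we tell it we're passing one image at a time
    pred_probs = model.predict(target_image.reshape(1, 28, 28))
    pred_label = classes[pred_probs.argmax()]
    true_label = classes[true_labels[i]]

    # Plot the image with a binary (black and white) color map
    plt.imshow(target_image, cmap=plt.cm.binary)

    # Change the color of the title depending on if the prediction is right or wrong
    color = "green" if pred_label == true_label else "red"

    # Add prediction / true label info (with the confidence) as the x label
    plt.xlabel("Pred: {} {:2.0f}% (True: {})".format(pred_label,
                                                     100 * np.max(pred_probs),
                                                     true_label),
               color=color)
```

And per the note in this section, pass it data pre-processed the same way the model was trained on, i.e. the normalized test images, not the raw ones.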
i would say do another 20 or so go through them and then figure out does visualizing these predictions here help you to better understand this confusion matrix here and what i mean by that is does it start to make sense where the model gets confused in other words like does the overall shape of a trouser relate to pants or does the overall shape of an ankle boot is the model getting mixed up with a sneaker because an ankle boot is a similar shape to a sneaker does that make sense so we've done a bit of evaluating for our model how about the next video we've talked a lot throughout this entire series that our model is learning patterns so how about in the next video we discuss what patterns exactly is our model learning so have a little bit of a play around with our visualization function here see if you can start to get an idea of where the model gets most confused and compare that to the confusion matrix and then i'll see you back in the next video and we'll check out the patterns our model is learning to make these kind of predictions we've covered a fair bit well actually fair bit is probably an understatement we've covered a lot in neural network classification with tensorflow but this whole time we've been talking about how neural networks learn patterns in our data so that we can use those patterns later on but what exactly do those patterns look like well let's use this video to find out so first we're going to crack open one of our models so find the layers of our most recent model so if you're not sure by now take this as me stating it straightforward is that a deep learning model is constructed of layers and each one of those layers has a specific role in finding patterns in the numbers that we feed it so let's have a look at model 14 layers so we can see we start off with a flattened layer and then two hidden dense layers and then an output dense layer because we typically go from this direction so top to bottom so this is input this is output and now we 
can inspect what's going on in a target layer using indexing so how about we extract a particular layer i'm going to go with the first hidden dense layer by indexing on one wonderful so there we've got an individual dense layer there and now we can find the patterns learned by a particular layer using the get weights method so let's try that out so get the patterns of a layer in our network so we're going to set this up as weights and then biases we'll have a look at that in a moment model 14 layers one dot get weights you can just search this method up if you want but i prefer to figure out things ourselves and then we go shapes so we'll view weights and then we'll also view weights dot shape ah okay so this is what the internal patterns of a single layer or specifically the first hidden layer of our neural network look like so to you they might just look like a whole bunch of random numbers and i mean to me that's what they look like but to our neural network it considers these as the patterns which contribute to the decisions that it makes so we're working with fashion mnist data so the shape of this weights matrix corresponds to the shape of our input data so remember our input data was a 28 by 28 image that's where this number comes from so if we go 28 by 28 and then we remind ourselves of what our model looks like what our model input shapes look like so if we see here this is the 784 the flattened layer to begin with and then this four comes from the number of hidden units in our first dense layer so that means that for each data point in our input tensor our weights matrix has four because see here four has four numbers that it starts to learn and adjust to find patterns in these 784 numbers so just looking at this at face value can be very confusing and don't worry if you're not sure of it to begin with this is the first time we've ever cracked open one of our neural networks and this is kind of where the term deep learning gets the idea of black box from
as in when you crack open a deep learning model you get all these random numbers if you or i try to interpret them we can't really interpret what's going on here but somehow they correlate to our model finding patterns in the input data now what each one of these values does so each value in the weights matrix it corresponds to how a particular value in the data set should influence the network's decisions now you might be wondering how does our neural network even create such values how does it learn these values well and let's go back to the keynote to see a high level overview of how this might happen so we're working with the input data here which is grayscale images of the fashion mnist data set so we have coat ankle boot shirt sneaker wonderful and then we might encode them to well we should have actually normalized this data shouldn't we we might transform them into a tensor to pass it to our neural network and then what's going to happen in our neural network we've seen this kind of overall schematic before is that it's going to learn the representation in other words patterns features weights these can all mean similar things when you're hearing neural network talk for now we're referring to them as weights but i have been referring to them as patterns and so what's going to happen is at the beginning our neural network is going to initialize itself automatically with random weights at the very start so if we come back so all of these numbers the weights the internal weights of one of our neural network layers are going to start off as random numbers just completely random and it does this we can look at if we go to the tensorflow dense layer it does this using which parameter there we go kernel initialization now a little bit of extension on this video is that you can read into what this is glorot uniform but it might actually tell us here kernel initializer initializer for the kernel weights matrix so glorot uniform is a form of randomness
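To see the shapes and default initializers just discussed in a self-contained way, here's a small sketch that builds a stand-alone Dense layer mirroring the first hidden layer described here (4 hidden units on 784 flattened pixels), rather than cracking open model_14 itself. It assumes TensorFlow is installed:

```python
import numpy as np
import tensorflow as tf

# A stand-in for model_14's first hidden layer: Dense(4) on flattened 28x28 inputs
dense = tf.keras.layers.Dense(4)
dense.build(input_shape=(None, 784))  # 784 = 28 * 28 flattened pixels

weights, biases = dense.get_weights()
print(weights.shape)  # (784, 4): four learnable numbers per input pixel
print(biases.shape)   # (4,): one bias value per hidden unit

# Keras defaults: the kernel starts as Glorot-uniform randomness,
# while the bias vector starts as all zeros
print(np.all(biases == 0))
```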
so just take that from here we're not going to dive into what glorot is but just understand that that's a form of randomness now if we come back so it's going to initialize itself with random weights in each particular layer and then what's it going to do well we're going to show it different examples of the data we'd like it to learn so we might show it images of coats images of ankle boots shirts and sneakers and it's going to keep looking at these and slowly update its representation outputs or weights and biases into a different kind of tensor which if we come back to our notebook this is what it's going to learn over time these representations as we continually repeat with more examples so does that make sense our neural network we feed it input data it starts off with random numbers random weights random patterns however you want to refer to it and then as it looks at more and more examples it's going to go hey i'm going to try these random numbers and see if they correlate to any of the patterns in the data and if they don't well the neural network's going to try and correct itself thanks to our optimizer adam and then we're going to repeat it with more examples as we have more data then it's going to slowly adjust these patterns to better suit the data as best it can to eventually hopefully it starts to output all correct predictions that's the ideal case so let's come back now we've only talked about the weights matrix so far but alongside a weights matrix is also a bias vector so let's come down here let's have a look now let's check out the bias vector let's go bias and biases shapes we'll go biases and biases dot shape so again this is from a single layer more specifically the first hidden layer in our current neural network so we come down what does this correlate to number four so this is a bias vector there's only four here so it's this four here so that means for every hidden unit in our neural network in the first layer it
has one bias vector whereas for a weight matrix this is the difference the key difference between a bias vector and a weights matrix a weights matrix has one value per data point whereas a bias vector only has one value per hidden unit and so what the bias vector does is we'll write this every neuron has a bias vector each of these is paired with a weights matrix and so the bias vector also gets initialized but this time let's have a look up in the dense layer what is our bias initializer zeros so you can kind of intuitively guess what zeros means if we come down what does it say how does it get initialized initializer for the bias vector gets initialized as zeros at least in the case of a tensorflow dense layer now where it kind of gets tricky right is that sometimes depending on what layer you're using in deep learning so within the tensorflow keras layers module your weights matrix or the kernel initializer and the bias may be initialized differently however as you can see we never actually set these variables they got initialized by themselves so this is what i'm saying a lot of what tensorflow does for you the majority of calculations are done behind the scenes so of course you can dive deep into this as much as you want and i actually am a big advocate for that but to begin with i just want you to focus on writing as many neural networks as possible and just getting them working and then once you want to know more start diving into the nuts and bolts of what's going on behind the scenes so we've said what a bias vector is now what does it do so the bias vector dictates how much the patterns within the corresponding weights matrix should influence the next layer okay so for every hidden unit there's a weights matrix and a bias vector so if we change this to 10 how many weights matrices would there be and the same thing for this one how many bias vectors if we change both of these to 10 how many weights matrices would we have and how many bias vectors would we have
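If you want to check your answer to that question in code, here's a quick sketch using a fresh model with the hypothetical 10-unit layers from the question (not model_14's actual architecture). It assumes TensorFlow is installed:

```python
import tensorflow as tf

# Hypothetical version of the model with 10 hidden units in each dense layer
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Each Dense layer still has exactly one weights matrix and one bias vector --
# changing the number of hidden units only changes their shapes
dense_layers = [l for l in model.layers if isinstance(l, tf.keras.layers.Dense)]
for layer in dense_layers:
    w, b = layer.get_weights()
    print(layer.name, w.shape, b.shape)
```

So the answer: still one weights matrix and one bias vector per dense layer, with shapes (784, 10) and (10,) for the first, then (10, 10) and (10,) for the next two.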
just have a think about that for a second so this is a key point here as well how much the patterns within the corresponding weights matrix should influence the next layer another big thing about deep learning let's re-familiarize ourselves with our model architecture so model 14 summary what it looks like okay so we've built a few of these we've built a few deep learning models so now's right about the time to point out the whole concept of inputs and outputs not only relates to the input layer of a deep learning model and the output layer it relates to every single layer within a model so let's go to the keynote and go to the next slide so this is inputs and outputs layer by layer so if we imagine this is our model here very similar to what we've just built model 14 and if we we can create this in a second but if we imagine we have this input layer that takes in our input data in our case which is images 28 by 28 into a tensor we should have normalized this if we were really preparing our image data quickly so this is going to take the inputs and they output it to the flattened layer the flattened layer outputs to the dense layer or the first dense layer the first dense layer outputs to the second dense layer and the second dense layer outputs to the output layer so for each layer in a deep learning model the previous layer is its inputs so this is the crux of deep learning this is what makes deep learning deep is that you have multiple layers and of course as you added more layers the deeper the neural network would go but each subsequent layer does its part to find patterns in the original data and then feeds it on to the next layer so as you keep going through the patterns get more and more refined towards the ideal or hopefully they get close to the ideal outputs that you're after so let's have a look at how to replicate this and i believe that will probably be more than enough to wrap up this video so let's check out another way of viewing our deep learning 
models so we can do this with from tensorflow.keras utils import plot model and see the inputs and outputs of each layer plot model model 14 and show shapes equals true beautiful so this is what we have as i said this is inputs and outputs layer by layer so it starts off the input the none here is for batch size we'll tackle that in a future video students can look that up if they want let's just say we put in 28 by 28 images so that's the input then this layer outputs the inputs to the next layer and then this layer outputs the inputs to this following layer and then so on and so on and so on until we get our idealized output which is this 10 shape here because that's how many classes we have now we've covered all of this very quickly of what's actually going on i just wanted you to get familiar with cracking open one of our models now if you want to dig deeper on what's going on behind the scenes here and how these calculations actually happen layer to layer i've put some resources in the extension section but otherwise i think that just about wraps up everything in the introduction to neural network classification with tensorflow so go back through pat yourself on the back because we've covered an incredible amount take a break to let things sink in and when you're ready make sure you check out the exercises so i'll just put this here next check out exercises and extra curriculum check out the exercise and extra curriculum to practice and cement what you've learned i've put in some stuff there to not only practice everything that we've gone through that's in the exercises but as i said before the extra curriculum will help you really dive in to what's going on behind the scenes here so with that being said congratulations on finishing section two neural network classification with tensorflow i will see you in the next module holy bajibas if you've made it all the way to this endpoint and you followed along you've gone through over 14 hours of video on youtube 
and i trust you've learned a fair bit about tensorflow a fair bit about how to write neural network code to solve regression problems in other words predicting a number and writing neural network code to solve classification problems both binary classification and multi-class classification which are some of the most common problems in the field of machine learning so this is me signing out and you're probably wondering have i worn the same shirt through all of these little clips and yes i have it's the magic of cinematography but big props to you thank you so much for watching through this if you've loved it please let me know in the comments if you have any questions remember the github discussions page is your friend otherwise if you'd like to sign up to the full version of this course and learn a whole bunch more about tensorflow and deep learning in general there'll be a link below sign up to the zero to mastery academy and if you've seen the other videos you'll know what code to use when you sign up that's it from me congratulations happy machine learning happy deep learning and all the best learning tensorflow in the future
Info
Channel: Daniel Bourke
Views: 30,820
Rating: 4.9888887 out of 5
Keywords: learn tensorflow, learn tensorflow and deep learning, learn tensorflow beginner friendly tutorial, learn tensorflow tutorial
Id: ZUKz4125WNI
Length: 237min 54sec (14274 seconds)
Published: Tue Mar 16 2021