Welcome to this course on deep learning for computer vision with TensorFlow. In this course, you'll master deep learning concepts and their applications in computer vision tasks such as image classification, object detection, and image generation. You'll learn about tensors, variables, and neural networks, including convolutional neural networks, through practical projects like predicting car prices and diagnosing malaria. You'll also learn advanced techniques for model performance, data augmentation, and deployment, as well as modern convolutional neural networks, transfer learning, and transformers in vision. This course comes from NeuralLearn, which offers a variety of machine learning courses. Hi everyone, and welcome to this course on deep learning for computer vision by neurallearn.ai. In this course, we shall make use of tools like Hugging Face, TensorFlow, ONNX, and Weights & Biases to build and deploy different computer vision solutions. Applications of computer vision are everywhere today, from Tesla's self-driving cars, to livestock farmers who can now automatically count their animals, to mobile facial recognition apps, and even hospitals, where gastroenterologists can use computer vision for much faster diagnostics. Behind these different computer vision solutions are deep learning models like the convolutional neural network and the vision transformer. Throughout this course, we shall explain in detail how convolutional neural networks and vision transformers work. Given that these are all deep learning-based models, let's look at a high level at how deep learning works. Suppose we want to build a system that takes an input like this one and says that this is a damaged car, or takes an input like this other one and says that this car is still intact. In that case, we could build and train a deep learning model which takes in this image and outputs that the car is damaged, or takes in this other image and outputs that the car is intact. Nonetheless, for this deep learning model to be intelligent enough to make these kinds of decisions, we need to train it. The way this training is carried out is that we have thousands, hundreds of thousands, or even millions of damaged cars like this one, whose output label is obviously "damaged". On the other hand, we have thousands of cars which are still intact, whose output label is "intact". So now we have what we'll call our dataset, that is, these inputs and their corresponding labels. Inside this deep learning model, we stack several neural network layers: one layer, then another, then another, right up to the output. It should be noted that these neural network layers are essentially mathematical functions. If you do not have a background in mathematics, you shouldn't be worried, as in this course we shall focus on implementing everything practically. That said, all these layers are essentially mathematical functions, and if we have a linear layer, then we take an input x, multiply it by a weight which we'll call w, and add some bias which we'll call b. This gives us the output of that layer.
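To make the idea of stacked linear layers concrete, here is a minimal sketch in plain Python with NumPy. This is an illustration rather than the course's own code; the names linear_layer, w1, b1, and the random weight values are just placeholders.

```python
import numpy as np

# A single linear layer: y = x @ w + b
def linear_layer(x, w, b):
    return x @ w + b

# Illustrative input and weights for two stacked layers (values are arbitrary)
x  = np.array([[1.0, 2.0, 3.0]])          # one example with 3 features
w1 = np.random.randn(3, 4); b1 = np.zeros(4)
w2 = np.random.randn(4, 1); b2 = np.zeros(1)

y1 = linear_layer(x, w1, b1)              # output of the first layer
y2 = linear_layer(y1, w2, b2)             # fed as input to the next layer
print(y2.shape)                           # (1, 1) -- a single prediction
```

During training, it is exactly these w and b values that get adjusted so that the final output matches the labels in the dataset.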
So here, for example, suppose this input is x. We take x, multiply it by the weight, and add the bias to get the output of this level, y. Let's call the input x1 and its output y1; this y1 is then used as input to the next layer, giving y2, and so on right up to, say, yn. During the training process, given that we know the inputs and the outputs, our aim is to obtain the values of w and b which ensure that when we pass in images like these, the model is able to tell whether the car is damaged or intact. At every layer we have weights and biases which are all updated during training, making use of our training data, that is, these inputs and their corresponding labels, such that when shown an image the model has never seen, our deep learning model is able to take the right decision. In essence, these kinds of models are called deep learning models because we have several layers stacked between the inputs and the outputs, and the number of layers we stack represents the depth of our model. Deep learning algorithms fall under a category of artificial intelligence algorithms known as machine learning, where the model learns from the training data. A prerequisite to this course is knowledge of Python programming, so you could head over to the NeuralLearn YouTube channel and check out our free essential Python programming playlist, which will help you master the basics of Python. Let's now take a look at what we'll be learning in this course. We've designed this course so that we start from the very basics, supposing you have no prior experience in deep learning, so we'll start with basic topics in TensorFlow like tensors and variables. From here, we'll dive into solving our first practical problem, car price prediction. Although this isn't necessarily a computer vision problem, it will permit us to learn how to prepare our data with TensorFlow, build simple models like the linear regression model, train these models, evaluate them, and test them out, such that by the time we move into our next project, which is a computer vision project, malaria diagnosis, we already have what it takes to start building more complex models like the convolutional neural network. But before building them, we'll first understand how and why they work. By the end of that section, we should have learned how to build a simple solution for our malaria diagnosis problem, and so we'll be ready to dive into building more advanced models with TensorFlow. After looking at these more advanced models, we'll dive into evaluating classification models: we'll look at different metrics like precision, recall, and accuracy, and then learn how to come up with the confusion matrix and the ROC plots. Once we're done with this, we'll dive into improving model performance. Here we shall look at TensorFlow callbacks, learning rate schedulers, model checkpointing, and how to solve the problems of overfitting and underfitting. One main way to mitigate overfitting is data augmentation, and so we have a section reserved for data augmentation using TensorFlow and Albumentations.
Once we're done with data augmentation, we'll look at more advanced concepts in TensorFlow like custom losses and metrics, eager and graph modes, and custom training loops, so we'll learn how to train our model without necessarily using the fit method. We'll look at TensorBoard integration, where we'll learn how to carry out data logging, view model graphs, do hyperparameter tuning, profiling, and visualizations. From here, we'll get into machine learning operations with Weights & Biases, where we shall look at how to carry out experiment tracking, hyperparameter tuning, and dataset and model versioning with wandb. Then we shall move to our next project, human emotions detection. Again, we shall prepare our dataset, build our model, carry out data augmentation, and look at TensorFlow records. Once we're done with this section, we shall look at modern convolutional neural networks like AlexNet, VGGNet, ResNet, MobileNet, and EfficientNet. With all this in place, we'll learn about transfer learning, which will help us train our models much more efficiently using already pre-trained convolutional neural networks. Then we'll look at how convolutional neural networks take decisions by visualizing intermediate layers. Up to this point we'll have been looking at convolutional neural networks, and we'll then be set to dive into vision transformers. We'll understand how they work and even get to fine-tune our own vision transformer using the Hugging Face transformers library. Now that we have a working solution, the next logical step is to deploy it so that anyone around the world can make use of the model we've just built. So we'll convert our trained model to the ONNX format, quantize it, build a simple API, and then deploy this API to the cloud. Once we've learned how to deploy our computer vision models to the cloud, we can dive into other computer vision problems like object detection. In that section, we'll look at the basics of object detection and also build and train our own YOLO object detection model from scratch with TensorFlow. Finally, we'll dive into the domain of image generation, where we'll look at variational autoencoders and generative adversarial networks, which we shall use for digit generation and face generation. In this course, the coding platform we'll use to build and train our models is Google Colab, and we shall make use of Google's free GPUs to train our models. We've just explained that in order to train a model like this one, we make use of a training dataset which has inputs and corresponding outputs. These inputs and outputs are multi-dimensional arrays, commonly known as tensors. In this section, we shall start with tensor basics. Then we'll move on to casting in TensorFlow, and look at initialization, indexing, broadcasting, algebraic operations, matrix operations, and commonly used functions in machine learning. We'll also look at the different types of tensors, like ragged tensors, sparse tensors, and even string tensors. In the context of deep learning, tensors can be defined as multi-dimensional arrays, and an array itself is an ordered arrangement of numbers.
It's important to take note of these keywords, as the data we shall be dealing with, like this image right here, can be represented using numbers arranged in an ordered manner, and can be represented in multiple dimensions. In this specific case, we have an array represented in two dimensions, so this is a 2D or two-dimensional array, or what we generally call a matrix. Now we'll explore different types of arrays based on their dimensionality. Here, for example, we have what we'll call a zero-dimensional array, simply because this array, or this tensor, contains a single element. Say we have 21: that's a zero-dimensional array. Say we have 1: that's zero-dimensional too, and so on. Essentially, once we have a single element, it's a 0D array. Next we have this 1D tensor, which in fact is a combination of several 0D tensors: this is a 0D tensor, this is another, and this is another, so this vector is made of three such elements. We could have other examples, say 5, 8, and 3, or change the length and have one of length five, say 10, 2, 11, 4, and 7; that's also a 1D tensor. So far we've looked at the 0D and the 1D tensor, and now we can dive into the 2D tensor, which essentially is made of a combination of several 1D tensors. You can see that right here: this is a 1D tensor, this is another, this is another, and finally this is a 1D tensor; when you bring these 1D tensors together, you form this 2D tensor. For a three-dimensional tensor, you might have guessed it: you simply combine several two-dimensional tensors. So right here we have this 2D tensor, this other 2D tensor, this other one, and this other one, together forming a 3D tensor. We could visualize this differently: we take this, this, this, and this, and now you see we have one, two, and a third dimension. Now that we understand this, let's take a look at the concept of tensor shapes. Starting with the 0D tensor: given that it's zero-dimensional, it has no shape. The next one is one-dimensional and made of three elements, so we could say it is of shape (3). The next one is made of one, two, three, four 1D tensors, and each of them is made of three elements, so given that there are four of them, each made of three elements, its shape is (4, 3); that's how we obtain its shape. Now for the 3D tensor: it's made of one, two, three, four 2D tensors, and each of them is made of two 1D tensors, where each 1D tensor is made of three elements. So each of these blocks is 2 by 3, and given that we have four of these 2-by-3 2D tensors, the shape of this 3D tensor is (4, 2, 3). So similar to the (4, 3) shape before, this one is (4, 2, 3).
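As a quick illustration (a sketch of the idea, not taken from the course notebook; the numbers beyond those mentioned above are arbitrary), the nested Python lists below mirror the tensors just described, with the shape you read off at each nesting level shown in the comments:

```python
scalar = 21                                    # 0D: a single element, no shape
vector = [2, 0, -3]                            # 1D: shape (3,)
matrix = [[1, 2, 0],                           # 2D: four rows of three elements
          [3, 5, -1],
          [1, 5, 6],
          [2, 3, 8]]                           # shape (4, 3)
tensor3d = [[[1, 2, 0], [3, 5, -1]],           # 3D: four 2x3 blocks
            [[10, 2, 0], [1, 0, 2]],
            [[5, 8, 0], [2, 7, 0]],
            [[2, 1, 9], [4, -3, 32]]]          # shape (4, 2, 3)
```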
Notice that the number of entries in the shape tells us the number of dimensions of our tensor: here, because we have a 1D or one-dimensional tensor, there is just one entry; here, because it's 2D or two-dimensional, there are two; and here, because it's 3D, there are three. Also, in matrix notation we would say that this matrix is made of one, two, three, four rows, so the number of rows is four, and one, two, three columns, so the number of columns is three. In the case of this 3D tensor, we have two rows and three columns for each of the 2D tensors it contains. Now that we have some basic knowledge of tensors, let's go ahead and create them with TensorFlow. First things first, we import tensorflow as tf, and note that because we're using Colab, we do not need to install TensorFlow before making use of it; all we need to do is import it and we're good to go. The first thing we'll do is create a tensor which we'll call tensor_0d, and the way we create it is by calling the constant method: we have TensorFlow's constant method, and we specify, for example, that we want the value to be 4. That's how we create a zero-dimensional tensor. You can see from the documentation that this constant method takes in a value, a data type, a shape, and a name, but for now we pass in only the value. We can then print tensor_0d, and you see we get a tensor with value 4, no shape, and type int32. With this, we can go ahead and build out tensor_1d: as usual we use the constant method, but this time, because it's 1D, we pass in a list. We take the list we had, 2, 0, negative 3, and pass it in, so we have 2, 0, and negative 3.
Now let's print tensor_1d: we see the value, which is this list, the shape, and the data type. Let's modify it and add 8 and, say, 90; running this, you see the new tensor, and the shape is now five because we have five elements. It's still an integer, so let's add a decimal point here, which changes it to a float: the values are the same and the shape is the same, but the data type changes from int to float. Let's get back to integers and move on to the 2D tensor. We've looked at 0D and 1D; creating a 2D tensor is just as simple. We call it tensor_2d, use the constant method as usual, open up square brackets, and fill in the 2D tensor we've already seen: 1, 2, 0, a comma to move to the next row, then 3, 5, minus 1, then 1, 5, 6, and then 2, 3, 8. As you can see, we have these 1D tensors which we stack to form our 2D tensor; as usual the constant method takes in this value, and note that there is an extra pair of brackets around the stacked 1D tensors. Printing tensor_2d, we get our 2D tensor with shape 4 by 3, which is what's expected, and the data type is int32. From here we move on to the 3D tensor, which is essentially made of 2D tensors stacked together. We start by putting together those 2D tensors: 1, 2, 0; 3, 5, minus 1; and the remaining rows and blocks. We call it tensor_3d, use the constant method, open the square brackets, then copy and paste the 2D tensors in. Printing this out, we get an error, and it comes from the fact that we omitted commas: if you look closely, you'll see that between one 1D tensor and another, inside a 2D tensor, there is a comma, and so between one 2D tensor and the next there should also be a comma. Adding those commas and running again, we get our 3D output of shape 4 by 2 by 3, which is exactly what we expect, with data type int32. Before we move on, note that we can also do tensor_3d.shape to obtain the tensor's shape, and index it with 0, 1, or 2 to pick out an individual dimension. We can also call tensor_3d.ndim, which gives a value of 3, showing that our tensor is a 3D tensor.
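Pulling these steps together, here is a compact sketch of the tensor-creation code just walked through (the 0D, 1D, and 2D values follow the lesson; the remaining 3D values are illustrative):

```python
import tensorflow as tf

tensor_0d = tf.constant(4)                            # scalar: no shape, dtype int32
tensor_1d = tf.constant([2, 0, -3, 8, 90.])           # the decimal point makes it a float tensor
tensor_2d = tf.constant([[1, 2, 0],
                         [3, 5, -1],
                         [1, 5, 6],
                         [2, 3, 8]])                   # shape (4, 3), dtype int32
tensor_3d = tf.constant([[[1, 2, 0], [3, 5, -1]],
                         [[10, 2, 0], [1, 0, 2]],
                         [[5, 8, 0], [2, 7, 0]],
                         [[2, 1, 9], [4, -3, 32]]])    # shape (4, 2, 3)

print(tensor_3d.shape)      # (4, 2, 3)
print(tensor_3d.shape[0])   # 4
print(tensor_3d.ndim)       # 3 -- a 3D tensor
```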
At this point we've looked at the 0D, 1D, 2D, and even the 3D tensor; let's go ahead and check out a 4D tensor. To construct a 4D tensor, we need several 3D tensors: we take the 3D tensor we saw already, this other 3D tensor, and this other one, and when we stack these three 3D tensors, what we obtain is a four-dimensional tensor made of three three-dimensional tensors. Note that this could be two, four, or any other number of 3D tensors: if we consider only two of them, then we're talking about two three-dimensional tensors, but what's important to note is that once you stack several 3D tensors, you create a 4D tensor. At this point you could take it as an exercise to create your own 4D tensor, so you could pause the video, but we are going to go ahead and show you how. First things first, we take our 3D tensor, copy it, and paste it; we repeat that operation, copy and paste again, but now modify some values, say 13, 26, 23, and 30, so that this second 3D tensor is different from the first; then we copy and paste once more and modify again, say 23, 2, 4, and 6. So now we have our three 3D tensors. To create our 4D tensor, as usual we call the constant method: we have tensor_4d, the constant method, and as usual the square brackets, and then we cut and paste the three 3D tensors inside. Remember that, just as before, between two 3D tensors there needs to be a comma, so we add those commas. Now let's print tensor_4d. You can see we have our 4D tensor of shape 3 by 4 by 2 by 3, and this is because our 4D tensor is made of one, two, three 3D tensors, so that's why we have three; for each of these we have one, two, three, four 2D tensors, so we have four; and for every 2D tensor we have two rows and three columns. The data type is still the same. If we go back to the documentation, we'll see the different data types we could use, so instead of the default int we could have tf.float32, and you'll notice that the outputs now have decimals, since we're dealing with floats. The float we have here is of type float32, meaning the precision is 32 bits. If you reduce this, say to float16, you'll see that you actually get the same output, but the difference is that less memory is allocated for storing this tensor compared to, say, float64, and so in contexts where we have memory constraints, we would want to use lower-precision tensors. In the documentation you can find all the different data types supported in TensorFlow, including the quantized data types.
Speaking of quantization, we're going to treat it in subsequent sections, so don't worry for now if you don't fully master it. Here we have the brain floating point (bfloat16), here the boolean, here the complex type in its 128-bit and 64-bit versions, and here the double, which in fact means double-precision floating point; it's called double because float32 is single precision, float16 is half precision, and the double is double precision. We also have the ints in their different precisions, the quantized ints, resource, string, unsigned ints, and variant. Getting back to the code: if we have a float value here, with a decimal, and we then set the dtype to int64 and run it, we get an error telling us that we cannot convert this tensor to an eager tensor of data type int64. But we can use the cast method instead: let's keep our tensor_1d as float32, then define casted_tensor_1d, which is simply the casted version of this 1D tensor. We call tf.cast, pass in tensor_1d, and specify that we want the dtype to be, say, int16. We print out tensor_1d and also casted_tensor_1d; we initially get an error because we should have written the tf prefix, and after fixing that, we see that thanks to the casting operation we're able to go from this float tensor to this integer tensor. So this tells us that instead of forcing int16 or int32 directly at creation, it's preferable to make use of the cast method when changing types. That said, suppose that instead of casting to an integer we want to cast to a boolean: running this, you see that everything is True except for this one value, which is False, and that's simply because it is a zero. So what the cast method does is map all values different from zero to True, and values equal to zero to False. We could also go ahead and create our own boolean tensor: let's call it tensor_bool and pass in the list True, True, False. Printing tensor_bool, we get an output tensor of shape 3, with its values and the data type bool. Another data type we could look at is the string: say we want tensor_string to hold "hello world"; printing it out, we have our tensor, but right now this is a zero-dimensional tensor, so let's put it in a list with "hello world" and "hi", and now we have a 1D string tensor of shape 2.
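Here is a small consolidated sketch of the casting and non-numeric tensors just discussed (the variable names are mine and may not match the notebook exactly):

```python
import tensorflow as tf

tensor_1d = tf.constant([2., 0., -3., 8., 90.], dtype=tf.float32)

# tf.cast converts between data types after creation
casted_to_int  = tf.cast(tensor_1d, dtype=tf.int16)   # [2, 0, -3, 8, 90]
casted_to_bool = tf.cast(tensor_1d, dtype=tf.bool)    # non-zero -> True, zero -> False

tensor_bool   = tf.constant([True, True, False])       # boolean tensor of shape (3,)
tensor_string = tf.constant(["hello world", "hi"])     # 1D string tensor of shape (2,)
```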
From here we'll also look at how to convert a NumPy array into a tensor. First of all we import NumPy as np, then we create np_array with, say, the values 1, 2, 4, and print it out. Now we can make use of TensorFlow's convert_to_tensor method: we define converted_tensor as tf.convert_to_tensor, which essentially takes in a NumPy array, and then we print out the converted tensor. Getting back to the documentation, we'll look at a couple of other methods, like the eye method. As usual, the method is well described in the documentation, which comes with a short phrase explaining how it works, the different arguments, and even some examples. We're told that this eye method permits us to construct an identity matrix or a batch of matrices. With this definition, we can simply copy it, get back to the code, paste it, and call the result eye_tensor. For now the number of columns is None, the batch shape is None, the data type is float32, and the name is None; let's say we want three rows, so num_rows equals 3. Printing this out, we see we get an identity matrix where all the values are zero except for those on the leading diagonal, which are equal to one. Note that you could multiply by, say, three, and then all the elements of the leading diagonal would be equal to three while the rest stay zero. Taking that off, we're back to our eye_tensor, and we see that we could specify the number of columns: previously, when this was set to None, we had a square matrix; that is, when you define the number of rows to be three, the number of columns automatically takes the value three, so we get a 3-by-3 matrix, and the type is float32. We could modify that to float16, and we could also try bool, in which case everything is False except for the leading diagonal, which is True. Getting back, if we set num_rows to five and num_columns to three, the output acts as if we had a 5-by-5 identity matrix whose last two columns have been cut off, so we're left with a 5-by-3 slice of it. With that said, let's set num_columns back to None and set a batch shape, say 3: running that, because we set the batch shape to 3, we get an output which is 3 by 5 by 5. Let's change this to 2,
so you can see that this argument is responsible for deciding on the number of batches: here we have 2 by 5 by 5. Essentially, setting the batch shape to 2 like this means you want two 5-by-5 identity matrices, that is, matrices whose values are all zero except for those on the leading diagonal, which are equal to one. So we see how we can create a 3D tensor from the eye method. We could also go ahead and pass, say, [2, 4]: running this, we should get 2 by 4 by 5 by 5, and indeed that's what we have. That's it for the eye method; we move on to the next method, the fill method. As defined in the documentation, this creates a tensor filled with a scalar value. We copy this fill method, and before testing the code you can see from the example that with fill we get a tensor of shape 2 by 3 which has the value 9 in every position. Getting back to the code, we paste this, call it fill_tensor, set the dims to, say, 3 by 4, and the value to, say, 5. Printing this out, that's the output we get; and if we pass a 3D shape instead, we're able to create a 3D tensor where every element takes the value 5. From here we go on to the ones method, and it should be noted that this ones method is quite similar to the fill method, in the sense that it also creates a filled tensor, but the difference is that here all the elements are set to one. You'll notice that in its definition we do not have the value argument we passed to fill, because the value is one by default. So we paste this, call it ones_tensor, specify a shape, say 5 by 3, and print it out: we get an output matrix made of five rows and three columns where every element takes the value one. You could obviously do the same to obtain a 3D tensor, for example of shape 5 by 3 by 2. From here we'll look at ones_like, and what this one does is create a tensor of all ones that has the same shape as the input. What this means is that if you have an input like, say, 12, 1, 3 and 5, 7, 2, which is a 2-by-3 matrix, so its shape is 2 by 3, and you pass it into the ones_like method, what you get as output is a tensor with that same shape, 2 by 3, meaning two rows and three columns, but the values inserted are all ones. Hence the name ones_like: the "like" actually refers to the shape, so you imitate the input with respect to its shape.
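The generator methods covered so far could be summarized in a short sketch like the following (the argument values are just examples):

```python
import tensorflow as tf

eye_tensor = tf.eye(num_rows=3, num_columns=None, batch_shape=[2], dtype=tf.float32)
# -> two 3x3 identity matrices, i.e. shape (2, 3, 3)

fill_tensor = tf.fill([3, 4], 5)           # 3x4 tensor where every element is 5
ones_tensor = tf.ones([5, 3])              # 5x3 tensor of ones
ones_like_t = tf.ones_like(fill_tensor)    # ones with the same shape as fill_tensor, (3, 4)
```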
Now, getting back here, you see we had this fill_tensor, which is a 1-by-3-by-4 tensor. Let's define ones_like_tensor as tf.ones_like of fill_tensor. Before printing, let's try to predict the output: we know fill_tensor has shape 1 by 3 by 4, so we should get an output of shape 1 by 3 by 4 where all the values are ones. Running that confirms it: it's 1 by 3 by 4 and all ones. From here we move to the zeros method, which is quite similar to the ones method: just as with ones, all elements are set to a given value, except that with ones every element is set to one, while with zeros every element is set to zero. Let's just modify the code a little: this was ones_tensor, so let's change the shape to 3 by 2 and use zeros instead of ones, putting it in a separate cell. Running that, we get 3 by 2, that is three rows and two columns, and all values are zeros; if it were ones, all values would be ones. The zeros_like method is similar to ones_like, so you could take that as a simple exercise. We then move forward to the shape method, and note that this returns a tensor, a tensor containing the shape of the input tensor. As we had seen before, we could obtain the shape of a tensor simply by using that tensor's name followed by .shape; but if we want an output tensor which contains the shape of our tensor, then we use this shape method from TensorFlow. You see we get the same 4 by 2 by 3, but now as a tensor, with its own shape and data type, and since it's made of integers we shouldn't expect a float data type here. Another method we could make use of is the rank method, which simply returns the rank of a tensor: it takes an input and returns its rank. There's a simple example in the documentation where we have a 3D tensor t, and when you call the rank method you obtain the output 3. Pasting that and testing it quickly, you can see that because the output is a zero-dimensional tensor it has no shape, but you can see its value, which is 3. Now, if you strip the tensor down by one level and run this again, you get a rank of 2. We then move to the size method, which returns the size of a tensor: you pass an input to the size method and it gives you the number of elements. As usual there's an example we could copy, but before running it we can read the note, which tells us that this returns a 0D tensor representing the number of elements in the input, of type out_type, which by default is int32.
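Similarly, here is a brief sketch of the inspection methods just described (the example tensor is illustrative, not the exact one from the notebook):

```python
import tensorflow as tf

zeros_tensor = tf.zeros([3, 2])                  # 3x2 tensor of zeros

t = tf.constant([[[1, 2, 0], [3, 5, -1]],
                 [[1, 5, 6], [2, 3, 8]]])

print(tf.shape(t))                       # a tensor holding the shape: [2 2 3]
print(tf.rank(t))                        # 0D tensor with value 3 (number of dimensions)
print(tf.size(t))                        # 0D tensor with value 12 (number of elements)
print(tf.size(t, out_type=tf.float32))   # same count, returned as a float
```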
Getting back to the code, we paste this out and check the size: as you can see, we get a size of 12, and that's because we have 12 elements in our tensor. If we strip some of it off and run again, we get 6. We could also try to specify dtype equals float32 here, but we get an error: "got an unexpected keyword argument dtype". Looking at the documentation, the argument is actually out_type, not dtype, so using out_type instead and running that, we now have a float instead of an int. The next step is creating our own random tensors, that is, tensors which take up random values. In the case of tf.random.normal, the output values are drawn from a normal distribution, and we'll explain, at least at a high level, what this really means. For now, let's copy this, get back to the code, and run it: we define random_tensor, and we need to specify the shape, so let's make a 2D tensor of shape 3 by 2. You can see the default mean and standard deviation values given here, and we'll look at what they actually mean shortly; the data type is float32, the seed is None, and the name is None. Printing random_tensor, we get a set of values which happen to all be negative, and if you notice, they are very close to zero: roughly negative 1, about negative 2, negative 0.3, negative 0.5, negative 0.87. Running this again, we randomly generate another matrix with this shape, this time with a mixture of positive and negative numbers, but again these numbers are very close to zero. Now, what if we modify the mean? Setting the mean to 100 and running again, the values are instead close to one hundred, and so this tells us that we can use the mean and the standard deviation to decide on the kinds of random values we want. To better understand what is going on, let's consider this figure from the Probability Playground by Adam Cunningham at the University at Buffalo. As you noticed in the code, when we specified the mean, the mean is this mu here: when we set the mean to zero, most of the values were around zero, and when we set it to one hundred, most of the values were around one hundred. This curve explains why: it is bell-shaped, and the idea is that values around the mean, here around zero, have the highest probability score, which on the plot is f(x). So values surrounding zero have a much higher probability, or chance, of being picked compared to values far away from zero. Let's pick out two
values, say 0.5 and negative 5. You'll see from the plot that the probability of getting negative 5 is practically zero, whereas the probability of getting 0.5, reading off the middle of the curve, is about 0.35. Now, if we change the mean, as we did in the code with 100, well, it looks like this playground caps the mean at 6, so let's set the mean to 6. One thing you notice is that 0.5 no longer has a probability of about 0.35 of being picked; it now has a probability of about zero, and negative 5 still has a probability of about zero, but values like 5, 6, 7, and 8 now have much higher probabilities of being picked. This means that if the mean were 100, the likely values would be ones like 97, 98, 99, 100, 101, 102, and this explains why, when we ran the code with that mean, we got values around one hundred: the values surrounding the mean have higher probabilities of being picked. Now let's take the mean back to zero and talk about the standard deviation. Before any explanation, let's modify it: it looks like the maximum here is fixed at 2.5, so let's take 2.5. One thing you notice is that our mean is still at zero, which makes sense since we haven't changed it, but our bell-shaped curve now appears wider. So an increase in sigma, where sigma is what we call the standard deviation, the stddev argument in the code, makes the curve wider, and a decrease in sigma makes the curve thinner and narrower; sigma squared, by the way, is the variance. What this implies is that with sigma at 2.5, there is now a slightly higher chance that we could randomly select negative 5 compared to before, whereas when we reduce sigma, negative 5 goes back to practically zero probability. And if we reduce sigma far enough, we find that even 0.5, which used to have a probability of about 0.35 of occurring, now has almost no chance of being picked. That is essentially the role the standard deviation plays. Let's increase it back and set the mean back to zero. At this point we could also set the mean to 6 and the standard deviation to 2.5, and you see that we could now take values roughly between zero and 11.
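For reference, the bell-shaped curve being described is the density of the normal distribution; in the usual notation, with mu the mean and sigma the standard deviation (the mean and stddev arguments in the code), it is:

f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}

A larger sigma spreads the probability over a wider range of values, which is exactly the widening of the curve seen in the playground.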
With that said, we'll then look at the uniform distribution, or how to generate random values drawn from a uniform distribution. Again, we copy this and paste it into the code. We have the shape, then a minimum value of zero, and now a maximum value; so you see that unlike the normal, where we had a mean and a standard deviation, we now instead have a minimum value and a maximum value. Let's leave it at that and call it random_tensor, then print it out, and increase the shape to five so we have more values. Running that, we get these values; now let's change the maxval to, say, 8: we now get much larger values, values between zero and eight, whereas before the values ranged between zero and one. This tells us that the maximum value by default is most probably one, and indeed the default is one, so you can always make use of the documentation whenever you're in doubt. Setting maxval to 100, we get values between zero and one hundred. The question you may be asking yourself is: what then is the difference between this uniform distribution and the normal distribution? Let's get back to the Probability Playground and pick out the uniform distribution; there are many other distributions there, but we're going to focus on the normal and the uniform. In the uniform distribution we have a and b, which play the role of our minval and maxval in the code: when we say minval is negative one and maxval is one, our values will fall in that range, and by default these are zero and one in TensorFlow, which is why we saw values ranging between zero and one. As the name goes, it's actually uniform, meaning that all these values have equal chances, or probabilities, of being picked, so unlike the normal distribution, where we had a bell shape, we now have a rectangle instead. All we're doing here is picking the range of values we want to be output. Note that the minimum cannot be greater than the maximum, which makes sense; we could go from negative four to four, or from negative one to two. One other nice thing is that we could modify the dtype and use ints: running that, you see we get only integers. But if we use an integer dtype without a maxval, we get an error saying we must specify a maxval when the data type is an integer, and this is actually written out in the documentation. So let's change the maxval to, say, 1000, and we get values ranging between zero and one thousand; we have five values here from the shape, and we could change the shape to, say, 5 by 5, with all the values ranging between zero and one thousand.
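A short sketch contrasting the two random generators discussed above (the shapes and bounds are example values):

```python
import tensorflow as tf

# Values drawn from a normal (bell-shaped) distribution centred on `mean`
random_normal = tf.random.normal(shape=[3, 2], mean=100.0, stddev=1.0, dtype=tf.float32)

# Values drawn uniformly between minval and maxval (every value equally likely)
random_uniform = tf.random.uniform(shape=[5], minval=0, maxval=100, dtype=tf.float32)

# With integer dtypes, maxval must be given explicitly
random_ints = tf.random.uniform(shape=[5, 5], minval=0, maxval=1000, dtype=tf.int32)
```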
Another argument we haven't spoken of so far is the seed argument, and here we're told that this seed argument, when used in combination with tf.random.set_seed, lets us create a reproducible sequence of tensors across multiple calls. So in cases where we want reproducible experiments, we want to set this global seed value and also the seed argument in the uniform, or normal, function. Let's copy this and get back to the code; we take off some parts, print it out, and modify the shape and the max values. We have exactly the same value for the seed, which is 10, and we set the global seed to 5. Running this, it outputs 4 3 1, 4 3 2, 1 1 1, and 1 3 3. Now let's create another cell, copy this, and paste it there: we have the same global seed and the same seed of 10, and running it again we get exactly the same output, 4 3 1, 4 3 2, 1 1 1, and 1 3 3. Obviously, modifying the seed and taking, for example, 1 gives us something different, but if you take 10 again, you get the exact same output. You also see that when we take off the set_seed call and run the cell, we get different outputs each time, but when we put it back and run again, we get the exact same output we had already. If you want more details about seeds, you can always check out the documentation.
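A minimal sketch of the reproducibility pattern just described, assuming a global seed of 5 and a per-operation seed of 10 as in the lesson (shape and maxval are illustrative):

```python
import tensorflow as tf

# Setting the global seed together with the per-op seed makes results reproducible
tf.random.set_seed(5)
a = tf.random.uniform(shape=[3], maxval=5, dtype=tf.int32, seed=10)

tf.random.set_seed(5)
b = tf.random.uniform(shape=[3], maxval=5, dtype=tf.int32, seed=10)

print(a.numpy(), b.numpy())   # identical outputs on every run
```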
Now we're going to move on to tensor indexing. Let's take this example: we have this tensor to be indexed, declared with the constant method as a 1D tensor, and printing it out, this is what we get. Suppose we want to get the first four elements. Consider the indices: here we have the 0th index, then the first, the second, the third, and the fourth. Since we want the first four elements, we index with square brackets, starting at the minimum index, which is 0, and going up to 4, and that's what we get: the first four elements. You might notice that index 4 is actually the fifth value, but since we started from zero, index 3 holds the fourth value, so counting from zero we get one, two, three, four: the first four values. Now suppose instead we want to go from this element, 6, up to 66. The 6 is at index 1 and 66 is at index 5, but if you put 5 there, you'll notice it doesn't give exactly what you want: running it, we get 6, 2, 4, 6 rather than everything up to 66. To go from one index up to a maximum index inclusive, you need to add one: if you want to go from the first index to the fifth index, you put 5 plus 1. It's similar to tf.range, which we'd seen previously, where in order to produce a tensor covering the range 2 to 5, we needed to specify 2 to 6, so that we'd get 2, 3, 4, 5; what we did was put 5 plus 1, and it's the same kind of pattern here. So to go up to 66, we add the plus one; running this, we get exactly what we expected, and we could also just write 6 directly. With this, we see how we're able to slice out parts of this tensor. From here, we could also include steps: if we use a step of two, you'll notice that we start at the first element of the slice, skip one element, land on 4, skip another element, and land on 66. By default the step is one, which gives exactly what we get when we don't put the step at all. In general, then, we have a min index, then the max index plus one. In the case where we don't specify the min index, for example when we only write up to 4, the min index is considered to be zero, and you'll notice that the two forms give exactly the same answer. When you specify the minimum value, say 3, and you don't specify the maximum, we just go right up to the end: running this, we start from index 3 and go right to the end, giving 4, 6, 66, 7, exactly as expected. If we want to go from this minimum index up to the last-but-one value, that is from this 4 right up to 66, we can set the maximum to negative one: doing this, we get 4, 6, 66, going from the minimum index to the last-but-one value. Taking the maximum to negative two, we have 4, 6; to negative three, we just have 4; and negative four leaves us with nothing to return. From this point we'll see how indexing is done with tensors of dimension greater than one. We'll start with the two-dimensional tensor we declared previously. For the indexing we now use a comma: suppose we want to get the first three rows, and within those first three rows only the first two columns, that is 1, 2; 3, 5; 1, 5, which happens to be the first three rows and first two columns. To do this, note that to the left of the comma we index the rows, and to the right
we index the columns. So since we want the first three rows, we go from 0 up to 3, that is indices 0, 1, 2 (2 plus 1 being 3), which is similar to what we've seen already, except that this goes on the left side of the comma; and for the columns, since we want the first two, we put 0 to 2. Running that, we get 1, 2; 3, 5; 1, 5. Now suppose we want the first three rows and all the columns: just as when taking all the elements, we simply put a colon on the column side, and running that we get 1, 2, 0; 3, 5, negative 1; 1, 5, 6, that is the first three rows and all the columns. If you want a particular row, say the third row, you specify its index, which is 2 (counting 0, 1, 2), remembering that the left side represents the rows and the right side the columns, and you put a colon for the columns to get every column: running this gives 1, 5, 6. If we want just some of the columns, say only the zeroth column of that row, we write 2 and then 0, which gives 1; and if we instead go from column 1 right up to the end, we get 5, 6, because we're taking the second row and then the columns from index 1 to the end. Just as with the rows, we can also pick out a particular column: suppose we want the zeroth column, which is made of 1, 3, 1, 2. To get it, we take all the rows with a colon and then pick out just the zeroth column: running this gives 1, 3, 1, 2. If we remove the row specification entirely, we get an error, because we haven't specified how the rows are to be handled. If instead we take rows 1 to 3, we get 3, 1, because for the columns we're taking the zeroth column, and for the rows we go from index 1 and stop at 2, that is 3 minus 1, so taking those two rows and the zeroth column leaves just 3 and 1. Coming back, if we pick out the column at index 1, we get 2, 5, 5, 3. The colon here means we're picking up all the indices, and if we're picking up all the indices, another thing we can do is write three dots: the ellipsis simply means picking up everything, so we're picking up all
So with the ellipsis we're picking up all the rows, and then specifying the first column gives exactly the same output as before. We now move to the 3D tensor. It is very similar to what we've seen with 2D tensors, with the difference that we now have two commas. Why two commas? Recall that with three dimensions we have a first, a second and a third dimension. Each element along the first dimension is a 2 by 3 shaped tensor, and there are four of them, so the shape of this 3D tensor is (4, 2, 3). If we want to pick out just the first element, we specify 0 and then take everything else: that selects the first index and keeps all the rows and all the columns, giving 1 2 0, 3 5 -1. If, inside that first index, we only want the first row, we write 0, 0 and then take everything, which gives 1 2 0. To get the last column of that block we take all the rows and then count the column from the end with -1, giving 0 -1; another way is to note the columns are indexed 0, 1, 2, so passing 2 gives the same answer. If we slice the first dimension from 0 to 2, we pick indices 0 and 1, that is the first two blocks, and keeping only their last column gives 0 -1 and 0 2: the 0 -1 is the last column of the first block, and the 0 2 the last column of the second. Another way to write "all of the indices" is the three-dot ellipsis: picking all four elements, all of their rows, and only the second column leaves the second column of each block, 0 -1 for the first, 0 2 for the next, and so on for the remaining two.
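Here is a minimal sketch of these 3D indexing patterns, again with illustrative values rather than the exact tensor from the notebook:

```python
import tensorflow as tf

# Illustrative 3D tensor of shape (4, 2, 3): four blocks, each 2x3.
tensor_3d = tf.constant([[[1, 2, 0], [3, 5, -1]],
                         [[10, 2, 0], [1, 0, 2]],
                         [[5, 8, 0], [2, 7, 0]],
                         [[2, 1, 9], [4, -3, 32]]])

print(tensor_3d.shape)        # (4, 2, 3)
print(tensor_3d[0, :, :])     # first block: all rows, all columns
print(tensor_3d[0, 0, :])     # first row of the first block
print(tensor_3d[0, :, -1])    # last column of the first block
print(tensor_3d[0:2, :, -1])  # last column of the first two blocks
print(tensor_3d[..., 2])      # column index 2 of every block (ellipsis = everything else)
```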
We now move on to tensorflow.math, where we can use all of the math functions made available with TensorFlow, going right up to the zeta function. Starting from the abs function: clicking on it opens tf.math.abs, and note that for each and every function you get the function definition, an explanation of the arguments that go into it (like x and the name), what it returns, and an example of the function being applied, so you can understand exactly how it is used. In case you want to understand how atan2 works, just click on it; the same goes for asin. You don't need to figure these functions out by yourself; just make good use of the documentation. That said, let's look at a couple of functions. The abs function computes the absolute value of a tensor: to get it, we simply pass the tensor in. With the x we defined, tf.abs(x) gives its absolute value directly. What is an absolute value? If the input is negative it is turned into a positive, and if it is already positive it stays the same. For example, tf.abs(tf.constant(0.2)) is still 0.2, and changing the input to -0.2 still gives 0.2. Put differently, the absolute value of x is -x when x is less than 0 and x when x is greater than or equal to 0; a negative of a negative is positive, which is why -0.2 becomes 0.2. The inputs of abs can also be complex numbers: for a complex number a + bj, the absolute value is computed as the square root of a squared plus b squared. So for -2.25 + 4.75j, which has exactly the form a + bj, the absolute value is the square root of (-2.25) squared plus 4.75 squared. Keeping just that one complex element and running tf.abs, we get roughly 5.25. To check this by hand we use the square root method, tf.sqrt; notice how its documentation pops up, showing that it takes an input (and a name) and simply computes the square root of every element of the tensor.
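A small sketch of tf.abs on real and complex inputs, reusing the -2.25 + 4.75j example; the manual check mirrors the sqrt(a² + b²) formula:

```python
import tensorflow as tf

print(tf.abs(tf.constant(-0.2)))                # 0.2
print(tf.abs(tf.constant([-1.0, 2.0, -3.5])))   # element-wise absolute value

# For a complex input a + bj, tf.abs returns sqrt(a^2 + b^2).
z = tf.constant([-2.25 + 4.75j])
print(tf.abs(z))                                # ~5.256
print(tf.sqrt((-2.25) ** 2 + 4.75 ** 2))        # same value computed manually
```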
So we compute the square root of (-2.25) squared plus 4.75 squared. At first we don't get the same answer, because we didn't put the expression in brackets; adding the brackets and running again, we get exactly the same value. We now take on addition. The add function can be seen as element-wise addition: supposing we define x1 and x2 with tf.constant, say with values like [5, 3, ...] and [7, 6, 2, 6, 7, 11], with dtype tf.int32, we can simply do tf.add(x1, x2), and the output is the element-wise sum. You can check out the other methods in the same way: tf.multiply gives 5 times 7 equals 35 and so on, tf.subtract gives the element-wise difference, and tf.divide gives 5 divided by 7, 3 divided by 6, and so forth. There are also other interesting methods like divide_no_nan: in the case where a division would produce a NaN (not a number) or an infinity, for instance dividing by zero, this method takes care of that exception for you and returns 0 instead. If we put a zero in the denominator with the ordinary divide we get an infinity, while with divide_no_nan we get 0. Note that running it on int32 tensors gives an error saying tensors of that type are not allowed, so we change the dtype to float and then it works fine. So whenever you'd rather have a 0 than a NaN or an infinity, you can use this method. Also note that it's not every time we do these element-wise operations that the two tensors have the same shape. If we replace x2 by just the scalar 7 and go back to add, the addition still works, with the difference that the 7 is added to each and every value: 7 plus 5 is 12, 7 plus 3 is 10, and so on and so forth. This is what we call broadcasting: the smaller tensor is stretched out to match the shape of the bigger tensor so that the operation can be carried out. What is actually going on is that x2 is being stretched; if we build an x2_stretched by hand, filled with 7s, and run the addition with it, we get exactly the same output. In other words, TensorFlow does the broadcasting for us: the small tensor is stretched out like this and then the operation is carried out. We can test this with multiplication too; let's modify x1 and x2 with some new values.
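A hedged sketch of these element-wise operations and of scalar broadcasting; the values of x1 and x2 are illustrative:

```python
import tensorflow as tf

x1 = tf.constant([5, 3, 6, 6, 7, 11], dtype=tf.int32)
x2 = tf.constant([7, 6, 2, 6, 7, 11], dtype=tf.int32)

print(tf.add(x1, x2))        # element-wise addition
print(tf.multiply(x1, x2))   # element-wise multiplication, e.g. 5*7 = 35
print(tf.subtract(x1, x2))
print(tf.divide(x1, x2))

# divide_no_nan returns 0 wherever the denominator is 0 (float dtypes only).
a = tf.constant([1.0, 2.0, 3.0])
b = tf.constant([2.0, 0.0, 3.0])
print(tf.math.divide_no_nan(a, b))   # [0.5, 0.0, 1.0]

# Broadcasting: the scalar 7 is stretched to the shape of x1 before adding.
print(tf.add(x1, tf.constant(7)))    # 7 added to every element
```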
Running it first with the addition gives the same kind of answer, and switching to tf.multiply we get the element-wise multiplication. Another, slightly more involved, form of broadcasting happens when one of the dimensions of a tensor has length one. Suppose x1 has shape (1, 6), one row and six columns, and we define x2 differently as a (3, 1) tensor, three rows and one column. Printing x1.shape and x2.shape confirms (1, 6) and (3, 1), and the element-wise multiplication produces a matrix. Here is how we get it: x1 has a single row, and x2 has three rows, so that single row of x1 is stretched out into three rows to match x2; likewise x2, which has a single column, is stretched out so that its number of columns matches the six columns of x1, in other words its column is rewritten six times. With both tensors stretched, we can carry out the multiplication element by element: 6 times 5 gives 30, 4 times 5 gives 20, 4 times 3 gives 12, and so on, which is exactly what we see in the output. As a rule of thumb, the dimensions of length 1 are the ones that get stretched so the two tensors match one another. Our next method is tf.maximum, which gives the element-wise maximum of two tensors, and similarly tf.minimum, which returns the element-wise minimum: given two tensors x and y, they are compared position by position, and as the documentation notes, these support the same broadcast semantics we've just seen. We also have argmax and argmin, which we'll look at now; let's copy one of our tensors into a variable for that.
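A sketch of this (1, 6) by (3, 1) broadcasting and of maximum/minimum, with made-up values:

```python
import tensorflow as tf

# Illustrative shapes: x1 is (1, 6), x2 is (3, 1).
x1 = tf.constant([[6, 4, 6, 4, 6, 4]])
x2 = tf.constant([[5], [3], [6]])

out = tf.multiply(x1, x2)
print(x1.shape, x2.shape, out.shape)   # (1, 6) (3, 1) -> broadcast to (3, 6)
print(out)                             # e.g. 6*5 = 30, 4*5 = 20, 4*3 = 12

# maximum / minimum are element-wise and support the same broadcasting.
x = tf.constant([-5, 0, 3])
y = tf.constant([-2, 0, 0])
print(tf.maximum(x, y))   # [-2, 0, 3]
print(tf.minimum(x, y))   # [-5, 0, 0]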
So we have our tensor, and we print its shape. But before looking at the argmax of that matrix, let's look at a simpler, one-dimensional tensor first. Calling tf.math.argmax on it gives a value of 2. Why 2? Now let's modify some of the values, say to 200, 120, 130 and so on, and run again: the value changes to 0. Why does it change to 0? The positions in the tensor are indexed 0, 1, 2, 3, 4. Previously the maximum value sat at the zeroth... rather, at index 2; after the change the maximum value is 200, and it falls at index 0, so the output is 0. Take another example: wherever the maximum value falls, that index is what comes out. So argmax is different from max: it returns the index, or position, of the maximum value rather than the value itself. If we want the minimum instead, we just change this to argmin: the minimum value here sits at the third position, so we get 3, and after changing the values again the minimum falls at the last position, index 4, counting 0 1 2 3 4. Now, with multi-dimensional tensors like this one, we can also specify an axis. Let's print tf.math.argmax(x, axis=0), look at the output, and explain it: we get 2 2 0 2 2. Why? Fixing the axis to 0 means we fix the rows axis, so the comparison is carried out across the rows, that is down each column: in the first column we compare the 2 with the 3 and the 14, and we do the same for every other column. So when you fix the axis to 0, the comparison is done along the other axis, which in this case is the columns. In the first column the maximum value is 14; writing that column out as 2, 3, 14, it sits at index 2, which is why the first output is 2. The next column's maximum also sits at index 2. For the column after that, the maximum is 30: extracting it as 30, 16, 23, the maximum falls at index 0, which is why we get a 0 there. The same reasoning applies to the remaining columns,
where the maxima, 5 and 27, again fall at index 2, which is why the full output is 2 2 0 2 2. Now let's change this to argmin: we get a different answer, since for each column we now report the position of the minimum value instead. Next, let's modify the axis to 1, so we now work along each row, doing the comparisons within the first row, then the second, then the third. If you understood what we did when the axis was 0, you should be able to pause at this point and try to work it out yourself for axis equals 1. Here is how it goes, sticking with argmin: in the first row the minimum is 2, which sits at index 0; in the next row the minimum is 1, at index 3; and in the last row it is 5, also at index 3. So we expect 0 3 3, and running it gives 0 3 3 as expected. There are other functions here too. The equal function can be used to compare two tensors: it returns a boolean tensor (dtype bool), with True wherever the corresponding elements are the same. Comparing [2, 4] against 2 we see broadcasting again: the 2 is broadcast to [2, 2], the first comparison, 2 versus 2, is True, and the second, 4 versus 2, is False. We also have the power method, tf.pow, which raises the elements to the given powers. Copying the documentation example and adjusting the values, we compute 2 to the power 3, 2 to the power 0, 3 to the power 1 and 3 to the power 4; running it, we clearly get 8, 1, 3 and 81. We could also simply do tf.pow(tf.constant(2), tf.constant(3)), which gives 8.
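A short sketch of argmax/argmin, equal and pow; the tensors are illustrative, not the exact ones in the notebook:

```python
import tensorflow as tf

x = tf.constant([200, 120, 130, 3, 2])
print(tf.math.argmax(x))   # 0 -> index of the largest value
print(tf.math.argmin(x))   # 4 -> index of the smallest value

# With a 2D tensor, axis=0 compares down each column, axis=1 across each row.
m = tf.constant([[2, 3, 14],
                 [6, 8, 1],
                 [30, 5, 27]])
print(tf.math.argmax(m, axis=0))   # per-column index of the maximum
print(tf.math.argmin(m, axis=1))   # per-row index of the minimum

# equal compares element-wise (with broadcasting) and returns booleans.
print(tf.math.equal(tf.constant([2, 4]), tf.constant(2)))   # [True, False]

# pow raises element-wise: here 2**3, 2**0, 3**1, 3**4 -> 8, 1, 3, 81.
print(tf.pow(tf.constant([[2, 2], [3, 3]]), tf.constant([[3, 0], [1, 4]])))
print(tf.pow(tf.constant(2), tf.constant(3)))               # 8
```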
Next up are the reduce functions: reduce_all, reduce_any, reduce_euclidean_norm, reduce_max, reduce_mean, reduce_prod, reduce_std, reduce_sum and reduce_variance. Let's start with reduce_sum, which is probably the most popular of them. It takes an input tensor, an axis, keepdims (set to False by default) and a name, and it computes the sum of elements across dimensions of a tensor: we specify an axis and it sums all the elements along that axis. Let's try it on the tensor_2d we declared previously. We print tf.reduce_sum(tensor_2d) — after fixing a small error where we hadn't closed a bracket — and get the value 35. How do we get 35? When you don't specify the axis, all the elements are simply added up: the tensor is reduced to a single value by summing every element it contains, 1 plus 2 plus 0 and so on, right up to 35. That's reduce_sum. We could also do reduce_max, which reduces everything to just the maximum value, 8 here; with the sum we add everything up, with the max we just keep the largest value, and if we change one element to 100 we now get 100. reduce_mean reduces everything to the average value instead. Now let's go back to reduce_sum and specify axis=0; printing tensor_2d.shape shows the tensor is 4 by 3. With axis=0 we fix the rows axis and move along the other axis, the columns (the rows are axis 0 and the columns are axis 1), so each column is reduced to a single value: the first column, 1, 3, 1, 2, sums to 7, and the other columns are summed in the same way, giving us one sum per column. If we set the axis to 1 instead, we fix the columns and move along the rows, so each row is summed up, and running it gives the row sums. We can use an axis with reduce_max and reduce_mean in the same way. The mean is the average value: with axis=0, the first column sums to 7, and 7 divided by 4 is 1.75, but since we're working with integers we just get 1; if we change the dtype to tf.float16 and run again, we get 1.75. So we are computing the mean of each column, because we've fixed the rows axis and move along each column. Changing this to reduce_std gives the per-column standard deviations instead.
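A runnable sketch of these reduce functions on a tensor like the one above (the values are assumed, so the printed numbers may differ from the notebook):

```python
import tensorflow as tf

tensor_2d = tf.constant([[1, 2, 0],
                         [3, 5, -1],
                         [1, 5, 6],
                         [2, 3, 8]], dtype=tf.float32)

print(tf.reduce_sum(tensor_2d))    # sum of every element -> 35
print(tf.reduce_max(tensor_2d))    # single maximum value -> 8
print(tf.reduce_mean(tensor_2d))   # overall mean

# axis=0 collapses the rows, giving one value per column;
# axis=1 collapses the columns, giving one value per row.
print(tf.reduce_sum(tensor_2d, axis=0))
print(tf.reduce_sum(tensor_2d, axis=1))
print(tf.reduce_mean(tensor_2d, axis=0))          # e.g. 7 / 4 = 1.75 for the first column
print(tf.math.reduce_std(tensor_2d, axis=0))      # std requires a float dtype
```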
With the standard deviations printed, we now look at keepdims. Take note of the shape of the result so far: it is a one-dimensional tensor. When we set keepdims to True and run again, we get a two-dimensional tensor with the same elements; the reduced axis is kept as an extra dimension of size 1, and you can see this explained in the documentation as well. So we've looked at the reduce functions: reduce_max, mean, std, prod, variance and the rest. Continuing our journey through the math functions, we have sigmoid, one of the very popular methods. The sigmoid is simply this formula: it takes x and returns 1 divided by 1 plus the exponential of negative x, and when we call it, each and every element is passed through that function. Our next method is the top_k math function. With top_k we take a tensor as input and pick out its top values: if k equals 2 we take the top two, a bit like taking the top 10 students in a class of a hundred. Let's see how that works in the notebook: we call tf.math.top_k on tensor_2d, and the output comes out as a pair. The first part of the pair holds the top values themselves, and the second part holds the indices, that is the positions of those top values. By default k is 1. What happens is that we go through each and every row: in the first row the top value is 1 at position 0, then 100 at position 2, then 6 at position 2, then 8 at position 2. If we ask for the top two instead, we get the top two values of each row together with their positions, and by default the result is sorted, so you don't need to bother about sorting; you can pass sorted=False in case you don't need it sorted. Let's now go ahead and look at linear algebra operations. These live in tf.linalg, which gathers all the different linear algebra operations that come with TensorFlow. Let's look at matrix multiplication, given here as matmul: tf.linalg.matmul multiplies a matrix a by a matrix b, producing a times b. Note that this is different from the tf.math.multiply we saw previously: multiply is an element-wise multiplication, whereas here we are doing a genuine matrix multiplication. So let's paste that in and define two matrices, x1 and x2, with tf.constant.
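A sketch covering keepdims, sigmoid and top_k, again with assumed values:

```python
import tensorflow as tf

tensor_2d = tf.constant([[1., 2., 0.],
                         [3., 5., 100.],
                         [1., 5., 6.],
                         [2., 3., 8.]])

# keepdims=True keeps the reduced axis as a dimension of size 1.
print(tf.reduce_sum(tensor_2d, axis=0).shape)                  # (3,)
print(tf.reduce_sum(tensor_2d, axis=0, keepdims=True).shape)   # (1, 3)

# sigmoid applies 1 / (1 + exp(-x)) to every element.
print(tf.math.sigmoid(tf.constant([-1.0, 0.0, 1.0])))

# top_k returns the k largest values per row plus their indices (k=1 by default).
values, indices = tf.math.top_k(tensor_2d, k=2)
print(values)    # top two values of each row, sorted by default
print(indices)   # their column positions
```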
With x1 and x2 defined, we want their matrix multiplication, so we pass in x1 and x2 and keep all the other arguments at their default values. Running this we get an error: a shape mismatch, which is normal. If we check the shapes of x1 and x2, both are 2 by 3, and matrix multiplication isn't possible here, because for it to be valid the number of columns of the first matrix must equal the number of rows of the second, and this 3 is not equal to that 2. So let's modify x2 into a 3 by 3 matrix and run again: it works now, because the number of columns of x1 equals the number of rows of x2. We can even increase the number of columns of x2 (being careful that we are adding columns and not rows), and it's still valid, because the inner dimensions still match; if we break that match, we're back to the mismatch error, so let's take one step back. Also note that the shape of the output depends on the inputs: the output here has shape 2 by 4, where the 2 comes from the number of rows of the first matrix and the 4 from the number of columns of the second. If instead we had a 1 by 3 times a 3 by 4, the output would be 1 by 4, and running it confirms that. If you're not familiar with these linear algebra terminologies, you can check out our linear algebra course available on our platform, neurallearn.ai; the section on matrix algebra covers matrix multiplication step by step with some class exercises, and you can look at the preview. Another way to compute the matrix multiplication of x1 and x2 is with the @ operator: x1 @ x2 is matrix multiplication, while * is element-wise multiplication, and running it gives exactly the same answer we got before. Another very common matrix operation is the matrix transpose, which we look at next.
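A minimal matmul sketch showing the shape rule and the @ operator (illustrative matrices):

```python
import tensorflow as tf

x1 = tf.constant([[1, 2, 0]])             # shape (1, 3)
x2 = tf.constant([[3, 5, -1, 2],
                  [1, 5, 6, 0],
                  [2, 3, 8, 4]])           # shape (3, 4)

# Valid because x1 has 3 columns and x2 has 3 rows; the result is (1, 4).
print(tf.linalg.matmul(x1, x2))

# The @ operator is matrix multiplication; * would be element-wise.
print(x1 @ x2)
```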
To compute the matrix transpose we just use tf.transpose and pass in our matrix. Passing in x1 and running it, you'll notice the rows become the columns and the columns become the rows. Printing x1 alongside it, x1 has shape 1 by 3 and its transpose has shape 3 by 1; it still contains exactly the same numbers, but the initial x1 has just one row while the transpose has three rows: this 1 becomes this, the 2 becomes this, the 0 becomes this. Let's do the same with x2: print x2 first, then its transpose, and notice how the first row, 1 2 0 2, becomes the first column, the second row becomes the second column, and the third row, 4 5 6 0, becomes the third column. That is the transpose operation. How is it related to matrix multiplication? You may have noticed that matmul has transpose_a=False and transpose_b=False arguments. In the case where you want to multiply x1 by the transpose of x2, you just set transpose_b=True, since x2 is the b argument. Doing that here gives an error, which is normal: x2 transposed has shape 4 by 3 (check it with tf.transpose(x2).shape), so we would be multiplying a 1 by 3 by a 4 by 3, and 3 is not equal to 4, so the matrix multiplication isn't valid. Let's instead create an x3 with four columns, print x3.shape, and compute x3 matrix-multiplied by the transpose of x2, first explicitly with tf.transpose(x2). That works, and the output is 2 by 3: the 4 columns of x3 match the 4 rows of x2 transposed, and the output takes the outer dimensions, the 2 rows here and the 3 columns there. Now, how do we get this exact same result without calling tf.transpose ourselves? We simply pass x3 and x2 to matmul and specify that x2 is to be transposed, that is transpose_b=True. Running that gives exactly the same response: it is the same as saying x3 matrix-multiplied by x2 transposed.
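A sketch of tf.transpose and the transpose_b shortcut, with assumed shapes (2, 4) and (3, 4):

```python
import tensorflow as tf

x2 = tf.constant([[1, 2, 0, 2],
                  [3, 5, -1, 1],
                  [4, 5, 6, 0]])           # shape (3, 4)
x3 = tf.constant([[1, 2, 0, 2],
                  [3, 5, -1, 1]])           # shape (2, 4)

print(tf.transpose(x2))                     # shape (4, 3): rows become columns

# x3 (2,4) @ transpose(x2) (4,3) -> (2,3); transpose_b does the transpose for us.
print(tf.linalg.matmul(x3, tf.transpose(x2)))
print(tf.linalg.matmul(x3, x2, transpose_b=True))   # same result
```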
What if we also set transpose_a to True? Let's check what that would mean. x3 has shape (2, 4), so the transpose of x3 is (4, 2), and x2 is (3, 4), so its transpose is (4, 3); multiplying a (4, 2) by a (4, 3) is not going to work, since 2 is not equal to 4. For the shapes to match, x2 transposed needs to have 2 rows... or rather, the inner dimensions need to agree, which means x2 itself should be a 3 by 2 matrix: then x2 transposed is 2 by 3, and that 2 lines up with the 2 of x3 transposed. So let's change x2 into a 3 by 2 matrix, three rows and two columns, comment out the earlier cell, and run. We get an error at first, simply because our manual check now needs a tf.transpose around x3 as well, since we are computing x3 transposed times x2 transposed; once we add that, everything runs fine and we get exactly the same response as matmul with transpose_a=True and transpose_b=True, because the shapes now match. The next arguments, adjoint_a and adjoint_b, work exactly the same way as the transpose arguments. The adjoint of a matrix is another operation whose details we won't get into, but just note that setting adjoint_a=True, for example, is the same as taking the adjoint of the first matrix before multiplying, and setting adjoint_b=True does the same for the second matrix, similar to how we treated the transpose arguments. Let's now look at how to multiply tensors with more than two dimensions. We take the tensor_3d we defined previously and define two three-dimensional tensors, x1 and x2, from it. Running tf.linalg.matmul on x1 and x2 produces an error, and again this is normal, because the shapes don't match. The way to think about it is quite straightforward: when matrix multiplication is done in batches, we take each matrix in the first tensor and its corresponding matrix in the second tensor and do an ordinary matrix multiplication between them. So we'd multiply this 2D matrix by this one, then the next by the next, and so on, and for each such pair the shapes must be compatible. Here each matrix in x1 is 2 by 3 and each matrix in x2 is also 2 by 3, and we've already seen that a 2 by 3 times a 2 by 3 won't work, because the 3 columns of the first don't match the 2 rows of the second.
So what we need to do is modify the tensor so that each of the matrices in x1 has a shape that matches its counterpart in x2. Let's change x1 into 2 by 2 matrices: now each matrix in x1 is 2 by 2 and each matrix in x2 is 2 by 3, and a 2 by 2 times a 2 by 3 gives a 2 by 3, so every output matrix will be 2 by 3. Running it (and taking the printouts of x1 and x2 off so we can see the result clearly), we get the output: this first matrix multiplied with this one, the next with the next, and so on. To check it, let's extract just the middle matrices: taking the middle element of x1 and the middle element of x2 and multiplying them gives 10, 20, 20, 11, 8, 6, and looking carefully at the middle slice of the full output, we get exactly the same values. This confirms that what's happening is what we call batch multiplication, where a batch is one index along the zeroth axis: we have three indices along that axis, each one is considered a batch, and each batch of the first tensor is multiplied with its corresponding batch in the other tensor. From here, let's explain the a_is_sparse and b_is_sparse arguments. Sometimes we have matrices that are full of zeros; we could modify ours so that most of its entries are zero. TensorFlow has ways of optimizing computations that involve tensors made up mostly of zeros, which are known as sparse tensors, so when you specify that a particular tensor is sparse, TensorFlow takes that into consideration when carrying out the computation, and this can make the computation faster. We can also check out other methods, like the adjoint method we mentioned previously, which gets the adjoint of a matrix very easily.
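Before moving on, here is a sketch of this batch matrix multiplication, using small made-up 3D tensors:

```python
import tensorflow as tf

# A batch of three 2x2 matrices and a matching batch of three 2x3 matrices.
x1 = tf.constant([[[1, 2], [3, 4]],
                  [[2, 0], [1, 5]],
                  [[1, 1], [0, 2]]])         # shape (3, 2, 2)
x2 = tf.constant([[[1, 0, 2], [2, 1, 0]],
                  [[3, 1, 1], [0, 2, 2]],
                  [[1, 4, 0], [2, 2, 1]]])   # shape (3, 2, 3)

# Each 2x2 matrix is multiplied with its corresponding 2x3 matrix -> (3, 2, 3).
out = tf.linalg.matmul(x1, x2)
print(out.shape)

# The middle batch alone gives the same result as the middle slice of the output.
print(tf.linalg.matmul(x1[1], x2[1]))
print(out[1])
```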
Let's now look at the band_part method. With band_part we are essentially rewriting a tensor, setting some of its values to zero based on certain conditions. It takes an input, which can have k dimensions, a num_lower, a num_upper and the usual name argument; it defines an indicator function, in_band, from those conditions, and multiplies it element-wise with the input to produce the output. In the documentation example, an input is passed to band_part and the output looks similar to the input, with the difference that some positions have become zeros: the -3 is turned to 0, one -2 is turned to 0, and another -2 is turned to 0. Before looking at the special cases, let's work through an example so we understand exactly how this works. The indicator is defined so that, for each element of the new matrix in_band (where m is the row index and n the column index), the element is kept if (num_lower < 0 or m - n <= num_lower) and (num_upper < 0 or n - m <= num_upper); when this condition holds, the corresponding input element remains the same, and when it does not, that element is turned into a 0. Taking our 2D tensor and the condition given in the documentation, let's build two helper matrices, one holding m - n and the other n - m; notice that both have exactly the same shape as the input. For m - n we take the row index minus the column index: at the top-left position we are at row 0, column 0, so 0 minus 0 is 0; at row 0, column 1 we get -1; at row 0, column 2 we get -2; on the next row we get 1, 0, -1, and so on until all the values are filled in. For n - m we take the column index minus the row index: row 0, column 0 gives 0; column 1, row 0 gives 1; column 2, row 0 gives 2; and so on. Once we have the m - n and n - m matrices, we can work out the output very easily, checking for each element whether it passes the band_part condition. Since our num_lower is 0 and our num_upper is 0, the "num_lower < 0" and "num_upper < 0" parts can never be true, so we can drop them and focus on m - n and n - m. At the position row 0, column 0, m - n is 0, which is less than or equal to num_lower (0), so that part is true, and n - m, which is also 0, is less than or equal to our
upper, since our upper is 0, so that is true as well. Since both parts are true, the value at row 0, column 0, where m = 0 and n = 0, is maintained. Moving to the next element, with m = 0 and n = 1, the first column over: m - n is -1, which is less than or equal to our lower (0), so that part is true, but n - m is 1, and 1 is not less than or equal to 0, so the second part fails; both must be true, so this value is turned into a 0. The next one is already a 0, and whether it is kept or zeroed it stays 0, so there's no need to check it. Next, at m = 1 and n = 0: m - n is 1, which is not less than or equal to 0, so the condition fails and the 3 turns into a 0. At m = 1 and n = 1, m - n is 0 and n - m is 0, so both parts hold and the 5 is maintained. At m = 1 and n = 2, m - n is -1, which passes, but n - m is 1, which does not, so that value (the 100) turns into a 0. Repeating this for the rest, we keep the diagonal values and zero out everything else: wherever m is different from n, you can never have m - n <= 0 and n - m <= 0 at the same time, which is why only the diagonal values, 1, 5, 6, are maintained. Testing this by running band_part with (0, 0) confirms it, which matches the useful special cases given in the documentation: with num_lower = 0 and num_upper = 0 the output is a diagonal tensor, and we now understand why. By the same reasoning, (0, -1) gives the upper triangular part and (-1, 0) gives the lower triangular part. Let's go ahead and see what these upper and lower triangular tensors mean: set num_lower to -1, so we have (-1, 0), and run it.
We notice that the input, our tensor_2d, is maintained except for the upper part: the 1, 3, 1, 2 and everything below the diagonal is still there. If you picture a matrix with its diagonal, everything below the diagonal is what we call the lower triangular part of the matrix and everything above it the upper triangular part; here the upper triangular part is all zeros, so (-1, 0) keeps the lower triangular part. If we switch to (0, -1) and run that, it is now the lower part that is zeroed and the upper triangular part that is kept. So to get the lower triangular part you specify (-1, 0), for the upper triangular part you specify (0, -1), and for a diagonal matrix you specify 0 on both sides. That's it for the band_part method. Looking at other methods: there is the Cholesky decomposition, a way of decomposing matrices that we won't get into here; the cross product, which computes the pairwise cross product of a and b; and the determinant, where you pass an input and get its determinant. There are many other interesting methods, and here is the inverse, tf.linalg.inv. Running tf.linalg.inv(tensor_2d) gives an error. The message says it cannot find a device for the node and lists the data types registered for the MatrixInverse operation, which means our data type most probably isn't registered for this operation. Searching on Stack Overflow, the answer we find says roughly: "I think your value should be float, as the error says. I encountered the same problem while finding the determinant of a matrix, and changing the dtype to float32 solved the problem." Stack Overflow is a great way of dealing with these kinds of errors, though here we already suspected the data type. So let's change the dtype to float32, as the Stack Overflow user suggested, and now we get a different error, the one we actually expected: the input is 4 by 3, and as we said, the number of rows must equal the number of columns for us to calculate the inverse of a matrix. So if we instead create a 3 by 3 matrix, we now get the right answer: we've found the inverse of this matrix. Let's store it in a variable, tensor_2d_inverse. This inverse is such that when you matrix-multiply (not element-wise multiply) the original by it, you obtain the identity matrix: running tensor_2d @ tensor_2d_inverse gives 1 0 0, then a very small number, 1, 0, then 0 0 1, which is the identity up to numerical precision.
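A combined sketch of band_part and the inverse, using an assumed 3 by 3 float matrix:

```python
import tensorflow as tf

x = tf.constant([[1., 2., 0.],
                 [3., 5., -1.],
                 [1., 5., 6.]])

# band_part keeps element (m, n) only if
#   (num_lower < 0 or m - n <= num_lower) and (num_upper < 0 or n - m <= num_upper);
# everything else becomes 0.
print(tf.linalg.band_part(x, 0, 0))    # diagonal only
print(tf.linalg.band_part(x, -1, 0))   # lower triangular part
print(tf.linalg.band_part(x, 0, -1))   # upper triangular part

# inv needs a square matrix with a float dtype; A @ inv(A) gives the identity
# up to small numerical errors.
x_inv = tf.linalg.inv(x)
print(x @ x_inv)
```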
So we have the identity matrix. Looking at the documentation, you can see the list of accepted data types, and the dtype we were using previously wasn't among them, so always make sure your input data type is in that list. We also have matrix_transpose, similar to the tf.transpose we've seen already, the matmul we've covered, the trace, and the singular value decomposition of a matrix, tf.linalg.svd. The SVD returns three outputs, s, u and v, where s is a tensor of singular values, u the tensor of left singular vectors and v the tensor of right singular vectors; it is a way of breaking up a matrix so that the less important information it contains can be eliminated. We can simply pass our tensor into tf.linalg.svd and get the output: unpacking it as s, u, v and printing s gives the singular values, printing u the left singular vectors and printing v the right singular vectors. So feel free to keep coming back to this documentation; the more you use these methods, the less you'll need it, so just keep practicing until you've mastered them. We now go ahead and look at the einsum method; in TensorFlow you have it as tf.einsum, and you can look at its documentation and examples. Before diving deep into how this operator works, it's important to know that it takes arrays of all sorts of dimensions, from 1D arrays through 2D arrays up to nD arrays. We'll start straight away with a two-dimensional example and see how the einsum operator can replace the usual matrix multiplication ("matrix" here simply meaning a 2D array). Suppose we have an array A of shape 3 by 4 and another array B of shape 4 by 5. For the matrix multiplication to be valid, the number of columns of A has to be exactly the number of rows of B; in general the j of (i, j) has to be the same as the j of (j, k), and that's the case in this example, so the matrix multiplication of A and B is valid and gives us C. Note also that the shape of C depends on those of A and B: if we merge the two shapes and drop the matched inner dimensions (the columns of the first matrix and the rows of the second), we are left with the outer values, 3 and 5, so C has shape 3 by 5. In general, (i, j) times (j, k) gives an output of shape (i, k). It's very important to take note of the shapes whenever you're working with the einsum operator. With that understood, let's dive into the code: we start by importing numpy and defining the two matrices A and B.
We have the matrix A from the slides and the matrix B; printing A.shape and B.shape gives 3 by 4 and 4 by 5, so a 3 by 4 matrix and a 4 by 5 matrix. Let's first compute the matrix multiplication of A and B: in numpy it suffices to call np.matmul(A, B), and running it gives a correct answer, our C. Now, how can we replace this matrix multiplication with an einsum operation? Take note of the syntax: we pass a string, 'ij,jk->ik', followed by the arrays. You can read it as A interacting with B to produce C, and the letters are chosen to match the shapes and the kind of operation we are dealing with: for a matrix multiplication, if A is (i, j) and B is (j, k), we must ensure the j of A (its columns) is the same as the j of B (its rows), which is why j appears in both inputs, and since the output takes the shape (i, k), that is what we write after the arrow. The comma separates the two input arrays, so 'ij' matches up with A and 'jk' with B, and 'ik' is the output we get; printing it shows exactly the same answer as the plain matrix multiplication. At this point you may wonder why you need all this when you could just call np.matmul(A, B) and have your answer. That's a very common and normal question, and if you follow to the end you'll see that in many applications working with the einsum operator turns out to be easier than working with the usual numpy functions or methods you're used to. Now that we understand the einsum syntax, let's look at more examples; a runnable sketch of the matrix and element-wise cases is given just below. The next example uses einsum to do an element-wise multiplication. We have a matrix A and a matrix B, and for an element-wise (Hadamard) multiplication the two matrices must have the same shape; checking A.shape and B.shape, we have 3 by 4 and 3 by 4. Computing the element-wise, or Hadamard, multiplication, 2 times 2 gives 4, 6 times 9 gives 54, and so on and so forth: each element is simply multiplied by the corresponding element of the other matrix.
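A hedged sketch of the two einsum patterns discussed so far, using random matrices instead of the exact ones on the slides:

```python
import numpy as np

A = np.random.randn(3, 4)
B = np.random.randn(4, 5)

# 'ij,jk->ik' is ordinary matrix multiplication: the shared index j is summed over.
C_matmul = np.matmul(A, B)
C_einsum = np.einsum('ij,jk->ik', A, B)
print(np.allclose(C_matmul, C_einsum))   # True

# 'ij,ij->ij' is the element-wise (Hadamard) product: same shapes in, same shape out.
D = np.random.randn(3, 4)
print(np.allclose(A * D, np.einsum('ij,ij->ij', A, D)))   # True
```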
For our next example, we'll use the einsum operator to do an element-wise multiplication. Take this example: we have this matrix A and this matrix B, and obviously, for an element-wise multiplication, the two matrices A and B must have the same shape; from A.shape and B.shape we see 3 by 4 and 3 by 4. Let's go ahead and compute the element-wise multiplication, also called the Hadamard product. There we go: 2 times 2 gives us 4, 6 times 9 gives 54, and so on and so forth — each element in one position multiplies the corresponding element at the same position in the other matrix, so if you pick one at random, 4 times 5 gives 20, and that's the output there. In this case, if matrix A has shape (i, j), then B has to have shape (i, j) and the output also has shape (i, j). That's why the first input is written ij, the second input ij, and the output ij, and the einsum operator automatically treats this as a Hadamard operation; running it, we get the same output as with the usual element-wise multiplication.

Next we move on to the matrix transpose. At this point I'll urge you to pause and try to come up with the corresponding einsum operation for the matrix transpose before continuing with the video. Hopefully you got it right: we have this matrix A which is to be transposed — 2, 2, 1, 6, negative 2, 5 and so on — so basically we have a matrix of shape (i, j) whose output should be (j, i). In this case we write np.einsum('ij->ji', A), passing in the matrix A, and running that we get exactly the same answer as the usual transpose. Here we pass in only A, so there's no need for any commas; if you had to multiply several arrays, say A, B, C, D and so on, then you'd write something like 'ij,jk,kl,lm' with one group of labels per operand, but here we have just one. So hopefully you had this correct.
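As a quick check, here is a small sketch of both cases, again with made-up values of the right shapes:

    import numpy as np

    A = np.array([[2, 6, 5, 2],
                  [2, -2, 2, 3],
                  [1, 5, 4, 0]])
    B = np.random.randint(0, 10, size=(3, 4))   # same shape as A

    # Element-wise (Hadamard) product: every label appears unchanged in the output.
    print(np.allclose(np.einsum('ij,ij->ij', A, B), A * B))   # True

    # Transpose: swap the labels after the arrow.
    print(np.allclose(np.einsum('ij->ji', A), A.T))           # True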
We now move on to working with three-dimensional arrays; in machine learning it's very common to deal with this type of array. Suppose we have this array A, a 3-D array of shape 2 by 3 by 4. Here's how to read it: the whole array A contains two boxes, two inner arrays, and that's where the first dimension of 2 comes from; then each of those boxes contains a 2-D array of shape 3 by 4. So altogether it's a three-dimensional array of shape 2 by 3 by 4, which we can generalize as b by i by j, and for B we have a 2 by 4 by 5 array, b by j by k. Note that this b can be taken as a batch size, because in machine learning computations are very often done on batches of data: here one box is one batch and the other box is another batch, so the batch size equals 2. We may then want to do a batch matrix multiplication, where this first 2-D array multiplies its counterpart to produce one output, and the second one multiplies its counterpart to produce the other output. We don't want to use a for loop that handles each pair one after the other; we want a single operation that understands the multiplication is done for every position in the batch. Also note that in this kind of computation the batch dimension stays exactly the same: 2 and 2 give 2, while the 3 by 4 multiplied with the 4 by 5 gives 3 by 5, so the same value of b is maintained everywhere.

Now let's go straight into the coding. We have this 3-D array A, made of two 2-D arrays, and the array B; printing A.shape and B.shape gives 2 by 3 by 4 and 2 by 4 by 5. Note that np.matmul can do the batch multiplication without any problems: when the data is placed in batches like this, NumPy's matrix multiplication understands that each element in the batch should multiply the corresponding element in the batch of the other array. If we want to use the einsum operator instead, we again specify the shapes: bij for A, bjk for B, and bik for C — notice that it's still ij, jk and ik while the b stays the same. Running this, we get exactly the same answer from the matrix multiplication and from the einsum operator, so up to this point we still don't strictly need einsum.

Now what if we want to sum up all the elements in a given array? Here is a different way of using the einsum operator. We have this 3-D array A and we want to sum up all its values; np.sum gives us 72. With einsum, we write the shape, bij, put an arrow, and leave no output labels at all: that signifies we're summing over every element, as long as the labels on the left match the shape of A. We could also sum up the elements along a given dimension, say per row or per column, when working with a matrix. Here we have a matrix A of shape ij, and when we keep only the j label after the arrow, we're summing up all the elements in each column, because j is the column index — i is the row and j the column. That's how we get 2 plus 2 plus 1 equals 5 for the first column, 6 plus negative 2 plus 5 equals 9, 5 plus 2 plus 4 equals 11, and 2 plus 3 plus 0 equals 5. If we change the output label to i instead, we're summing up all the elements in each row: run that and you see that each row gets summed across its elements.
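Here is a compact sketch of those three uses of einsum; the arrays are random, but the shapes follow the 2 by 3 by 4 and 2 by 4 by 5 example:

    import numpy as np

    A = np.random.randint(0, 5, size=(2, 3, 4))   # (batch, i, j)
    B = np.random.randint(0, 5, size=(2, 4, 5))   # (batch, j, k)

    # Batch matrix multiplication: the batch label b is carried through unchanged.
    C = np.einsum('bij,bjk->bik', A, B)
    print(C.shape, np.allclose(C, np.matmul(A, B)))   # (2, 3, 5) True

    # No output labels: sum over every element.
    print(np.einsum('bij->', A) == A.sum())           # True

    # Keep only one label: the others get summed out.
    M = A[0]                                          # a (3, 4) matrix
    print(np.einsum('ij->j', M))                      # column sums, same as M.sum(axis=0)
    print(np.einsum('ij->i', M))                      # row sums, same as M.sum(axis=1)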
Now let's move to a more practical example from the "Attention Is All You Need" paper: we have this formula where the query is multiplied with the transpose of the key. If the key has a shape of batch size by key sequence length by model size, and the query has a shape of batch size by query sequence length by model size, then we can define a NumPy array query of shape 32 by 64 by 512 and a key of shape 32 by 128 by 512. What we do now is define the query-times-key-transpose operation with np.einsum, where the query is labelled b by q by m — batch size by query length by model size — and the key is labelled b by k by m. So we write 'bqm,bkm->...', and at this point we have to be careful about the output: we have (q, m) times the transpose of (k, m); the transpose turns (k, m) into (m, k), and multiplying (q, m) by (m, k) leaves us with (q, k). The batch dimension doesn't change, so the output labels are bqk, and as inputs we pass the query and then the key. Let's run this: we get an output whose shape is batch by query length by key length, that is 32 by 64 (the query length) by 128 (the key length).
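A minimal sketch of that attention-score computation, with random tensors standing in for the real queries and keys:

    import numpy as np

    batch, len_q, len_k, d_model = 32, 64, 128, 512
    query = np.random.randn(batch, len_q, d_model)   # (b, q, m)
    key   = np.random.randn(batch, len_k, d_model)   # (b, k, m)

    # Q @ K^T for every batch element: contract over the model dimension m.
    scores = np.einsum('bqm,bkm->bqk', query, key)
    print(scores.shape)                              # (32, 64, 128)

    # Same result with an explicit transpose of the key's last two axes.
    same = np.matmul(query, np.transpose(key, (0, 2, 1)))
    print(np.allclose(scores, same))                 # True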
Our next and final example is inspired by the Reformer paper, where the authors explain how the data can be broken up into chunks and attention computed within each chunk. You don't need to understand everything going on in that paper; it's just to give you an idea of where this kind of problem comes from. Suppose we have a tensor A of initial shape 2 by 4 by 8 — or, to keep it simple, 1 by 4 by 8 — so a batch size of 1, a sequence length of 4 and a model size of 8: a single batch, four rows and eight columns. B has shape 1 by 4 by 4, so four rows and four columns. In the paper the data is further broken up into buckets; here we have four different buckets, so the eight columns of A are broken up into separate 4 by 2 matrices, four rows and two columns each. The same is done for B, with the difference that its four columns are broken up into 4 by 1 matrices, four rows and just one column. The paper involves a step where we take the transpose of B and multiply it by A, and in doing that we are in fact multiplying the corresponding batches and the corresponding buckets. So again we have two fixed dimensions, the batch and the bucket, and then the inner matrix which is actually involved in the multiplication. If A has the labelled shape bcij and B has bcik, then taking the transpose of B keeps b and c fixed while (i, k) becomes (k, i), and A is still bcij; multiplying (k, i) by (i, j) gives (k, j), so the output is bckj. With A of shape 1, 4, 4, 2 and B of shape 1, 4, 4, 1 we therefore get an output of shape 1, 4, 1, 2. With the Einstein operator we write np.einsum('bcik,bcij->bckj', B, A) — B transpose times A — run it, look at the shape, and that's it. We're able to compute this very easily with the einsum operator, compared to the usual matmul where we have to make sure B is transposed correctly: there we'd write np.matmul with np.transpose(B, (0, 1, 3, 2)), keeping axes 0 and 1 fixed and swapping 3 and 2, and then pass in A. We still get the same shape and the same answer with both methods, but the einsum version is clearly cleaner: if we went to, say, a 5-D array, with einsum all we'd need is one more label, say d, whereas with the transpose we'd have to spell out 0, 1, 2 and then 4, 3 for it to work. So this is an example of how the einsum operator makes our work clearer and easier to understand.
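Here is a sketch of that bucketed multiplication with random data; the shapes mirror the 1 by 4 by 4 by 2 and 1 by 4 by 4 by 1 buckets described above:

    import numpy as np

    A = np.random.randn(1, 4, 4, 2)   # (batch, bucket, i, j)
    B = np.random.randn(1, 4, 4, 1)   # (batch, bucket, i, k)

    # B^T @ A inside every (batch, bucket) pair, written as a single einsum.
    out_einsum = np.einsum('bcik,bcij->bckj', B, A)

    # The same thing with an explicit transpose of B's last two axes.
    out_matmul = np.matmul(np.transpose(B, (0, 1, 3, 2)), A)

    print(out_einsum.shape)                     # (1, 4, 1, 2)
    print(np.allclose(out_einsum, out_matmul))  # True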
Let's now look at the expand_dims method. With this method we add an extra axis to an input tensor, and that extra axis has a length of one. Suppose we have this tensor_3d and we print out its shape, 4 by 2 by 3. What we can do is add an extra axis so that tensor_3d now takes a shape of 1 by 4 by 2 by 3: we call tf.expand_dims, pass in tensor_3d, and specify the axis — with axis set to 0 we get that output shape. Now let's take a simpler example. Suppose x equals tf.constant([2, 3, 4, 5]); we print x.shape, and then print tf.expand_dims(x, axis=0). You see that x has shape 4 while the output has shape 1 by 4: the tensor goes from 1-D to 2-D, still with the same elements, but with one extra axis added. That extra axis could also have been added manually by wrapping x in another pair of brackets, which gives the same 1 by 4 shape; applying expand_dims on that gives 1 by 1 by 4, so we go from 2-D to 3-D, and if we keep doing this we go on to 4-D with 1 by 1 by 1 by 4. If we want the extra dimension at a different axis, we just specify it: with axis equal to 1, the one gets slotted in after the first dimension instead of before it. So for the 4 by 2 by 3 tensor, axis 0 fits the new dimension right at the front, axis 1 after the first dimension, axis 2 after the second, axis 3 at the very end, and axis 4 is invalid. Taking axis equals 3, for example, we now get 4 by 2 by 3 by 1.
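A short sketch of expand_dims on a small tensor, just to watch the shapes change:

    import tensorflow as tf

    x = tf.constant([2, 3, 4, 5])
    print(x.shape)                             # (4,)

    print(tf.expand_dims(x, axis=0).shape)     # (1, 4)  new axis at the front
    print(tf.expand_dims(x, axis=1).shape)     # (4, 1)  new axis at the end

    tensor_3d = tf.zeros([4, 2, 3])
    print(tf.expand_dims(tensor_3d, axis=3).shape)   # (4, 2, 3, 1)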
From here we look at another method which is like the opposite of expand_dims: the squeeze method. With squeeze, we take an input and specify an axis, just like with expand_dims, but here we instead remove a dimension of size 1 from the shape of the tensor. Let's go back to our example: x_expanded has shape 1 by 1 by 1 by 4, and we define x_squeezed equals tf.squeeze of x_expanded, first specifying the axis to be 0. Printing x_squeezed, we get 1 by 1 by 4, so one axis has been taken off; squeezing a couple more times takes us from the 4-D tensor all the way back to a 1-D tensor. Coming back to the other example, where we expanded such that the extra axis was added in the last position, we call tf.squeeze on that expanded tensor and specify the third axis, because that's the last one. Note that an axis can only be squeezed if its length is 1: trying to squeeze the zeroth axis of this 4 by 2 by 3 by 1 tensor gives an error, but specifying axis 3 works fine and we're left with 4 by 2 by 3. So we've looked at the squeeze and expand_dims methods, which have to do with modifying or reshaping tensors; let's now look at the reshape method itself.

Here we have the reshape method: as usual we have the definition — reshape takes a tensor and a shape — and some examples, like reshaping a 2 by 2 tensor into a 1-D tensor. Before looking at that, let's see how we could use tf.reshape to modify the shape of the expanded tensor, which had shape 4 by 2 by 3 by 1, so that we get an output of 4 by 2 by 3. We pass in the tensor and specify the new shape, 4 by 2 by 3, and we get exactly that output shape; comparing the two tensors, the values are identical, so what we did with squeezing could also be done with reshaping. Now take another example: suppose we have this other tensor and we call tf.reshape on it to turn it into a 1-D array. If we ask for 6 values, we get an error: "Input to reshape is a tensor with 8 values, but the requested shape has 6". So the fact that you can reshape doesn't mean you can change a tensor into just any shape: the number of values in the initial tensor has to fit the new shape. We have eight values, so we put 8, run it, and the matrix is flattened out into 3, 5, 6, 6, 4, 6, negative 1, 2. We can try other reshapings: we can go to 2 by 4, or to 4 by 2 — and notice that this is not the same as the transpose; reshaping from 2 by 4 to 4 by 2 does not give you the transpose of the initial tensor. We can also change it to 4 by 2 by 1, which works just fine because that also expects eight values. Now, it may happen that you're dealing with tensors where you don't know one of the dimensions exactly: you can just put negative one, and TensorFlow automatically figures out that this dimension should be 2. Running it, you get the same output, and if you put negative one on its own, it understands that the value should be eight.
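Here's a small sketch of squeeze and reshape together, including the -1 trick:

    import tensorflow as tf

    x = tf.constant([[2, 3, 4, 5]])            # shape (1, 4)
    print(tf.squeeze(x, axis=0).shape)         # (4,)  the size-1 axis is removed

    t = tf.constant([[3, 5, 6, 6],
                     [4, 6, -1, 2]])           # 8 values, shape (2, 4)
    print(tf.reshape(t, [8]))                  # flattened to 1-D
    print(tf.reshape(t, [4, 2]).shape)         # (4, 2) -- not the same as the transpose
    print(tf.reshape(t, [4, -1]).shape)        # (4, 2) -- the -1 is inferred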
We now look at the concatenate function, tf.concat, which permits us to concatenate tensors along one dimension. It takes the values as input, a specified axis, and a name, and as you see at the bottom, the values can be a list of tensors or just a single tensor. Let's copy the example from the documentation: we have t1 and t2, and before looking at the output let's print tf.constant(t1).shape and tf.constant(t2).shape — they're Python lists, so we wrap them in tf.constant — which gives 2 by 3 and 2 by 3, while the concatenation is 4 by 3. What do we notice? The first two rows are t1 and the last two rows are t2. The reason is that we specified the axis to be 0, and axis equals 0 means we're concatenating across the rows: we take the first tensor, take the other tensor, and complete the rows of t1 with the rows of t2. If we instead specify the axis to be 1, we're concatenating across the columns, so instead of adding extra rows as we did with axis 0, we add extra columns: next to 1, 2, 3 we now have 7, 8, 9, and next to 4, 5, 6 we have 10, 11, 12. So with axis equals 1 we work on the columns and add extra columns; with axis equals 0 we work on the rows and add extra rows.

Now what if we're dealing with 3-D tensors, or 3-D lists? Printing the shapes, we have 1 by 2 by 3 and 1 by 2 by 3, and with axis equals 0 the output is 2 by 2 by 3: we're concatenating on the zeroth axis, so that dimension is 1 plus 1. In general, the way to get the output shape is to fix the other axes and add up along the axis we're concatenating on. For the earlier 2-D example, with axis 1 we fix the rows and add the columns up to get 2 by 6, and with axis 0 we fix the columns and add the rows up to get 4 by 3. For the 3-D example, axis 0 gives 2 by 2 by 3; axis 1 gives 1 by 4 by 3, which is like stacking up the rows; and axis 2 would stack along the columns, adding extra columns to give 1 by 2 by 6 as the new tensor.
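The documentation example, condensed into a sketch:

    import tensorflow as tf

    t1 = tf.constant([[1, 2, 3], [4, 5, 6]])       # (2, 3)
    t2 = tf.constant([[7, 8, 9], [10, 11, 12]])    # (2, 3)

    print(tf.concat([t1, t2], axis=0).shape)       # (4, 3)  extra rows
    print(tf.concat([t1, t2], axis=1).shape)       # (2, 6)  extra columns

    # With 3-D tensors the same rule applies: only the chosen axis grows.
    a = tf.zeros([1, 2, 3])
    b = tf.zeros([1, 2, 3])
    print(tf.concat([a, b], axis=0).shape)         # (2, 2, 3)
    print(tf.concat([a, b], axis=1).shape)         # (1, 4, 3)
    print(tf.concat([a, b], axis=2).shape)         # (1, 2, 6)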
Another very commonly used method in TensorFlow is the stack method, tf.stack. Let's go straight to it and look at the kind of output it produces; the syntax is similar to concatenate, so we specify axis equals 0 and run it. Unlike concatenation, where we simply joined the two tensors across the rows, specifying axis equals 0 here means we're creating a new axis: if both tensors are 2 by 3, we create a new axis whose length corresponds to the number of tensors we stack. If we add t1 a second time we have three tensors, so that axis has length three; taking it back to two tensors, it's two. So we're literally stacking the tensors up, and you should now clearly see the difference between stack and concatenate. Using the new t1 and t2 of shape 4 by 3, stacking with axis equals 0 gives an output of shape 2 by 4 by 3: we have t1 of shape 4 by 3 and t2 of shape 4 by 3, and we create an extra axis at the zeroth position, with t1 and t2 staying intact — you can see this block is t1 and this block is t2, all values unchanged, forming one big tensor of shape 2 by 4 by 3. So we go from two 2-D tensors to one 3-D tensor just by stacking them. When we work with axis equals 1, the stacking is still done on the 4 by 3 tensors, but the additional axis now sits at position 1 instead of position 0: writing out 4 and 3, axis 0 inserts the new dimension in front, axis 1 inserts it between the 4 and the 3, and axis 2 inserts it at the end. We can also see what happens with three tensors: stacking t1, t2 and t1 again with axis equals 0 gives 3 by 4 by 3, because we stack three 4 by 3 tensors and the extra axis gets length 3. With axis equals 1 the extra axis goes in the middle, giving 4 by 3 by 3, and the stacking happens at position one: the first element is 1, 2, 3 stacked with 7, 8, 9 and again with 1, 2, 3, because we have t1, t2, t1; the next is 4, 5, 6 stacked with 10, 11, 12 and 4, 5, 6; then 5, 6, 2 stacked with 0, 0, 2 and 5, 6, 2; and then 1, 2, 1 stacked with negative 1, 5, 2 and 1, 2, 1. So we understand exactly why we get these outputs. At this point you could pause the video and try to find the output when the axis equals 2.
When the axis equals 2, we have 4 by 3, 4 by 3 and 4 by 3 — that's t1, t2, t1 — and the new axis is added at the end, so we still get 4 by 3 by 3. To make it clearer, let's just take t1 and t2: with only two tensors we get 4 by 3 by 2, unlike the 4 by 2 by 3 we had with axis 1 and the 2 by 4 by 3 we had with axis 0 when stacking just t1 and t2. Here the stacking is done on the columns, and the output can be seen as taking the first row of t1, stacking it with the first row of t2, and then transposing: it's like 1, 2, 3 on top of 7, 8, 9, but transposed, so you can read off 1, 7, then 2, 8, then 3, 9; the next rows, 4, 5, 6 with 10, 11, 12, transpose to 4, 10, then 5, 11, then 6, 12, and so on, and 1, 2, 1 with negative 1, 5, 2 transposes to 1, negative 1, then 2, 5, then 1, 2. That's how we get this output. In the documentation we also see how the stack method can be written as a combination of the concatenate method and the expand_dims method: we have the stack of t1 and t2 on the 0 axis, which produces this result, and then, following exactly the recipe given in the documentation, we build the concatenation, run it, and get exactly the same answer. The reason is simple: if we have this list made of t1 and t2, each of shape 4 by 3, then doing expand_dims on each of them along the 0 axis leaves us with shape 1 by 4 by 3 — that's the extra axis. Once we have that, we concatenate on the 0 axis and the output is 2 by 4 by 3, because, as we saw with concatenation, we just add the ones up, 1 plus 1 giving 2. And that coincides exactly with stacking the two tensors t1 and t2.
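Here is a sketch pulling the stack behaviour together, using two 4 by 3 tensors similar to the ones in the notebook:

    import tensorflow as tf

    t1 = tf.constant([[1, 2, 3], [4, 5, 6], [5, 6, 2], [1, 2, 1]])      # (4, 3)
    t2 = tf.constant([[7, 8, 9], [10, 11, 12], [0, 0, 2], [-1, 5, 2]])  # (4, 3)

    print(tf.stack([t1, t2], axis=0).shape)   # (2, 4, 3)  new axis in front
    print(tf.stack([t1, t2], axis=1).shape)   # (4, 2, 3)  new axis in the middle
    print(tf.stack([t1, t2], axis=2).shape)   # (4, 3, 2)  new axis at the end

    # stack is the same as expand_dims on each tensor followed by concat.
    stacked = tf.stack([t1, t2], axis=0)
    via_concat = tf.concat([tf.expand_dims(t, axis=0) for t in (t1, t2)], axis=0)
    print(tf.reduce_all(stacked == via_concat).numpy())   # True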
Then we get to the pad method, tf.pad. We have the definition and this example: we have this tensor t and we define these paddings, and we have to pass in the tensor and the paddings, together with the mode and the constant values — by default the constant value is 0 and the mode is CONSTANT, but the tensor and the paddings must always be given. Let's take a look: we have the tensor and we have the paddings, which are themselves another tensor, and the way it works is that we pad the tensor with a constant value, by default 0. That's why, if you notice, you still have the 1, 2, 3 and 4, 5, 6 from the original tensor, and the padding is done with all the surrounding values being zeros. Let's copy this and check it out in a notebook; that's what we get. Now let's modify it and say we want the constant value to be, say, three: we run that and we notice we still have 1, 2, 3, 4, 5, 6, but all the surrounding values are now threes. But how is this output generated? Let's set it back to 0 and look at the paddings carefully: the way this paddings tensor is defined, the first pair gives the number of rows to add above and below the initial tensor, and the second pair gives the number of columns to add to the left and to the right, all filled with the padding value. So here we have one row above and one row below — run it and you see exactly that — and then two columns to the left and two columns to the right, and that's how the padded tensor is generated. Now let's modify it: say we want three columns to the right; you see we now have three to the right, still two to the left, one row above and one row below. Change it again and say we want five rows below: we have the 1, 2, 3, 4, 5, 6 and then all five of those rows below. We could also change the mode to REFLECT or SYMMETRIC, as we've seen here.
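A small sketch of tf.pad showing how the paddings argument maps to rows and columns; the numbers follow the documentation example:

    import tensorflow as tf

    t = tf.constant([[1, 2, 3], [4, 5, 6]])        # (2, 3)

    # [[rows above, rows below], [columns left, columns right]]
    paddings = tf.constant([[1, 1], [2, 2]])

    print(tf.pad(t, paddings))                     # zeros all around, shape (4, 7)
    print(tf.pad(t, paddings, constant_values=3))  # same layout, padded with 3s
    print(tf.pad(t, paddings, mode='REFLECT'))     # reflect the tensor instead of a constant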
Another way of doing tensor indexing is by using the gather method, tf.gather. As usual we have the definition and some examples to help us understand exactly how it works. Let's look at the first example, where we see how tf.gather does a similar thing to the usual indexing we've seen already. We have this tensor params, and params[3] simply means we're selecting the element at the third index — counting zero, one, two, three, the output is p3. To do the same thing with the gather method, we just pass in our params and specify the index 3, and we get the same output. With ordinary slicing, if we want the values at positions one to three we would slice from 1 up to 4 — remember we count from zero and the upper bound is exclusive — to get p1, p2, p3; with gather we pass in the params and the indices [1, 2, 3] and get p1, p2, p3 as well, and we could equally pass tf.range(1, 4) for exactly the same response. Now suppose we don't want consecutive elements: say we want the zeroth element, the last element and one somewhere in the middle. If we try the negative indices we'd use with ordinary Python indexing, we get an error — the gather method doesn't accept that syntax — so we use the usual positive positions instead: counting zero, one, two, three, four, five, the last element is at index 5, so we pass 0, 5 and 3, and running that we get p0, p5 and p3. So gather permits these more flexible kinds of selection.

Let's take another example with a different params, say tf.gather(params, [3, 1]). Note that this params is a 2-D tensor: before, params.shape was just 6, while here it's 4 by 3, so the way we approach the indexing is slightly different. Let's start from something simple: gathering with the index 0 gives us 0, 1, 2, which happens to be the first row. When we do this gather, the index 0 is selected along the default axis, which is 0 — even in the 1-D case the default axis was 0. Axis 0 here corresponds to the rows, so passing 0 means we're taking the zeroth row, which is why the output looks like that. If we pass 0 and 3, we get 0, 1, 2 and 30, 31, 32, because we're picking the zeroth row and the third row. If instead we want to select based on the columns, we have to work with the first axis, axis equals 1: with index 0 we get the zeroth column, 0, 10, 20, 30, and with index 3 — but there is no third column, so it just places zeros there by default (that's the behaviour on the GPU runtime we're using; on CPU an out-of-range index would raise an error). Let's use the second column instead: we get 2, 12, 22, 32, and with 2 and 0 we get 2, 12, 22, 32 and then 0, 10, 20, 30.
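Here's a sketch of those gather calls; the 4 by 3 params mirrors the example above:

    import tensorflow as tf

    params_1d = tf.constant(['p0', 'p1', 'p2', 'p3', 'p4', 'p5'])
    print(tf.gather(params_1d, 3))            # p3
    print(tf.gather(params_1d, [0, 5, 3]))    # [p0, p5, p3] -- any order, not just slices

    params_2d = tf.constant([[0, 1, 2],
                             [10, 11, 12],
                             [20, 21, 22],
                             [30, 31, 32]])
    print(tf.gather(params_2d, [0, 3]))           # rows 0 and 3 (axis defaults to 0)
    print(tf.gather(params_2d, [2, 0], axis=1))   # columns 2 and 0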
Now let's modify our params into a 3-D tensor and run it again, going back to index 0 first. With axis 0, the shape is now 1 by 4 by 3, and along the zeroth axis we have only this single element. If we ask for element 2 of this 3-D tensor, there is no such element, since we only have the one, so what we get is all zeros. If we ask for the zeroth element instead, that element actually exists, and we get the original tensor back. Now if we set the axis to 1, what should we get? On that axis we're focusing on the rows, so we pick out the second row and the zeroth row: running it, we expect 20, 21, 22 and 0, 1, 2, and that's indeed what we get. Next let's double this and suppose we have a 3-D tensor with two batches, modifying a few values in the second one. With axis 0 and index 2 we still get zeros, because there is no third element, but with index 1 we don't get zeros anymore — we get that second batch. Now let's put the indices back to 2 and 0 and set the axis to 1: with the batch axis fixed, we're working along the rows, so for the first batch we pick out its second row and its zeroth row, and for the second batch we do the same, which gives us 0, 5, 55 and 3, 1, 21. That's how we obtain this output.

We also have this other method, gather_nd. It's slightly different from gather: what it does is gather slices from params into a tensor whose shape is specified by the indices, so the indices influence the output shape. Whereas with tf.gather the indices define slices into the first dimension of params, with tf.gather_nd the indices define slices into the first N dimensions of params. That said, let's look at some examples. We have these indices and these params; gather_nd takes in the params and the indices, and note that unlike gather there is no axis argument — we just have the params and the indices. Let's see how this works: we have a params list, which becomes a tensor once it gets into the method, and the output takes the shape of the indices. With indices [[0], [1]], we get the zeroth element of params, which happens to be ['a', 'b'], and then the first element, which is ['c', 'd'], so running this we get back exactly the same thing as params. If we keep only [[0]], we get just [['a', 'b']], a 1 by 2 shaped output: since we picked only the zeroth element, we take it and place it in the output. Let's modify the params to also contain ['e', 'f'] and pass the index [2]: we get ['e', 'f']. And now with [2, 1], what do we get? We get just 'f', because we're picking the second row — counting 0, 1, 2 — and, within it, the element at the first position.
This is different from gather: if we specify the same indices, [2, 1], and call tf.gather(params, indices), we get ['e', 'f'] — that's row two — and then ['c', 'd'] — row one. So note that with gather_nd the output shape is defined by the indices: with [2, 1] we pick out a single element, whereas with gather the axis is 0 by default, so we're working with the rows and we pick out the second row and the first row, which is why we get ['e', 'f'] and ['c', 'd'].

Let's take this other example, where the params are now a 3-D tensor. We have these indices, and we already know the output will take their shape; the question is which values go into it. We have [0, 1]: since the params are 3-D, the 0 picks out the zeroth element — the first 2 by 2 block — and the 1 picks the element at position one inside it, so we get ['c0', 'd0']. For the other index, [1, 0], the 1 picks the other block, and then the 0 takes its zeroth element, so we get ['a1', 'b1']. Running this, we confirm exactly the values we expected. Now, what if after picking a block we go one level deeper? Say for the zeroth block we pick position one and then position zero inside it — that's 'c0' — and for the first block we pick position zero and then position one — that's 'b1'. Running that, we get 'c0' and 'b1', so this lets us index right down to single elements.

Now take this example where the indices have one more level of nesting. As usual, the output will have a shape similar to that of the indices. The first pair is [0, 1]: we pick the zeroth block and then the first position inside it, so we start with ['c0',
'd0']. For the next pair, [1, 0], we pick the first block and take its zeroth element, giving ['a1', 'b1'], which completes the first 2 by 2 part of the output. For the second group we have [0, 0] and [1, 1]: [0, 0] means the zeroth block and then its zeroth position, which is ['a0', 'b0'], and [1, 1] means the first block and its first position, which is ['c1', 'd1']. Let's run this and compare. The output shape is 2 by 2 by 2, which makes sense because we have two groups of indices and each produces a 2 by 2 tensor, and the values are ['c0', 'd0'], ['a1', 'b1'] and then ['a0', 'b0'], ['c1', 'd1'], exactly as we worked out.

We now look at the gather_nd method while taking into consideration the batch_dims argument. By default batch_dims equals 0, and in that case, with indices [[0, 1], [1, 0]], saying 0, 1 means you take the zeroth block and then index 1 within it, giving ['c0', 'd0'], and 1, 0 means you take the first block and its zeroth position, giving ['a1', 'b1'] — that's the output we've been getting. Now let's set batch_dims to 1, and we get a different output. Note that batch_dims makes the method batch aware: once the indices are passed in, the first index row corresponds to the zeroth batch of params and the second index row corresponds to the other batch. So for the zeroth batch, that batch is already selected, and the remaining 0, 1 means its zeroth row and position one, which gives 'b0'. For the next one, matched with the element of params at position 1, the remaining 1, 0 means its first row and position zero, which is 'c1'. That's how we get 'b0' and 'c1', unlike the result we had with batch_dims equal to 0.
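Here's a sketch of gather_nd on the 2 by 2 by 2 params from the documentation-style example, with and without batch_dims:

    import tensorflow as tf

    params = tf.constant([[['a0', 'b0'], ['c0', 'd0']],
                          [['a1', 'b1'], ['c1', 'd1']]])

    # Each index vector addresses the first len(index) dimensions of params.
    print(tf.gather_nd(params, [[0, 1], [1, 0]]))
    # -> [['c0', 'd0'], ['a1', 'b1']]

    print(tf.gather_nd(params, [[0, 1, 0], [1, 0, 1]]))
    # -> ['c0', 'b1']  (indexing all the way down to single elements)

    # With batch_dims=1 the leading dimension of the indices is matched
    # with the leading (batch) dimension of params.
    print(tf.gather_nd(params, [[0, 1], [1, 0]], batch_dims=1))
    # -> ['b0', 'c1']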
Let's look at one more example, where the indices have an extra level of nesting, with the same params. With batch_dims equal to 0, [0, 1] picks ['c0', 'd0'], [1, 0] picks ['a1', 'b1'], [0, 0] picks ['a0', 'b0'] and [1, 1] picks ['c1', 'd1'], as we've seen already. When you set batch_dims to 1, all the indexing in the first group is matched with the zeroth element of params, and the second group is matched with the first element. That said, [0, 1] means we've already selected the zeroth batch, so 0 picks its zeroth row and 1 the element at position one, giving 'b0'; [1, 0] in the same batch gives 'c0'. The next group is batch aware and matched with the other batch, so [0, 0] picks 'a1' and [1, 1] picks 'd1'. We expect 'b0', 'c0', 'a1', 'd1', and running it we indeed get 'b0', 'c0' and then 'a1', 'd1'. You could continue exploring the other methods on your own.

We'll now move on to ragged tensors, so here we have the tf.ragged overview. To understand what ragged tensors are about, let's take this example: we have this tensor_two_d, and printing tensor_two_d.shape gives 4 by 3. Now let's copy it, paste it here, and modify it so the rows no longer all have three elements — one row keeps three, another has just one, another has two, and so on. Printing this now gives an expected error: "can't convert non-rectangular Python sequence to a tensor". As you can see, ordinary tensors are meant to be rectangular: if we decide to have four rows, as here, then each of those four rows must have the same number of columns, but here one row has three columns, another one, another three, another two, hence the error. It happens, though, that sometimes we work with data that comes in exactly this form — one row with three columns, another with one, another with, say, five — and the way TensorFlow deals with this is with ragged tensors. Creating one is quite easy: instead of tf.constant, we write tensor_ragged equals tf.ragged.constant and pass in the same nested list. Printing its shape we get (4, None): 4 because we have four rows, which is fixed, but there is no single number of columns, so the second dimension is unknown. Nonetheless, we have just created a ragged tensor, and printing it you see a tf.RaggedTensor, unlike the rectangular case where printing tensor_two_d shows an ordinary tf.Tensor. So this is an example of a ragged tensor, for handling non-rectangular data like this.
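A tiny sketch of that difference; the row values are made up, but the shapes behave as described:

    import tensorflow as tf

    # This would fail with "Can't convert non-rectangular Python sequence to Tensor":
    # tf.constant([[1, 2, 0], [3], [1, 5, 6], [2, 3]])

    tensor_ragged = tf.ragged.constant([[1, 2, 0], [3], [1, 5, 6], [2, 3]])
    print(tensor_ragged)          # <tf.RaggedTensor [[1, 2, 0], [3], [1, 5, 6], [2, 3]]>
    print(tensor_ragged.shape)    # (4, None): four rows, no fixed number of columns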
Apart from the constant method which we just saw, we also have methods like boolean_mask. With the Boolean mask we pass in our tensor, or our list, and then we pass in the mask. Looking at the example, we have [[1, 2, 3], [4, 5, 6], [7, 8, 9]] and we pass the mask [[True, False, True], [False, False, False], [True, False, False]]. Notice how wherever we have False, that element is taken off: the 2 is dropped so the first row becomes [1, 3], the second row is all False so we get an empty list, and the third row keeps just the 7, giving the ragged tensor [[1, 3], [], [7]]. We can also do this per row: instead of a Boolean for each element, we just pass [True, False, True], which means the first row is kept, so 1, 2, 3 is maintained, the next is False so it's taken off, and the last is kept. You can check the other variations yourself.

Next we look at the tf.RaggedTensor class, which contains several factory methods, as you can see here. Let's check the from_row_lengths method to see how we could create ragged tensors with it: it takes a potentially ragged values tensor, the row lengths, a name, and a validate flag. We have this input, values [3, 1, 4, 1, 5, 9, 2, 6], and row_lengths [4, 0, 3, 1, 0]; running it we get [[3, 1, 4, 1], [], [5, 9, 2], [6], []]. How is this obtained? With the row lengths we're saying: start from the zeroth position and take the next four values to form a row, so 3, 1, 4, 1 forms the first row; the next row has length zero, so it's an empty list; we then continue from where we stopped and take the next three, 5, 9, 2; then the next one, 6; and then another empty row. If we take that last zero off, we obviously no longer have the trailing empty list. So that's how we can create ragged tensors with from_row_lengths.

From there we look at the from_row_limits method. Again we have input values, and this time row limits. Since we're given limits, the first value 4 tells us to move four elements from the start and cut there, which creates the first row, 3, 1, 4, 1. The next limit is still 4: we stay at the same position, and since the first four values are already used up, we get an empty list. Then the limit moves to 7, so we go up to the seventh position and pick up 5, 9, 2; from 7 to 8 we pick up the 6; and the final limit is again 8, so we're still at the same position and just get another empty list at the end. So that's how this one works.
You can always come back to the documentation to explore the other methods; let's take one last one, from_row_splits. Here what we do is look at each element of row_splits together with the next one. We have 0 to 4: counting the positions between the values as 0, 1, 2, 3, 4, that takes us up to the fourth position, so we get the first four elements, 3, 1, 4, 1. The next pair is 4 to 4: we stay at the same position, so we just get an empty list. Then 4 to 7: we move from position 4 to position 7, picking up 5, 9, 2. Then 7 to 8 gives us the 6, and finally 8 to 8 is the same position again, so we get an empty list at the end. We can also convert ordinary tensors directly into ragged tensors using the from_tensor method: here we have a rectangular tensor defined, and we convert it automatically into a ragged tensor. This from_tensor method takes a lengths parameter, and with the lengths given we keep only certain parts of each row: since the first length is 1, for the first row we take only the first element; the next length is 0, so we take no elements and get an empty list; and the next is 3, so we take all three elements, 6, 0, 0. So that's it for ragged tensors; you can always explore all the other methods in the documentation.
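The four factory methods side by side, using the values from the documentation-style examples above:

    import tensorflow as tf

    values = [3, 1, 4, 1, 5, 9, 2, 6]

    # All three produce [[3, 1, 4, 1], [], [5, 9, 2], [6], []]:
    print(tf.RaggedTensor.from_row_lengths(values, row_lengths=[4, 0, 3, 1, 0]))
    print(tf.RaggedTensor.from_row_limits(values, row_limits=[4, 4, 7, 8, 8]))
    print(tf.RaggedTensor.from_row_splits(values, row_splits=[0, 4, 4, 7, 8, 8]))

    # from_tensor keeps only the first lengths[i] entries of each row.
    t = tf.constant([[5, 7, 0], [0, 3, 0], [6, 0, 0]])
    print(tf.RaggedTensor.from_tensor(t, lengths=[1, 0, 3]))   # [[5], [], [6, 0, 0]]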
tensor. You'll notice that all the remaining values are zeros, so sparse tensors permit us to work with these kinds of tensors much more efficiently.

The last tensor type we'll look at is the string tensor. Under tf.strings we have several different methods we could use, but before looking at them, let's see how to create a simple string tensor. We define tensor_string = tf.constant(...) with a list made of string elements, something like "hello", "I am", "a", "string". Printing it out, we get a string tensor of dtype string containing the different elements which make it up. As we've seen previously, tf.strings offers many methods. Let's look at the join method: with join we can combine the elements of this list into one string with a specific separator, and by default the separator is the empty string. So let's say we want all the elements joined together: we call tf.strings.join and pass in tensor_string, keep the separator at its default value for now, and we get the elements concatenated directly. If we set the separator to a space, we now get "hello I am a string" with spaces in between; and if we change the separator to a plus sign, the elements are joined with plus signs. We also have methods like length, which tells us how long each string is: for "hello" we get 5, for "tensorflow" we get 10, and for the short Unicode string we get 4. The lower method converts all uppercase characters into their respective lowercase replacements, and there are also the ngrams method and all the other methods you can explore. That's it for tensors.
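Here is a minimal sketch of those string operations; the exact example strings are assumptions, and the calls mirror the ones run in the video:

```python
import tensorflow as tf

tensor_string = tf.constant(["hello", "I am", "a", "string"])

print(tf.strings.join(tensor_string))                  # elements concatenated directly
print(tf.strings.join(tensor_string, separator=" "))   # "hello I am a string"
print(tf.strings.join(tensor_string, separator="+"))   # joined with plus signs

print(tf.strings.length(tf.constant(["hello", "tensorflow"])))   # [5, 10]
print(tf.strings.lower(tf.constant(["HELLO TensorFlow"])))        # lower-cased copy
```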
We now move on to variables. To better understand TensorFlow variables, let's come back to the model we defined at the beginning of this course. Its values a1, a2, b1, b2, which are given initial values, get updated as we train the model, so we need to use variables, which can be updated as we do model training. That said, instead of tf.constant we now write x_var = tf.Variable(x). A variable must always be initialized, so we pass in x, and printing x_var gives us a tf.Variable with its name, shape, data type and contents. We could also give it a name, say "var_1", by specifying the name argument; that's how we create a TensorFlow variable with a name. Notice as well the trainable argument, which says whether the variable gets updated during training: if it's False, the variable won't be updated during training; if it's True, it can be. There are other arguments you can explore.

tf.Variable also has several other methods. With the assign method, for example, we take a variable initialized to 1 and assign it the value 2; when we print the variable, we get this new value. With assign_add we simply do addition on the variable's elements, and with assign_sub we do subtraction. For example, with x_var equal to [1, 2], calling x_var.assign_sub([3, 4]) computes 1 minus 3 and 2 minus 4, giving [-2, -2]. If we change the argument to [3, 6], we get [-5, -8]: the first update had already taken us to [-2, -2], and subtracting again gives -2 minus 3, which is -5, and -2 minus 6, which is -8. If we now call assign_add([5, 8]) we get back to [0, 0], since -5 plus 5 and -8 plus 8 both give zero. As usual, you can check out all the other methods in the documentation.

Another important point to mention is that you can decide on which device you want your variable to live. Generally we have CPUs, GPUs and TPUs; under Runtime, Change runtime type, you can see that we're personally using a GPU (None means CPU, and there's also TPU). Using with tf.device(...) and specifying the device, in this case a GPU, we can define x_var inside that scope, which says we want x_var to live on that GPU. If you wanted it on a CPU, you would simply specify the CPU instead. Printing x_var.device then shows that it is now on the GPU, and if we modify the scope to CPU, it's now on the CPU. Note that this can also be done with plain tensors: we can define x_tensor = tf.constant(0.2), print x_tensor.device, and compare a GPU-placed tensor with a CPU-placed one.

It's also possible to initialize some tensors on the CPU and then carry out the computations on the GPU so that those computations run faster. Say we have x_1 = [1, 3, 4] and x_2 = [1], both initialized in a CPU scope, and then in a GPU scope we compute x_3 = x_1 + x_2, as simple as that. Broadcasting obviously happens here, so the [1] becomes [1, 1, 1], and adding it to x_1 gives [2, 4, 5]. Let's print out x_1, x_2 and x_3 together with the devices on which each of them is placed.
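As a recap, here is a minimal sketch of these variable updates and of device placement; the device strings assume a GPU runtime is available:

```python
import tensorflow as tf

x_var = tf.Variable([1.0, 2.0], name="var_1", trainable=True)

x_var.assign_sub([3.0, 4.0])   # [1, 2] - [3, 4]   -> [-2, -2]
x_var.assign_sub([3.0, 6.0])   # [-2, -2] - [3, 6] -> [-5, -8]
x_var.assign_add([5.0, 8.0])   # back to [0, 0]

with tf.device("CPU:0"):
    x_1 = tf.constant([1.0, 3.0, 4.0])
    x_2 = tf.constant([1.0])

with tf.device("GPU:0"):
    x_3 = x_1 + x_2            # broadcasting gives [2, 4, 5]

for t in (x_1, x_2, x_3):
    print(t.device, t.numpy())
```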
Running this, we see CPU, CPU and then GPU, and x_3 is [2, 4, 5], the output we expect. That's it: we've just looked at TensorFlow variables, and this marks the end of this part on tensors and variables. Don't forget to like and subscribe so you never miss content like this, and in case you want to gain a solid foundation in linear algebra, you can check out our cards. See you next time.

What's up everyone, and welcome to this session where we build a linear regression model and a deep neural network to predict the current price of a second-hand car based on the number of years the car has been used, the number of kilometers traveled, its rating, its condition, the present state of the economy, its top speed, horsepower and torque. We're going to build models which, given these inputs, permit us to predict the current price, as you can see in these results where the model's predictions are in blue and the actual prices in orange. Before moving on, it's important to note that we're going to follow the machine learning development lifecycle: we start by defining the task, look at the data source, prepare the data, build machine learning models which learn from this data, define the error functions for this learning process, and then get into training and optimization. From there we measure the performance of the model on the data, validate and test the model, and finally take corrective measures to improve its performance.

Our task here is to predict the price of a used car from some input features. For now we've selected just one feature, say the horsepower. Suppose we have the horsepower and the corresponding price in thousands of dollars, and we want to use this input data, the horsepowers and their corresponding prices, to train a model such that when given a horsepower, say 150, we can predict the corresponding price of that used car based on a model trained on this data. Plotting this data with the horsepower on the x-axis and the price on the y-axis, we can predict the price of the second-hand car for a given horsepower. If we draw lines through those points, we see that we have a continuous range of real-valued outputs, and since our outputs can take continuous values, we call this task a regression task.

Now let's modify this problem so that you better understand what makes it a regression problem. We still have the horsepower, but instead of predicting the price we now want to predict whether the car should be labeled expensive or not. Say all cars below 8.5 are cheap and all cars above 8.5 are expensive; so next to each horsepower we now write either cheap or expensive. From here we have a different kind of task, one in which, based on some input or inputs (since we may have many inputs), the car falls under one of these two categories.
You can see that this kind of problem is one in which the outputs are discrete: there are just two options. With the problem where we're trying to predict the price, on the other hand, we don't have just two or three or four options to pick from; we have an infinite number of options, since the prices can go from, say, a minimum of one thousand dollars to a maximum of one hundred thousand dollars, and any value in that range is possible. So the price can take an infinite number of possible values, unlike the cheap-or-expensive case where we have a finite number of possibilities.

We now move on to the data, and before diving into it, let's look at the big picture. We have a model which is going to be fed both the inputs and the outputs during training, such that later on we can feed in just the inputs and obtain the outputs. Notice the change in the direction of the arrows: initially we pass in both x and y, and after the model trains we pass in only x and get y automatically. So let's get straight into how we'll prepare this data so that it can be passed into the model.

The second-hand cars dataset is made available on the Kaggle platform by Mayang Patel, together with a description of the dataset, so we can see the different features we'll use to predict the price of the second-hand car. Clicking through, we can switch between the compact view and the column view. We have the id, which we won't use since it isn't really part of the data, just an identifier. Then we have the on-road-old feature, which isn't well explained, because we don't clearly see what it signifies. Nonetheless, we have its mean and standard deviation, and as we saw in the previous video, most of the values we get in this dataset for this feature lie between the mean minus the standard deviation and the mean plus the standard deviation. This simply means most of the values lie between about 602K minus 58.4K, which is about 543.6K, and 602K plus 58.4K, which is about 660.4K. We have other features as well: on-road-old and on-road-now, which aren't very well explained, so we won't use them; and the number of years, which is obviously the number of years of usage of the car, with zero missing values, zero mismatches and a thousand valid entries, so this is fairly clean data. Its mean is 4.56 years
and its standard deviation is 1.72 years, so most of the cars in this dataset have been used for between 4.56 minus 1.72 and 4.56 plus 1.72 years. Then we have the number of kilometers covered, the rating (the car's rating at the moment you want to buy it), the condition at that moment, the current state of the economy, the car's top speed, its horsepower and its torque. Those are the different features we have and which we'll use to predict a car's current price. We can download the dataset for free, and once it's downloaded, we get into our Colab notebook, where we'll do the data preparation.

We have three imports: TensorFlow for our models, pandas for reading and processing the data, and seaborn for visualization. You don't need any prior knowledge of these tools, as we'll explain every single step of our data preparation and visualization processes. We've now uploaded the train.csv file, which you download from the Kaggle platform, and double-clicking on it opens the file viewer, where we can show ten rows per page or a hundred per page. As you can see, for each data point we have the id, on road old, on road now, the years, the number of kilometers, the rating, the condition, the economy, the top speed, the hp, the torque, and then the current price.

Now let's break this up. If we split the columns, the first section is our inputs and the last column is our output, so here is our x and here is our y. If we suppose we have n data points in total, we can represent the inputs in a tensor of shape n by 8, since we're not taking the id column, the on-road-old column or the on-road-now column (we don't understand them well enough to use them), which leaves one, two, three, up to eight input columns. The output tensor then has shape n by 1. Once we can put this data into tensors with those shapes, we can pass it into a model, call it M, and train that model such that when it's shown new data, it's able to predict the current price.

To read CSV files we make use of the pandas read_csv method; we imported pandas as pd, which is why we write pd.read_csv, and we specify the separator. This method is used for reading CSV files like the train.csv file we have here. If we open this file with Excel, what we get is very similar to what we've seen already; but if we open it with a notepad instead, you'll notice it's the same data, laid out quite differently:
all the column heads are separated by commas, and as we go to the next rows, the values are separated by commas as well; each value in a row lines up with its column header by position. That's why this is called the comma-separated values (CSV) format: every value which makes up a row is separated from the next by a comma. And that's why, in our code, when we read the CSV we specify that the values are separated by commas.

Now, in a case where the data uses semicolons everywhere as separators instead, we can save it as a different file and upload it as well. Reading the comma-separated file, we create data and call data.head(), which gives us the first five rows, and data.shape, which gives us the shape of the data: one thousand by twelve. If we copy that read call and run it on the semicolon file while still claiming the values are separated by commas, the head we get is not well formatted, because the default parsing is meant for comma-separated values; if we change the separator to a semicolon and run again, side by side you can see the difference: now it's well formatted. Printing data.shape in the mismatched case gives one by one, which is not what we expect to get. So whenever you want to work with these kinds of files, you can always make use of the pandas library.

We've now been able to read our data and put it into a data structure. Let's now visualize how each feature is related to the others, and to do this we use the seaborn library, which has been imported as sns. We call sns.pairplot and pass in our data, and we get all the features plotted against one another, including the output. You can check out the seaborn.pairplot documentation on the seaborn website: among the parameters we have the data, the hue (which has to do with coloring), and diag_kind, which is the kind of plot used for the diagonal subplots, which we'll see shortly. We've selected kde, which stands for kernel density estimate, so after running this we see the KDE plots on the diagonal. Let's also take off the id, on-road-old and on-road-now columns, since we don't understand exactly what they signify, and run the plot again.
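A minimal sketch of the reading and plotting steps just described; the semicolon file name and the exact column names in the drop call are assumptions:

```python
import pandas as pd
import seaborn as sns

# Read the comma-separated training file downloaded from Kaggle
data = pd.read_csv("train.csv", sep=",")
print(data.head())    # first five rows
print(data.shape)     # (1000, 12)

# If the file used semicolons instead, the separator has to match (file name is hypothetical)
# data = pd.read_csv("train_semicolon.csv", sep=";")

# Pairwise relationships between all features, with kernel-density plots on the diagonal
sns.pairplot(data, diag_kind="kde")

# To leave out the id / on-road columns first (column names here are assumptions):
# sns.pairplot(data.drop(columns=["id", "on road old", "on road now"]), diag_kind="kde")
```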
Here is what we have now. Scrolling up a bit, we see, for example, the number of years plotted against the number of years: along the diagonal we have how each feature is related to itself, so years against years, kilometers against kilometers, rating against rating, and so on, and these are the KDE plots. Off the diagonal we see how the years relate to the kilometers, how the years relate to the rating, and the same process repeats for every other pair of features. One very interesting point to note is that we can already see the relationship between these different features and the current price. Take the plot of years against the current price: based on the data we have, for any given number of years of usage the current price can go from very low to very high. If we move to the kilometers, we notice a certain pattern: as the number of kilometers covered by the car increases, the overall current price of the car drops, which is one very interesting feature to note. The rating, the condition, the economy, the top speed, the hp and the torque show a pattern similar to the years. So that's how each feature relates to the current price.

At this point we convert our data into a tensor: tensor_data = tf.constant(data). Printing it, we see a tensor of type float64. We can obviously do some casting with tf.cast; if we cast tensor_data to float16, a problem arises: some of the values are converted into infinity, because they are too large to be stored in the float16 data type. Casting to float32 instead, everything stays intact. Another important step is to randomly shuffle the data, to avoid any bias based on the order in which the data was gathered; after the random shuffling, that original order is no longer respected. So we write tensor_data = tf.random.shuffle(tensor_data). Printing the first five rows before and after (we copy the print so we can see the difference), we see that before shuffling the id order was respected, 1, 2, 3, 4, whereas afterwards we get rows like 551, 567, 229, 557, 402, so the data really has been shuffled.
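Here is a sketch of that conversion, casting and shuffling, assuming data is the DataFrame read earlier:

```python
import tensorflow as tf

tensor_data = tf.constant(data)                    # DataFrame -> tensor (float64)
tensor_data = tf.cast(tensor_data, tf.float32)     # float16 overflows for the large columns
tensor_data = tf.random.shuffle(tensor_data)       # remove any ordering bias
print(tensor_data[:5])
```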
We're now ready to break this data up into the inputs x and the outputs y. But first, let's add a little text cell saying "Data preparation", put it in bold, and we're fine. Now, as we were saying, we need to get x and y, so let's start with x. We take tensor_data and pick out all the rows, since we're obviously interested in all one thousand rows, but we're not interested in every column: we drop the id, on-road-old and on-road-now columns at the start and the current price at the end. Counting 0, 1, 2, 3, we take from column 3 right up to, but not including, the last column, so the slice is [:, 3:-1]. Printing the shape: before slicing we had one thousand by twelve, but since we're picking out just the inputs, we now have one thousand by one, two, three, four, five, six, seven, eight, that is one thousand by eight. Printing the first five rows, notice how the first three values have been taken off along with the current price.

We repeat the same process to get the output: for y we again take all the rows, but only that last column. Notice that the resulting shape is just 1-D, so after getting it we call tf.expand_dims with axis equal to -1 to add that extra dimension. Running this again, the first five outputs now have shape five by one, just as the first five inputs have shape five by eight; previously we had just the 1-D shape, and now this extra dimension makes the output match the 2-D layout of the inputs.
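A minimal sketch of this slicing, assuming tensor_data is the shuffled (1000, 12) tensor from above:

```python
import tensorflow as tf

X = tensor_data[:, 3:-1]          # drop id, on-road-old, on-road-now and the price -> (1000, 8)
y = tensor_data[:, -1]            # current price, shape (1000,)
y = tf.expand_dims(y, axis=-1)    # -> (1000, 1) so it matches the 2-D inputs

print(X.shape, y.shape)
print(X[:5], y[:5])
```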
At this point, another very common transformation we can apply to our data to help the model train faster is normalization. We actually normalize the inputs: for every input we subtract the mean and divide by the standard deviation. In this example, the mean of these eight values is around 138, so if we normalize the data and take the standard deviation to be, say, 150, the value 109 becomes (109 - 138) / 150, about -0.193, and a value like 206 becomes (206 - 138) / 150, about 0.45. So our input features get rescaled before being passed into the model. To carry out this feature scaling, TensorFlow has a Normalization layer, which is part of tf.keras.layers. We haven't spoken much about tf.keras because we haven't gotten into modeling yet, but for now just note that tf.keras.layers contains this Normalization class, which can be configured by specifying an axis, a mean and a variance. So from tensorflow.keras.layers we import Normalization, and then we get to use this normalization layer.

Before applying it to our data, let's see how the layer works. We define a normalizer with Normalization(), passing nothing in, so by default the axis is -1 (we'll look at that shortly), and we define an x_to_be_normalized tensor, say [3, 4, 5, 6, 7]. Calling normalizer(x_to_be_normalized) gives exactly the same output as the input, because we haven't specified a mean and a variance yet. Now let's specify a mean of, say, 5 and a variance of 4: running this, we notice that the inputs are rescaled into new values. Let's add another row, say [4, 5, 6, 7, 8] to keep it simple, and run again: this gets normalized properly too. It's worth noting that since the axis defaults to -1, the normalization is done with respect to the columns: recalling our previous section, the shape here is 2 by 5, and axis -1 corresponds to the columns, one, two, three, four, five of them, so for each column we do the normalization. Practically, what we're computing for the first element is 3 minus 5 divided by... careful, we made a slight error in that phrasing: the definition is actually x minus the mean divided by the standard deviation, so we don't divide by the variance but by the standard deviation. We also know that the standard deviation squared equals the variance, so to obtain the standard deviation we simply take the square root of the variance. In this case the variance is 4, so the standard deviation is 2, and the mean is 5; if x is 3 we get (3 - 5) / 2 = -1, and if x is 4 we get -0.5. That's how we get these values.

Now let's take another example, but this time we don't specify the mean and the variance. You should know that it's not in every situation that you can get the mean and the variance upfront; in some cases, in fact in most cases, you just have the data, and you won't always go and calculate the mean and variance yourself. What TensorFlow allows us to do is obtain this mean and variance automatically, by adapting to the data we're given: for each column of the data it computes the mean and the variance and then uses them to normalize that column.
So just as we supposed above that the mean and variance for each column were 5 and 4, here the per-column mean and variance are obtained automatically. Let's look at that: we no longer need to pass a mean and variance, and we can take the axis argument off too, since it already defaults to -1. What we do is call normalizer.adapt() on the data, and then pass the data through the normalizer to obtain the normalized values. Let's understand what goes on. The first column holds 3 and 4, so its mean is 3.5, and we compute x minus 3.5 divided by the standard deviation. With two values, 3 and 4, it's clear the mean is 3.5, right in the middle, and the standard deviation is 0.5, such that 3.5 minus the standard deviation gives the smallest value, 3, and 3.5 plus 0.5 gives 4. Plugging in, 3 maps to -1 and 4 maps to 1, which is exactly what we see. Moving to the next column, with 4 and 5: if we kept the same mean and standard deviation we wouldn't get the right answers (5 would map to 3), yet 5 is transformed to 1, and that's possible because the mean and the variance are computed for each and every column. For 4 and 5 the mean is 4.5 and the standard deviation 0.5, so 4 maps to -1 and 5 maps to 1. If we change the 5 to a 10, the column mean becomes 7 and the standard deviation 3, so 10 still maps to 1 and 4 still maps to -1. We can even go ahead and add one extra row, say [32, 1], run again, and we still get properly normalized data, all done automatically: unlike before, where we needed to specify the mean for each column, here the normalizer adapts to its input data.

So how is all this related to what we've been doing? We'll be given x: we perform the normalization on the inputs, not on y. We've already gotten our inputs x, and printing x.shape gives one thousand by eight, so we're going to normalize each of those eight columns. Instead of struggling to get a mean and variance for each of those columns, TensorFlow permits us to adapt to our dataset: we define the normalizer (we don't need to specify the mean and variance anymore), call normalizer.adapt(x), and then pass x through the normalizer.
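As a recap, here is a minimal sketch of the two ways of using the Normalization layer, ending with the adapt-to-X step just described:

```python
import tensorflow as tf
from tensorflow.keras.layers import Normalization

# 1) statistics supplied up front: output = (x - mean) / sqrt(variance)
normalizer = Normalization(mean=5.0, variance=4.0)
print(normalizer(tf.constant([[3.0, 4.0, 5.0, 6.0, 7.0]])))   # 3 -> (3 - 5) / 2 = -1

# 2) statistics learned from the data, one mean/variance per column (axis=-1 by default)
adapted = Normalization()
small = tf.constant([[3.0, 4.0], [4.0, 5.0]])
adapted.adapt(small)
print(adapted(small))                                          # each column maps to [-1, 1]

# Applied to the car data: adapt to the (1000, 8) input tensor X prepared earlier
normalizer = Normalization()
normalizer.adapt(X)
print(normalizer(X)[:5])
```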
Printing the first five rows of the normalized x, we see, for example, a 5 converted to 0.25, a 4 to about -0.232, and so on: all of this is done automatically, and we're now ready to train our model using this normalized data.

We've gotten to the point where we'll train the machine learning model. In our case the model is simply a straight line of the form y = mx + c. As you can see, the inputs are x, the outputs are y, and we have our weights m and c, which happen to be constants: x comes in, gets multiplied by m to give mx, and then c is added, giving y, which equals mx + c. We put all of this into a box, which is what we call our model: x goes into the model and we get the output, and what we're trying to do is find the most appropriate values for m and c such that the output best represents our dataset. The model can be represented by a line, and we could draw this line, or that one, or another: there are so many different possibilities, so when we talk about a model, what we essentially have is a function which tries to be representative of our dataset. This means that a line which clearly doesn't embody the data we have is a very poor model, while one which follows the data is a better one. For now we'll stick to model creation, but note that the model has to be chosen such that it best represents our dataset; we'll look at that in the error sanctioning and training-and-optimization sections.

Before we get there, that is before we see how to get the optimal values for m and c, let's create this model using TensorFlow. The good news is that TensorFlow makes it very easy to create deep learning models. We define our model with tf.keras.Sequential, passing in our normalizer followed by a Dense layer which has one output unit. This Dense layer is a Keras layer, so just as we imported Normalization from tensorflow.keras.layers, we now import Dense as well. Running this, we can call model.summary() and visualize the model: we've just created our first ever model with TensorFlow.

Let's break all this down. We used the Sequential API here, which is just one of the ways models are generally created with TensorFlow: we have the Sequential API, the Functional API, and the subclassing method. For now we'll work with the Sequential API. We can look at its documentation under tf.keras: clicking on Sequential, we see that basically what it takes in is just layers, and there are some examples of how it's used. You can also define an empty model and then add layers to it, so essentially we're just stacking layers onto the model.
This means that instead of the list syntax we used, we could instead write model.add(normalizer) to add the normalizer and then model.add(Dense(1)) for the dense layer with one output; running this gives exactly the same output we had before. Coming back to the Sequential API: as the name suggests, it's the way we build models whose layers all form a sequence, that is, deep learning models constructed such that we have the input, then the model, then the output, with all the layers which make up the model simply stacked one after another. We'll look at more complex layers later, but for now, if your model is made of layer 1, layer 2, up to layer n, stacked in this way, then working with the Sequential API is a good option.

Now let's use the exact model we're currently dealing with. It has two layers: the normalization layer, which we've seen already (we understand that our inputs need to be normalized before being passed into the dense layer), and the dense layer itself. Since this may be the first time you're hearing of a dense layer, let's explain what goes on inside it, leaving the normalization aside for a moment. Suppose we have a single input, the horsepower, which we call x. The dense layer takes this input, multiplies it by m, and then adds a value c; this m happens to be the weight and c is what we call the bias, so the output is mx + c, which is our predicted y. When we have many variables, we still have the same dense layer, but with a difference: suppose we have eight inputs; they all get into the dense layer, and for each of these inputs there is a weight m1, m2, m3, up to m8. The layer computes m1 times x1 plus m2 times x2, and so on up to m8 times x8, then adds the bias c, and that gives the predicted output y. That's exactly what goes on in our dense layer. Notice that we therefore have one, two, up to eight weights plus the bias, making nine different parameters, and that's why the summary reports nine trainable params.
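As a quick recap, here is a minimal sketch of the two equivalent ways of building this model, assuming X is the (1000, 8) input tensor prepared earlier:

```python
import tensorflow as tf
from tensorflow.keras.layers import Dense, Normalization

normalizer = Normalization()
normalizer.adapt(X)                 # learn per-column mean and variance from the inputs

# Way 1: pass the layer list directly to Sequential
model = tf.keras.Sequential([normalizer, Dense(1)])

# Way 2 (equivalent): start empty and add the layers one by one
# model = tf.keras.Sequential()
# model.add(normalizer)
# model.add(Dense(1))

model.summary()    # 9 trainable params: 8 weights + 1 bias in the dense layer
```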
The 17 non-trainable params come from the normalization layer. They're non-trainable simply because we've already adapted the normalization layer to our data, so we don't need to modify its mean and variance parameters anymore. So we understand exactly what's going on and why we have nine trainable parameters here. Note as well that configuring the dense layer is quite simple: all you need to ask is how many outputs you need. In our case we just want to output a current price, which is a single value, so the number of outputs is one, and that's why we pick one output unit. If we needed to predict, say, two current prices, maybe the current price now and the current price in the year 2030, then we would use two output units. From here we have model.summary(), which we can call very easily to see this kind of interesting summary of the model.

We can also plot the model out, which is quite easy: tf.keras.utils.plot_model takes our model, a to_file argument for the file to generate, say model.png, and show_shapes=True since we want to show the shapes. Running this, we clearly see an input layer, our normalization layer and our dense layer, with the inputs and outputs of each specified. The None here is actually the batch dimension, and since the model can treat data of any batch size, it is left as None. So what's our batch size? Looking at our full dataset, we generally won't load the whole thing into the model at once, as we may be limited by memory requirements; instead, what we pass into the model is batches. We could work in batches of two, passing just two data points into the model at a time; if our batch size is, say, eight, we pass eight data points at once and then move to the next batch, and so on. Note that you generally won't want a very large batch size, as this tends to lead to the overfitting problem we'll treat subsequently in this course; for now, just note that working with batch sizes of 32 and below is a good idea. Coming back to our model, we now understand that the None here is the batch dimension.

We could also, in addition to the normalizer, or rather before it, define an InputLayer, so let's import InputLayer from the Keras layers as well. Notice that previously the input shape wasn't specified, so let's specify it: the shape of one batch is the batch size by 8, so if the batch size is 32, a 32-by-8 tensor is what gets into the model at once, not the whole thousand samples we have. When creating this InputLayer, since we obviously don't know the exact batch size upfront, we leave the batch dimension out and just pass the feature shape. We pass this in, fixing the error by adding a comma, and remembering that what we're passing to Sequential is a list: this first element, the next element and the other element (or we could use the model.add option, which we saw already). Running this gives exactly the same kind of summary as before, and running the plot again we see the difference: we no longer have the unspecified input, and we now know exactly the shape of our inputs.
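Here is a sketch of the model with an explicit input layer and of the plotting call; note that in recent Keras versions the InputLayer argument is named shape rather than input_shape:

```python
import tensorflow as tf
from tensorflow.keras.layers import InputLayer, Normalization, Dense

# Same model as before, but with the feature shape declared explicitly
model = tf.keras.Sequential([
    InputLayer(input_shape=(8,)),   # 8 features per example; batch dimension stays unspecified
    normalizer,                     # the adapted Normalization layer from above
    Dense(1),                       # one output unit: the predicted current price
])
model.summary()

# Draw the model, including layer input/output shapes, to a PNG file
tf.keras.utils.plot_model(model, to_file="model.png", show_shapes=True)
```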
We can now move straight to the next section, which is error sanctioning. At this point we know how to build a model, y = mx + c, and we'll now see how to know how well, or how representative, this model is of our dataset. To check this, notice that for each output we can compare the actual value with what the model gives us. At a given x, the model tells us the selling price should be one value, while the actual selling price coming from the dataset is another. If we look at some points, the model does quite well: the actual price and the model's price are quite similar. At other points it performs poorly: the model puts us around one selling price while the actual selling price is somewhere else entirely. This tells us that if we want to choose the best possible values for m and c, we have to choose them such that these differences are minimized. By minimizing the differences, we are sanctioning the error: we sanction the model every time it makes these kinds of errors, so large errors bring a large sanction, near-perfect predictions bring a small sanction, and a perfect prediction brings no sanction at all.

The sanction itself is quite simple to compute. For an output we have the actual value, call it y_a, and the model's predicted value, y_pred. We subtract the two and square the result. Clearly, if y_a and y_pred are the same, we get zero, which means we give the model zero sanction for that particular prediction. Now, if there's a very great difference between the two, we're subtracting and also squaring, so the sanction becomes large.
For example, if the actual value is 4 and the model produces 2, then whether we compute 4 minus 2 or 2 minus 4 doesn't really matter, because we're going to square it: either way the difference is 2, and squaring gives 4. So we square the error, meaning that any time the model makes an error, we amplify it. If we want an overall error, we can use what we call the mean squared error function: we compute this squared difference for each and every point and then take the average of all those errors. As usual, TensorFlow already has this built in, so all we need to do is make use of the function provided. Our loss functions live under tf.keras.losses: scrolling through tf.keras we've already seen Sequential, and checking the layers we also find our Dense layer with its number of output units, so we start to understand how this tf.keras section is structured. Under losses we have MeanSquaredError, along with some examples: y_true is the actual output and y_pred is what the model predicted. Working through the mean squared error by hand: 0 minus 1 is -1, squared is 1; 1 minus 1 is 0; 0 minus 0 is 0; and the final 0 minus 1 gives another 1. Adding these up gives 2, and since there are 4 elements, 2 divided by 4 gives 0.5; we've summed all the squared errors and divided by the total number of elements, which is exactly how the 0.5 is obtained.

With that, we simply write from tensorflow.keras.losses import MeanSquaredError, and we make use of it when compiling our model: we call model.compile and define the loss to be the mean squared error we just imported. What this does is take into consideration, as the model is compiled, that our error sanctioning function is going to be the mean squared error. For regression tasks, apart from the mean squared error we also have the mean absolute error: here, every time we take the difference between y_true and y_pred, instead of squaring it we compute its absolute value. It's quite similar to what we've seen already with the mean squared error; the only difference is the absolute value in place of the square.
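A minimal sketch reproducing the 0.5 result and showing the compile call, assuming model is the one built earlier:

```python
import tensorflow as tf
from tensorflow.keras.losses import MeanSquaredError

# Worked example from above: (1 + 0 + 0 + 1) / 4 = 0.5
y_true = [[0.0, 1.0], [0.0, 0.0]]
y_pred = [[1.0, 1.0], [1.0, 0.0]]

mse = MeanSquaredError()
print(mse(y_true, y_pred).numpy())   # 0.5

# Using it as the error-sanctioning function for the model built earlier
model.compile(loss=MeanSquaredError())
```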
To better understand when to use the mean squared error versus the mean absolute error, let's take this example with the horsepower and the current price. Most of the time the horsepower is positively correlated with the current price: if the horsepower increases, the current price generally increases overall, and if it decreases, the current price generally decreases as well. But we may have a point like this one, which is what we call an outlier: the horsepower is very high, yet the price is very low. In the case where we're using the mean squared error, we compute (y_actual - y_pred) squared. This outlier is just one point, a minority in our dataset, and when modifying the weights of our model we don't want it to weigh too much, or have too much priority, in the way we choose the values for m and c. Using a loss function which squares the errors is not a good idea here, because the error on this outlier is going to be very large (our model will predict a price far from the actual one), and squaring that large error gives a very large value, so the outlier ends up having far too much say when we pick m and c, since we modify those values based on the error, trying to pick the values which minimize it. As we've said, with the mean squared error these kinds of outliers are made too important, when we should be focusing more on the other data points. With the mean absolute error, we compute |y_actual - y_pred|, and clearly the overall loss we get from these kinds of outliers is reduced compared to the mean squared error, since we just take the absolute value instead of squaring. So when working with datasets that have outliers like this, it's preferable to use the mean absolute error.

There is also a loss function known as the Huber loss, which permits us to make use of the mean squared error and the mean absolute error in a more intelligent manner: for normal data points we use a squared-error term, and for outliers an absolute-error term. By a normal data point we mean one where the error x = y_true - y_pred is, in absolute value, less than a given threshold d, and by an outlier one where it's greater than that threshold. Here is the definition of the Huber loss: when |y_true - y_pred| is less than or equal to d, we use a variant of the squared error, 0.5 * (y_true - y_pred)^2; when it's greater than d, so we're dealing with an outlier, the loss is 0.5 * d^2 + d * (|y_true - y_pred| - d), which grows only linearly. Again, we don't need to write all this from scratch, since we have it in TensorFlow; all we need to define is the delta, the threshold that decides whether a data point counts as an outlier or not. So under tf.keras.losses we have the Huber loss alongside the mean squared error and the mean absolute error, and when compiling our model we could use the mean squared error, the mean absolute error, or the Huber loss.
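To see the difference in behaviour, here is a small sketch with made-up numbers in which the last prediction is an outlier:

```python
import tensorflow as tf

y_true = tf.constant([[10.0], [12.0], [11.0], [50.0]])   # last point is an outlier
y_pred = tf.constant([[10.5], [11.5], [11.0], [12.0]])

mse   = tf.keras.losses.MeanSquaredError()
mae   = tf.keras.losses.MeanAbsoluteError()
huber = tf.keras.losses.Huber(delta=1.0)

print("MSE  :", mse(y_true, y_pred).numpy())    # dominated by the squared outlier error
print("MAE  :", mae(y_true, y_pred).numpy())    # outlier contributes only linearly
print("Huber:", huber(y_true, y_pred).numpy())  # quadratic for small errors, linear past delta
```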
If we compile with the Huber loss, the model compiles fine. We can simply specify Huber and pass in our delta: running it without one uses the default delta, which is 1.0, and if we want our own we just specify it, say 0.2 or something like that. For now, let's take that off: we're going to use the mean absolute error. In practice we do a lot of experiments: after training the model we measure its performance, and if it isn't good enough we can come back and modify the error function, changing the mean absolute error we're about to use to the mean squared error, or maybe to the Huber loss, and from there we see which loss function permits us to get the best possible performance for the model.

From here we move on to training and optimization. Recall that our model is a linear function, and we're trying to pick its m and c so that the errors we've been discussing are minimized. Just looking at the plot, we could draw one line here, another line there, and so on: there's an infinite number of possible lines we could generate. In order to get this m and c, the method most commonly used today is stochastic gradient descent, so let's understand how it works. We can write it out in one formula: we have a weight which has to be updated, and initially our weights are random, that is, we randomly initialize them. Suppose we initialize m to 0 and c to 2; in that case we'd have a line which is clearly not very representative of the data, so we need to update m and c so that they take on values which permit us to adapt to the dataset we've been given, and as we've said, that's done with the SGD algorithm. It goes like this: the new weight equals the previous weight (or, at the very first step, the initialized weight) minus a learning rate (the rate at which we learn from the data we've been given) times the derivative of the loss function with respect to that weight. In case you have no background in calculus, you shouldn't have any worries, as TensorFlow takes care of this for us. So with m initialized to 0 and c to 2, for each and every weight we apply this update. To get the new value of m, we take its previous value, 0, minus a learning rate: learning rates are generally picked on the order of 0.1, 0.01, 0.001, 0.0001, and you could continue down to 1 times 10 to the negative 6. Supposing we've picked 0.1, the new m is 0 minus 0.1 times the derivative of the loss with respect to m, that is, 0.1 times the rate at which the loss changes with respect to that particular weight (and recall we've already seen three loss functions in the error sanctioning section). The same is done for c: it becomes 2 minus the same learning rate times dL/dc.
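Here is a minimal sketch of a single gradient-descent update on y = mx + c, with toy numbers assumed; tf.GradientTape computes the derivatives for us:

```python
import tensorflow as tf

# Toy data (assumed) and simply initialized weights
x      = tf.constant([[2.0], [4.0]])
y_true = tf.constant([[5.0], [9.0]])

m = tf.Variable(0.0)          # weight, initialized to 0
c = tf.Variable(2.0)          # bias, initialized to 2
learning_rate = 0.1

with tf.GradientTape() as tape:
    y_pred = m * x + c
    loss = tf.reduce_mean(tf.abs(y_true - y_pred))   # mean absolute error

# dL/dm and dL/dc, computed by TensorFlow
grad_m, grad_c = tape.gradient(loss, [m, c])

# w_new = w_previous - learning_rate * dL/dw
m.assign_sub(learning_rate * grad_m)
c.assign_sub(learning_rate * grad_c)
print(m.numpy(), c.numpy(), loss.numpy())
```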
respect to that particular weight, that is, 0.1 times dL/dm, and the same is done for c: c becomes 2 minus the same learning rate times dL/dc. We've now updated m and c, and when we compute the error again, these new values are used: we take y_actual minus (new m times x plus new c) and compute a new error. From here we expect that, as we keep on training, our loss keeps dropping, ideally towards a value of zero, since we want zero loss. In this case, though, we see clearly that it's not possible to use just a straight line and obtain a loss of zero, since we cannot pass through each and every point with a straight line. If we were able to build a model that did pass through every point, we would have a loss of zero, since for each and every point the actual output would be the same as the predicted output; but with a straight line that's not possible. As a recap: we have our inputs and our outputs, and m and c are our model parameters. We pass our inputs through the model using the initialized m and c, we get outputs, and we compute a loss, that is, the difference between what we're actually supposed to get and what our model predicts, using a square or an absolute value function. Based on this loss we modify the parameter values, and then we repeat the same process until our training converges. Convergence simply means we've reached a point where the loss doesn't improve anymore. So if training looks like this, we get to a certain point where we keep repeating the gradient descent step we've seen already, updating the weights by doing w = w - learning_rate * dL/dw, and if our loss doesn't change much, or doesn't even change at all, then our model has converged and we can stop training at that point.
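As a rough sketch of what this update looks like when written out by hand (with assumed toy data, and using the mean squared error here so the example converges cleanly), this is plain gradient descent on m and c with tf.GradientTape:

```python
import tensorflow as tf

# Toy data generated from y = 2x + 1; the updates should recover m close to 2 and c close to 1.
x = tf.constant([1.0, 2.0, 3.0, 4.0])
y_true = tf.constant([3.0, 5.0, 7.0, 9.0])

m = tf.Variable(0.0)   # poorly initialized slope
c = tf.Variable(2.0)   # poorly initialized intercept
lr = 0.05              # learning rate

for step in range(500):
    with tf.GradientTape() as tape:
        y_pred = m * x + c
        loss = tf.reduce_mean(tf.square(y_true - y_pred))  # mean squared error
    dm, dc = tape.gradient(loss, [m, c])
    m.assign_sub(lr * dm)   # m <- m - lr * dL/dm
    c.assign_sub(lr * dc)   # c <- c - lr * dL/dc

print(m.numpy(), c.numpy())  # approaches 2.0 and 1.0
```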
As usual, TensorFlow does all the hard work for us. So here we have model.fit: we pass in our x (that's our inputs) and y, we specify the number of epochs, say 100, and verbose equals 1. We already understand what x and y mean; that's basically our dataset. The number of epochs specifies the number of times we are going to update our weights, that is, the number of times we go through the gradient descent step, and verbose has to do with the outputs from our training step. So we've compiled (let's correct the small error we had), and we run this; because verbose equals 1, we get these kinds of outputs and can see the values of our loss in real time. Let's stop this and set verbose to 0: you'll notice that as the training goes on we don't get to see those loss values anymore. Let's take it back to verbose equals 1; now, as we go from one epoch to another, we get the mean absolute error, that is, the mean absolute value of the difference between our model's predictions and the actual car prices. When we get into the Keras Model documentation and go to compile, you see the definition of compile, and you see that by default our optimizer is RMSprop. Note that these different optimizers are essentially variants of the stochastic gradient descent algorithm. If we go into the optimizers page, you see we have Adadelta, Adagrad, Adam, and of course SGD, which is the SGD we've seen already. With SGD we can specify the learning rate (at this point we understand what a learning rate means), we have the momentum, which we could increase to speed up training, and we can also specify whether we want Nesterov-type momentum by setting nesterov to True. Of all these optimizers, the most commonly used is the Adam optimizer; you'll see that many practitioners use it by default. Its default learning rate is 0.001, beta_1 is 0.9, beta_2 is 0.999, epsilon is 1e-7, and amsgrad is set to False. To better understand the learning rate, let's take this example: we plot the loss versus a weight, like m or c, and we consider the derivative of the loss with respect to that weight. Let's pick a point where this derivative, the slope, is positive, and call the weight w_i. To update w_i to a new weight, we apply the formula: w_i minus a learning rate times this positive slope. If the learning rate is too small, say 1e-10, it's clear we are not going to have a great difference after the update, since the slope is being multiplied by a very small number, and w minus a very small number gives a value very close to w; the change will be very small. If instead the learning rate is too large, say 1e3, then we have w minus a thousand times this positive value, and the change is going to be too brutal: instead of moving slowly towards this point, we skip over it and land on the other side, and if we keep doing this we actually move away from it. Why is this point important? Because at this point the loss is minimal, and recall that the aim of this error sanctioning section is to minimize the loss. That said, from many experiments, starting out with 0.001 is generally a good idea, but note that you can always change it: if you take a bigger value, your training is going to be faster but risks diverging, that is, it may not lead you towards that minimal loss, whereas if you take a value that is too small, you will converge but training will take too much time. So 0.001 is a value that is very commonly used.
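As a small sketch of how these optimizer choices look in code (the argument values here are simply the defaults and options discussed above):

```python
import tensorflow as tf

# Plain SGD, optionally with momentum and Nesterov-type momentum.
sgd = tf.keras.optimizers.SGD(learning_rate=0.1, momentum=0.9, nesterov=True)

# Adam with its default hyperparameters spelled out explicitly.
adam = tf.keras.optimizers.Adam(
    learning_rate=0.001,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-07,
    amsgrad=False,
)

# Either optimizer object can then be passed to compile(), for example:
# model.compile(optimizer=adam, loss=tf.keras.losses.MeanAbsoluteError())
```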
Now, particularly for this Adam optimizer, we have the parameters beta_1 and beta_2. Their maximum value is 1, so values like 0.9 and 0.999 are already high values of beta_1 and beta_2; if you want to speed up training, you increase beta_1 and beta_2, and by default they are set to 0.9 and 0.999. Also, when carrying out the computations for the Adam optimizer, we want to avoid dividing by zero, hence the epsilon parameter, which is by default 1e-7. And if you want the AMSGrad variant of the Adam optimizer, you can set amsgrad to True. So that's it: from tensorflow.keras.optimizers we import Adam, and in compile we set optimizer equal to Adam, so we define our optimizer and we define the loss. Let's run this again; we actually had an error, so let's fix that and run it, and there we go. So far we've just been watching the loss values; what if we store them in a variable? We can do history = model.fit(...), so that after training we can recall all the loss values we got during training. There we go: we inspect history.history, run that, and we get all our loss values. Scrolling up, you see we have a dictionary with a 'loss' key holding the list of all the loss values. We now import matplotlib and paste in the plotting code: we plot history.history['loss'] (since history.history is a dictionary, we pick out the loss), we specify the title, the y label, the x label, the legend, and then we show the plot. This is what we get: our loss value is dropping, but the rate at which it drops is actually quite small. So what if we speed up the training by modifying the learning rate? Right here we set the learning rate to 0.1, or let's just pick a learning rate of 1, and run that; we see the loss drops faster this time around. From here we move on to performance measurement, where we actually measure how well this model performs, and one common function used in regression problems like this is the root mean squared error. You should note that the performance measurement function isn't always the same as the loss, as these are actually two quite different concepts: with a performance measurement we are able to compare two models, model one and model two, and see whether they have the same performance or different performance, and which of them outperforms the other. That said, we can include this metric in compile, so we have the optimizer, the loss, and the metrics: we add the RootMeanSquaredError metric, which has already been imported (note that this is a metric, not a loss). With that done, we can compile and start our training again, and you'll notice that both the loss and the root mean squared error are being printed out. Once the training is done, we'll be able to plot both of them: the loss curve and the root mean squared error curve. To see exactly where those come from, you can look at history.history again: it's a dictionary.
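Putting the last few steps together, here is a minimal sketch (with assumed stand-in data for the car-price features) of compiling with the Adam optimizer and a root mean squared error metric, storing the training history, and plotting it:

```python
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import RootMeanSquaredError

# Assumed stand-ins for the car-price data: 1000 samples with 8 features each.
X = np.random.rand(1000, 8).astype('float32')
y = (X.sum(axis=1, keepdims=True) * 10_000).astype('float32')

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(8,))])

model.compile(
    optimizer=Adam(learning_rate=0.1),
    loss=tf.keras.losses.MeanAbsoluteError(),
    metrics=[RootMeanSquaredError()],
)

history = model.fit(X, y, epochs=100, verbose=1)

# history.history is a dict, e.g. {'loss': [...], 'root_mean_squared_error': [...]}
plt.plot(history.history['loss'])
plt.plot(history.history['root_mean_squared_error'])
plt.title('Training curves')
plt.xlabel('Epochs')
plt.legend(['loss', 'rmse'])
plt.show()
```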
The first key is 'loss', with its list of all the loss values, and the next key is the root mean squared error, with its list of all the root mean squared error values per epoch. So that's how we get them; we plot them out, and this is what we have. Another method which comes with TensorFlow models is the evaluate method: here we can do model.evaluate and simply pass in X and y, and we get back the model's loss and its root mean squared error. From here we move to validation and testing. To better understand validation and testing, let's take a simple example. Imagine you get into a class on the first day and you are taught some course material throughout the whole term, or the whole semester, and at the end of the semester a teacher who is feeling lazy tells each and every student to produce their own exam, sit for that exam they produced themselves, and then mark it themselves. It's clear that most students will have marks greater than, say, 18 out of 20, simply because the exam has been set by the students themselves. It may happen that one student followed the course work, mastered everything, set genuinely tough questions, and earned that mark of 18, while other students who barely went through the course work produced very easy questions which got them marks above 18 without much effort and without mastering the course work. That's why this strategy is very dangerous: we need that external validation from the teacher. So far, what we've been doing here is like that first method, where each and every student produces the exam, sits for it, and marks it by themselves, because we've been training on our full dataset and evaluating its performance without taking into consideration data the model has never seen. The idea is to build a model that, when it sees new data, is able to come up with a reasonable car price, as close as possible to the actual car price. So when dealing with machine learning models, it's important to break the dataset into parts: if, like the example we're working on, you have 1000 examples, you could break them up such that you train your model on 800 examples and then test that model on the other 200 examples, which the model has never seen. This is also one very important use of shuffling: with shuffling, we are sure there is no bias in the way the dataset is constituted when we break it into a randomly built training part and testing part. So if, for example, a model has a root mean squared error of, say, 5 on the training data, and on the test data it has a root mean squared error of 50,000, then it's clear this is a very poorly performing model, as it does well only on data it has seen and doesn't perform well on data it hasn't yet seen. Recall that machine learning is all about empowering the machine to do the things humans are able to
do, and obviously humans use some intelligence in doing all those tasks. So if you want to build a model, you have to ensure that it performs well on data it has never seen, and that's why we always have to split our dataset: first we shuffle the dataset (very important), then we split it before proceeding with modeling and training. Now, sometimes we don't want to wait until a model is fully trained before testing it out and discovering that it performs very poorly on data it has never seen; we want to be able to see, while the training is going on, how it performs on data it has not seen, and that's why we have a validation set. So besides the testing set, we now also need a validation set: we could reduce our training set to 600 and then use 200 for validation and 200 for testing, so our dataset of a thousand data points is broken up into training, validation, and testing. Note that if you have a much larger dataset, say a hundred million data points, then instead of the 60%, 20%, 20% split we just chose, we could use 90%, 5%, and 5%, the reason being that since the total dataset is very large, 5% of a very large number is already quite a huge number of samples, which is good enough for us to build our validation and test sets. Let's get back to our dataset preparation. We include a cell where we specify our training, validation, and test ratios: we're going to use 80% of our total data for training, 10% for validation, and 10% for testing. The dataset size is the length of X, which is a thousand; we run this, and then we add a code cell where we build X_train and y_train from the elements in the top 80% of positions: the dataset size times the train ratio gives us 800, so we take the first 800 elements, and the same for y_train. We print out the X_train shape (and we could do the same for y_train), and that's it: from a thousand we are now at eight hundred. Let's do the same for the validation set: instead of train we have val, and we want the next 10%, so we go from the train ratio up to the train ratio plus the validation ratio, that is, from 0.8 to 0.9. We copy the slicing, adjust it, and get those next values as X_val and y_val; we run that, and we have our next hundred. Finally, for the test set, since there is no other split after it, we simply specify everything from the 0.9 mark right up to the end as X_test and y_test. We run that, and there we go: the dataset has been split into training, validation, and testing, with X_train, y_train, X_val, y_val, X_test, and y_test.
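A minimal sketch of this ratio-based split (with assumed stand-in arrays; the data is assumed to have been shuffled already):

```python
import numpy as np

# Assumed stand-ins for the shuffled car-price data.
X = np.random.rand(1000, 8).astype('float32')
y = np.random.rand(1000, 1).astype('float32')

TRAIN_RATIO, VAL_RATIO, TEST_RATIO = 0.8, 0.1, 0.1
dataset_size = len(X)

train_end = int(TRAIN_RATIO * dataset_size)                 # 800
val_end = int((TRAIN_RATIO + VAL_RATIO) * dataset_size)     # 900

X_train, y_train = X[:train_end], y[:train_end]             # first 80%
X_val, y_val = X[train_end:val_end], y[train_end:val_end]   # next 10%
X_test, y_test = X[val_end:], y[val_end:]                    # remaining 10%

print(X_train.shape, X_val.shape, X_test.shape)  # (800, 8) (100, 8) (100, 8)
```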
Now, one important point to note: you have to avoid information leaking from the training set into the validation and test sets. This means that even when doing normalization, when you're trying to adapt your normalizer to your data, you must not use the validation and test sets; you have to use only the training set. So here we're going to use only the training set to adapt our normalizer to our data. From here we have to modify the way we do our training: we now specify X_train and y_train, and then our validation data is set to X_val and y_val. So we specify X_val, y_val as the validation data and X_train, y_train as the training data. Note that, instead of passing validation_data explicitly, there is another argument, validation_split, where you just specify the fraction of the training data to be used as validation data. And as usual, you can go through the documentation: you'll see there is a shuffle argument (though we've done our shuffling already) and all these other arguments you can check out. Getting back to the code, we run our training, and notice how we now have these extra outputs: the validation root mean squared error and the validation loss. Let's go straight to the plots. We inspect history.history again; we see we have the validation loss, and scrolling down, there we go, we have the validation root mean squared error as well. So we now have both the training and validation values output during the training process. Recall that the use of the validation set is for us to be able to see, during training, how well our model performs on data it has never ever seen before. So let's go ahead and do some plotting: we have our loss, we add the validation loss, we specify the legend, and we run that. There we go: we have our training curve and our validation curve, and we see that our model actually does better on the validation data, as we have lower loss values for the validation data. We can repeat the same process with the root mean squared error: we add the validation curve, run that, and it looks quite similar; the model does a bit better on the validation set. From this point, let's get back to model.evaluate: instead of evaluating on our training data, we can pass in X_val and y_val to evaluate on the validation data, and we can also evaluate on our test data. Recall that the validation data was used during training, where we saw how the model performed on it; now we evaluate the model on data it has truly never seen, the test data, and this is what we get: the loss and the root mean squared error on the test set.
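Here is a short, self-contained sketch of those steps (assumed stand-in arrays for the splits): passing validation data to fit, plotting the training and validation curves, and evaluating on the held-out sets.

```python
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

# Assumed stand-ins for the already-split car-price data.
X_train, y_train = np.random.rand(800, 8), np.random.rand(800, 1)
X_val, y_val = np.random.rand(100, 8), np.random.rand(100, 1)
X_test, y_test = np.random.rand(100, 8), np.random.rand(100, 1)

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(8,))])
model.compile(optimizer='adam', loss='mae',
              metrics=[tf.keras.metrics.RootMeanSquaredError()])

history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),   # monitored every epoch, never trained on
    epochs=20, verbose=0,
)

plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.legend(['train', 'val'])
plt.xlabel('Epochs')
plt.title('Loss')
plt.show()

print(model.evaluate(X_val, y_val, verbose=0))    # [loss, rmse] on the validation data
print(model.evaluate(X_test, y_test, verbose=0))  # [loss, rmse] on data never seen in training
```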
At this point we've trained our model on our data, and we can get to that long-awaited part: testing the model. Here we are not just evaluating our model or doing some validation; we are passing in data and letting the model predict the car price for us. So let's get straight to it. We have X_test, which we built already; let's check its shape, and let's pick out a value from it, say X_test[0]. What we do is model.predict: now that we've trained our model, we call model.predict and pass in X_test; here we don't need to pass y_test, we just pass X_test, and our model should predict the car prices for us. Passing in the whole X_test, we get all those predictions: since we have a hundred test points, the output has a hundred elements, so for each and every input in our test set we have an output. Now suppose we want just the first one and we pass X_test[0] directly: we get an error, which is normal, because the input shape here has to be of the form batch_size by 8, while we're sending in something of shape (8,). So we should use expand_dims. In case you don't understand this well, look at X_test[0].shape, which is (8,), versus X_test.shape. The model, as we defined it previously, has an Input that takes batch_size by 8, so any data we pass in has to be of that shape. So we use expand_dims, which we've seen in a previous section, with axis equal to 0, to go from shape (8,) to shape (1, 8), meaning our batch size is 1. We run that, and everything works fine: this tells us that for the first input in our test set, the predicted car price is this value. Now, this is very interesting to view, but what if we compare it with what the data already gives us? It's true we labeled this as test data, but we know the actual values, so we know the actual car price; what if we compare it with this predicted car price? Let's look at y_test[0] and check out that value: here we have a hundred and thirty thousand dollars, and here we're predicting eleven thousand nine hundred and forty dollars. So clearly our model is performing very poorly.
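A small sketch of this prediction step and of the side-by-side comparison that follows (assumed stand-in model and arrays; the bar-chart layout mirrors the one described next):

```python
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

# Assumed stand-ins for the trained model and the test split.
X_test, y_test = np.random.rand(100, 8), np.random.rand(100, 1) * 100_000
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(8,))])
model.compile(optimizer='adam', loss='mae')

# Predicting a single example: expand (8,) -> (1, 8) so the batch dimension exists.
single = tf.expand_dims(X_test[0], axis=0)
print(model.predict(single))          # predicted price for the first test car
print(y_test[0])                      # actual price for the same car

# Predicted vs actual prices for every test example, shown side by side.
y_pred = model.predict(X_test).squeeze()
positions = np.arange(len(y_pred))
width = 0.4
plt.figure(figsize=(20, 6))
plt.bar(positions, y_pred, width, label='predicted')
plt.bar(positions + width, y_test.squeeze(), width, label='actual')
plt.xlabel('Test example')
plt.ylabel('Car price')
plt.legend()
plt.show()
```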
To understand this poor performance visually, we're going to plot a bar chart showing the predicted and true values side by side. Let's set the figure size to, say, 20, and there we go: as you can see, the blue bars are the predicted prices and the orange bars are the actual values of y, so we have the predicted and the actual prices shown together. The way it works is quite simple: we set the figure size, then we call the bar plot, specifying the position of each element in the bar chart. We also use numpy (a list comprehension over the values 0 to 99 would work too) to get the position of each and every element, so we have a hundred different positions, and then we have the width. Let's play with the width so you can see it: if we increase it to 1, the bars become very large, and if we take 0.1, they become very thin, so we can tune this. Then, once we fix the positions for one series, we move the other series one width step further; that's what we do, and we pass that in, set the x label and the y label, and show the plot. It shows clearly that our model is performing very poorly, which takes us to our next section on corrective measures. This kind of model behavior actually has a name: underfitting. Normally the validation loss should be above the training loss, although in some cases, as we just saw, it can be below it; either way, when your training and validation losses look like this, where even after training for a long period of time, for many epochs, we are not able to go below a certain threshold, this is known as underfitting. The idea is to bring these loss values down: we want to modify our model so that we get better loss values, reducing the loss as much as we can. To do this, it suffices to make our model more complex. For now, the model we have is this simple regression model: we have our inputs, the weights m1 through m8, and the bias c; we multiply, add this up, and then we have our output. So what if, instead of having just one neuron, we stack up more neurons which each do basically the same thing? Each input is linked to each of these neurons, the same way as before, and this gives us a dense layer. From here we can add more dense layers: these are what we call hidden layers. We add one hidden layer, then another, and we can even add a third one, so now we have our inputs and three hidden layers; and since we're producing a single price, we end with an output layer. So we have the input, the output, and the hidden layers in between, each of them a dense layer, where every neuron performs the same mx plus c type operation on the outputs of the previous layer. Let's try to write out, or sketch out, some TensorFlow code for this: we have a first Dense layer with a few outputs, then the next Dense layer, which, unlike the previous model where we had just one output, can have, say, two outputs. You can see
how easy it is to carry out all these operations with TensorFlow: we stack the next layer with four outputs, then one with two outputs, and finally one with a single output. It's very important to ensure that we have this one output at the end, because that is what matches our data, where we have one output value; we put all of this in brackets inside a Sequential model, and that's it, we've seen how we can do this with TensorFlow. There is one more point we need to mention: activation functions. Activation functions are non-linear functions which add even more complexity to the model. You saw how previously we had these inputs linked directly to one output; we've now made the model more complicated by adding hidden layers so that it can learn more complex information stored in our dataset. Another thing we can do, as we've said already, is add activation functions, making the model even more expressive. Common activation functions are the sigmoid, the tanh, the ReLU (rectified linear unit), and the leaky ReLU. With the ReLU, when x is greater than 0 we maintain x, that is, the output equals the input, but when x is less than 0 the output becomes 0. With the leaky ReLU, when x is greater than 0 the output is the same, but when x is less than 0 the output is alpha times x, where alpha is generally a very small number, hence the term leaky: with alpha equal to 0.1 we would have 0.1x, which gives very small negative outputs close to 0. Then we have the sigmoid activation function, 1 / (1 + e^(-x)), which looks like this, and the tanh, (e^x - e^(-x)) / (e^x + e^(-x)). For now we are going to use the ReLU activation function, and you will see subsequently that all these activation functions can be gotten from tf.keras.activations. Note that if we were to apply, for example, the sigmoid activation right here, what we would do is: just after multiplying the weights by the inputs and summing up, once we have that output, we take the sigmoid of it. So for each and every neuron in the layer we take the sigmoid of its computation, and if a layer uses ReLU instead, then for each and every neuron we apply the ReLU activation. We now have all that is needed to make our model perform better and stop underfitting. So we come right up here to our model: we take a Dense layer with, say, 32 neurons, then another with 32, we stack up another one, and finally we have the output layer. Recall that although we've added these other dense layers, at the output we need to have just one neuron, so it's important to ensure that our output here is one. We then specify the activation: activation equals relu for the hidden layers, while for the output layer we are not going to specify any activation, because we don't want to interfere with the way our model comes up with its outputs. That said, we can run this, and to make our model even more complex, we can increase this value, say to 128, and run that again.
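Here is a rough sketch of that deeper model (the layer sizes, the Normalization layer, and the feature count are assumptions standing in for what was built earlier in the section; the adapted normalizer is what contributes the 17 non-trainable parameters mentioned next):

```python
import numpy as np
import tensorflow as tf

# Assumed stand-in for the training features; the normalizer is adapted on training data only.
X_train = np.random.rand(800, 8).astype('float32')

normalizer = tf.keras.layers.Normalization()
normalizer.adapt(X_train)   # mean, variance and count: 17 non-trainable parameters for 8 features

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    normalizer,
    tf.keras.layers.Dense(128, activation='relu'),  # hidden layer 1
    tf.keras.layers.Dense(128, activation='relu'),  # hidden layer 2
    tf.keras.layers.Dense(128, activation='relu'),  # hidden layer 3
    tf.keras.layers.Dense(1),                       # single output, no activation
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.1),
    loss=tf.keras.losses.MeanAbsoluteError(),
    metrics=[tf.keras.metrics.RootMeanSquaredError()],
)
model.summary()
```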
You see that we now have many more trainable parameters, and we still have the 17 non-trainable parameters: with the smaller model we had 17 non-trainable parameters, and now that the model is larger, the number of trainable parameters increases while the non-trainable parameters stay the same, which is logical since our normalizer remains the same. With that done, we can plot our model: it's a clear plot showing how we go from the input to the first dense layer (None by 8, then None by 128), through the next dense layers, and finally to the last dense layer. From there we compile our model, keeping the learning rate at 0.1, we run this, and we're ready to fit. Let's run that. Notice how the loss is now a smaller order of magnitude compared to what we had previously: it starts at about 144,000 and goes down to about 30,000. So that's it for this training; let's go ahead and view our plots. There we go, a totally different plot: we start up here and then actually drop to about 30,000. Also notice how, this time around, the validation loss is higher than the training loss, and this is normal, because the model was trained on the training data, so it tends to perform better on it than on data it wasn't trained on. Now, if a model performs very, very well on the training data but doesn't perform well on the validation data, we have a problem known as overfitting, and we'll treat that in subsequent sections. For now we continue: we plot the root mean squared error, and recall how we used to have values around 234,000, whereas now we're around 30,000. We evaluate the model, and that gives us the loss and root mean squared error on our test data. Then we compute y_pred again, and here is what we now obtain: this model performs way better than the previous, smaller model, and as you can see, for this particular test data point, what the model predicts is almost exactly the actual car price. Our model isn't perfect, but it's doing quite well. At this point we've taken some corrective measures and our model performs better; what if we now look at how to load our data even faster and more efficiently? This can be done thanks to the TensorFlow data API. Right here you have tf.data; you can go to the overview, and for now we check out this Dataset class, which is made up of many methods, so you have all those different methods here. We'll start with the from_tensor_slices method, and from here we're going to adapt our code to use TensorFlow's data API in order for us to gain all the advantages that come with it. Note that when you're working with a dataset of, say, a thousand elements, you won't clearly notice the advantage of using this data API, but as your dataset gets larger, it becomes very important to master how to use it. That said, we're going to redefine our training data: we have train_dataset = tf.data.Dataset.from_tensor_slices(...), where we create a tuple and pass in X_train and y_train.
That's it, we've got that. The next thing we're going to do is shuffle our dataset. We've already done shuffling, but in a case where shuffling wasn't done previously, you could do it very easily now: we call .shuffle and we specify a buffer_size, say 8. Let's see exactly what this buffer size means; the method also takes a seed and a reshuffle_each_iteration argument. The documentation tells us that if the dataset contains 10,000 elements but buffer_size is set to 1,000, then shuffle will initially select a random element from only the first 1,000 elements. It's like saying: I'm picking these first 1,000 elements and I want to carry out my shuffling only within them, such that when I pick a random value (obviously, when we're doing shuffling we're picking out values in a random order), I pick it from those first 1,000 values. Then, as the documentation says, once an element is selected, its place in the buffer is replaced by the next element. So if we have a dataset like this with six elements, 1, 2, 3, 4, 5, 6, and we want to shuffle with a buffer size of three, we initially pick a random value from the first three elements. Suppose we pick 2: the next element, 4, is added to the buffer, so our next buffer becomes 1, 3, 4, with 5 and 6 still waiting. If we then pick 4, the next buffer becomes 1, 3, 5, and we repeat this, and so on and so forth. That's how the buffer works. We can also set reshuffle_each_iteration, so that after each epoch we reshuffle again. So that's it: we've had our train dataset shuffled, and then we batch it with a batch size of 32, and from here we can do some prefetching. Let's check the documentation to see what argument prefetch takes: again we have a buffer_size, and we're told that this allows later elements to be prepared while the current element is being processed. So suppose we have three batches: element one, element two, and element three. Currently we're training on this batch, and generally, before we can train on a batch, we have to load that data and process it in case we need to process it. So let's say this block is processing time and this one is training time: we process this one, we train, we process, we train, we process, we train. Now, what if, instead of going through these steps one after the other, we start by
loading: we load, and then while we're training, we're already loading up our data. So this can be done, and then after training, we have this already-loaded data and we just continue with the training; and again, while we're training here, we load up the next one. Recall we have one, two, three training blocks; at this point we add our last block, and that's it. From here, we see that we take less time to load and train our data compared to this first, purely sequential method, so working with prefetching is very important. Now, this prefetch takes a buffer size which can be auto-tuned: we have tf.data.AUTOTUNE, so in case you don't want to specify the buffer size yourself, you could simply allow TensorFlow to dynamically tune this buffer size for you. Now that's set: we have prefetch and then tf.data.AUTOTUNE, and that's fine. So that's our data; we now have our train_dataset. Let's add this and run it, and then, for x, y in train_dataset, we print out x and y, just the first element actually. You see, scrolling up, we have this x right here: its shape is 32 by 8, which is normal, batch size by 8; and scrolling down, we have 32 by 1, which is also normal. We've batched our data and we're ready to train on it. Let's just copy this and do the same for the validation: we have val_dataset (don't forget to change the names to val), we run this, and now we go to the test: right here we do the same and we have test_dataset. So that's what we need to do to convert our data so that we can work with the TensorFlow data API. We don't make any modifications to the model; all we need to do is come and specify train_dataset in fit, and then for the validation we have our val_dataset. So that's fine: you see training has started and everything seems okay. We'll now go ahead and plot as before; that's what we get, and the loss still continues dropping. Actually, from the previous training we could have continued training, so we could have increased the number of epochs. Let's look at this metric plot as well, and we can evaluate our model. Now, note that this actually isn't very different from what we had previously; you may feel like this is producing a different kind of plot, but the model in its current state already had a loss of about 30,000, so if we recompile the model and fit again, you'll get to see that it isn't much different from what we had previously. So this doesn't come to improve our model's performance; it comes to speed up the training. You wouldn't expect to get better loss values with tf.data, but instead you can attain that same performance faster. So let's run this again.
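Pulling the whole pipeline together, here is a minimal sketch (with assumed stand-in arrays for the splits) of the from_tensor_slices, shuffle, batch, prefetch chain and of passing the resulting datasets to fit:

```python
import numpy as np
import tensorflow as tf

# Assumed stand-ins for the already-split car-price data.
X_train = np.random.rand(800, 8).astype('float32')
y_train = np.random.rand(800, 1).astype('float32')
X_val = np.random.rand(100, 8).astype('float32')
y_val = np.random.rand(100, 1).astype('float32')

train_dataset = (tf.data.Dataset.from_tensor_slices((X_train, y_train))
                 .shuffle(buffer_size=8, reshuffle_each_iteration=True)
                 .batch(32)
                 .prefetch(tf.data.AUTOTUNE))

val_dataset = (tf.data.Dataset.from_tensor_slices((X_val, y_val))
               .batch(32)
               .prefetch(tf.data.AUTOTUNE))

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(8,))])
model.compile(optimizer='adam', loss='mae')

# The datasets already carry both inputs and labels, so no separate x and y arguments.
model.fit(train_dataset, validation_data=val_dataset, epochs=10, verbose=1)
```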
And you'll see it's quite similar to what we had already. So that's fine; we run this as well, and you could always increase the number of epochs. You can predict with this, and then view the results on the bar chart, which again looks similar to what we had before. So that's fine: we've now mastered the basics of working with the data API, though in subsequent sections we'll look at even more interesting ways of working with it. That's it. Hope you enjoyed this, and thank you for following up to this point. Don't forget to like, subscribe, and share. See you next time. Hello everyone, and welcome to this session. According to the World Health Organization, the estimated number of malaria deaths stood at 409,000 in the year 2019. In this section, we are going to build a machine learning model based on convolutional neural networks to diagnose malaria in a person from cell images obtained under a microscope. We'll start with loading the data; after loading our data, we are going to visualize it, process it, build a model suited for it, then train this model, and finally evaluate and test it. As usual, we'll start by defining the task, which in this case entails correctly classifying whether an input cell contains the malaria parasite or not. Then we'll go ahead and prepare our data, which is going to be made available to us from TensorFlow Datasets. Then we'll build a model; the particular model we'll be working with in this section is the convolutional neural network. From here, we'll define the error function, train our convolutional neural network, check out some performance measurement metrics like accuracy, F1 score, precision, recall, and many others, then do validation and testing, and finally take as many corrective measures as we can. In essence, what we want to do is build a model which takes as input a segmented cell from a thin blood smear and says whether that segmented cell is parasitized or uninfected. We're supposing you have no medical background, so we'll briefly look at how malaria diagnosis is related to these segmented blood cells. To start, we get infected with malaria once we are bitten by a mosquito; these mosquito bites usually lead to the passing of the Plasmodium falciparum parasite into our bloodstream. So to diagnose whether a particular person has malaria, it's important to get that person's blood. Here you see how the medical practitioner has to select the finger to puncture, usually the third or fourth finger, and then puncture the side of the ball of the finger to obtain the blood. In case the blood doesn't well up, you generally have to squeeze the finger. Then, always grab the slide by its edges, and to control the size of the blood drop on the slide, touch the finger to the slide from below. From that cdc.gov page, we move to the Microbe Notes website, where we get more colored images. Now that we've obtained the patient's blood, there are two possibilities: one is getting a thin smear like this, and one is getting a thick smear. In our case, our dataset comes from a thin smear like this one, which, when placed under a microscope, produces images like this.
So here we have the thin blood film and the thick blood film. From this point, the cells are segmented, and that's how we obtain an image like this, a segmented cell image, which can now be used by a model to predict whether that patient has the malaria parasite or not. It should also be noted that we're dealing with a classification problem, since our output can only take two discrete values; this type of classification problem is known as binary classification. Since throughout this section we'll be dealing with image data, it's important we understand how image data is represented. So here we have this image of a bird, and when you zoom in, this is what you get; if you zoom in again, you have this. This tells us that the image is formed by combining all these little boxes, and these boxes are known as pixels. We can localize this pixel, this one, this one, and so on and so forth: basically, this image is made of all these little boxes which we call pixels. If we now take this image from our dataset, you can see it is 86 by 82 pixels, meaning that if you go from this point across to this other point, you pass through 86 of these boxes along the width, and if you go from this point down to this one, you pass through 82 along the height. Now check what is displayed at each position as we move around: at this point, for example, we're at position 85, 34, so we've gone 85 steps horizontally and 34 steps vertically, taking our origin to be the top left corner. So you can localize these pixels, and you can notice how all those pixels, when combined, form this image; when we zoom back out, it becomes less evident that the pixels are there. It should also be noted that each of those pixels contains values ranging from 0 to 255, and that for each and every pixel we have three different components: the red, the green, and the blue. So if we break this image into these three different components, we get something like this. Note that the values shown here are actually normalized: they've taken all the values and divided by 255, so if you want the un-normalized values, you take these and multiply by 255 to obtain the originals. That said, notice what we have at this position, and at this one. So we can represent image data in terms of the height, the width, and the number of channels, which in this case equals three: the shape of our image tensor is height by width by 3. Another common format is the grayscale format, where the number of channels equals one; we have just one channel, and the image can be represented as a 2D tensor. One more interesting point: although we've said that for each pixel we have a given value per component, all these values fall between black and white. That is, a fully black pixel has the value 0 and a fully white one has the value 255, and obviously, once we normalize these values, we go from 0 up to 1.
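As a tiny sketch of this representation (a toy tensor with assumed dimensions matching the 86-by-82 example above):

```python
import tensorflow as tf

# A toy RGB image tensor: height 82, width 86, 3 channels, pixel values in [0, 255].
image = tf.random.uniform((82, 86, 3), minval=0, maxval=256, dtype=tf.int32)
image = tf.cast(image, tf.float32)

normalized = image / 255.0                      # values now lie in [0, 1]
grayscale = tf.image.rgb_to_grayscale(image)    # shape (82, 86, 1): a single channel

print(image.shape, normalized.shape, grayscale.shape)
```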
So that's it. In this data section, we'll make use of a dataset contained in the TensorFlow Datasets module. Here, you just have to pick your problem and you will have a whole lot of datasets available to you. In our case, we're dealing with image classification, so we select that, scroll down, and find malaria. We double-click on that, and here we go: we have our malaria dataset. You can visualize your dataset and read its description: the malaria dataset contains a total of 27,558 cell images, with equal instances of parasitized and uninfected cells, from thin blood smear slide images of segmented cells. We can explore the data: you can come right here and draw a border, which is why you see this border around the cells, and then group by label. Notice how we have the parasitized label and the uninfected label, and both have an equal number of images. In fact, what the scientists did to build this dataset is take segmented blood cells from patients who tested positive for malaria, and these other cells from patients who tested negative. Let's click on this: you see we have our cells, and you can always visualize all the cells here. Then let's get more information about the image content: metadata, aspect ratios, the format (PNG), the resolution, the RGB mode (we're going to look at this shortly), pixel height 112, pixel width 91. Also note that not all the cells have exactly the same pixel height and pixel width; they all differ. We have other information about the dataset, like the homepage, source code, versions, download size, dataset size, auto-caching, and the splits; in this case we just have a train split, so it's left to us to split the dataset into training, validation, and test sets. We have the features here: we see we have an input image and an output label, and the number of classes equals two. We can look at some examples, and finally we have the authors. In order to load this data so we can make use of it, we're going to use the tfds.load method. You see that this method takes several arguments which have default values, but we need to specify the name, which in this case is malaria. It also has outputs: if you succeed in loading the dataset, you should get this output, and notice how this integrates with the tf.data API we talked about in the previous section, which is an efficient way of dealing with TensorFlow datasets. So here, all you need to do is load the data and you should have your TensorFlow dataset along with the dataset information. Let's now dive into the code. We start by importing tensorflow, numpy, matplotlib, and we import tensorflow_datasets as tfds. With that done, we can go straight into loading the dataset: we call tfds.load and specify the name malaria, and once we do, we should obtain our dataset; recall we want the dataset and the dataset info. Let's run this.
You see we're downloading the dataset, and after loading it we obtain an error. Let's go ahead and check the documentation: by default, with_info is set to False, so trying to get this dataset information raises an error. Let's add with_info=True and run again; now all is well, we have our dataset and our dataset information, which we can check out right here. We clearly see a prefetched dataset containing the image and the label. Now, if we do `for data in dataset:` and print out the data with a break, or better, dataset.take(1) to take just one element of our dataset, we get an error telling us the object has no attribute take. That's because, if we run the dataset cell again, we see it is actually a dictionary containing the train split (which is our dataset) along with the types. So we specify that we want just the train part, dataset['train'], and run that again. Now we have our data: you see this 103 by 103 by 3 image, and, scrolling down, the label, which here is 1. Let's scroll back and take two values, or even four, and see: we have those four values, and scrolling through, here we have 1, then 0, different labels and different images, then 1 and 1 again. So that's how we obtain the images and their corresponding labels. We can now pass in more arguments, like shuffle_files, as_supervised, and the split. The documentation gives a more elaborate explanation of how to work with the split: if you want to split your dataset into, say, train and test, you can simply pass those names, or you can specify the percentage of the whole dataset for the training set and for the test set. So there we go: we add as_supervised=True, shuffle_files=True, and finally we specify the split, which we set to train and test, and run this. We obtain an error telling us there is an unknown split 'test', and that the splits we use should be part of the list ['train']. In many other datasets available in TensorFlow Datasets, the train/test split already exists, and in those cases we could make use of this kind of split; here we only have the train split, so it's needless even passing the split argument. Let's just take it off, and you'll see this works fine. So that's it: we have our training set, and we'll split it ourselves separately. Also, since we've shuffled the files, if we run this again you'll notice how the shape changes, because our files are now shuffled, whereas when we don't shuffle, or set shuffling to False, you will always have the same image at the same position. So that's it: the info is specified, and that's fine.
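A short sketch of loading this dataset the way it is described above (tensorflow-datasets downloads the data on first use, so the first run takes a while):

```python
import tensorflow_datasets as tfds

# Load the malaria dataset along with its metadata; there is only a 'train' split.
dataset, dataset_info = tfds.load(
    'malaria',
    with_info=True,        # also return the DatasetInfo object
    as_supervised=True,    # yield (image, label) tuples instead of dicts
    shuffle_files=True,
)

train_ds = dataset['train']

for image, label in train_ds.take(1):
    print(image.shape, label.numpy())   # e.g. (103, 103, 3) 1

print(dataset_info.features['label'].num_classes)  # 2
```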
In order to create the train, validation, and test sets, we are going to make use of the take method and the skip method. Let's take this example: we create a small dataset and we want to print it out after skipping seven values; you see, if we skip seven values, we are left with the remaining elements, and if instead we take six values, we have the first six. At each step we print out the dataset with as_numpy_iterator. Now let's define our ratios: the train ratio equals 0.8, the validation ratio equals 0.1, and the test ratio equals 0.1. There we go: we have the train ratio, the validation ratio, and the test ratio. Next, we take the length of our dataset; let's comment out this part and print out the length. There we go, we have the length of the dataset, and as you can see, we have 10 elements, so when we run this, our length equals 10. Now, to obtain the train dataset, all we need to do is simply take the first 80% of our dataset: train_dataset = dataset.take(0.8 times the length), and let's define that length as dataset_size. So we have our dataset_size defined, and with that we print out the train dataset. You can see we're not able to convert this: cannot convert 8.0 to an eager tensor of type int64, because take expects an integer, so we wrap it in int. There we go: we see we have the first eight elements right here. Now that we've gotten this 80%, let's change the ratio to 60% to see: we have the first 60%. We then replace the hard-coded value with the train ratio variable, run that, and there we go, we have the top 60%, and if we go back to 0.8, the first 80%. Staying with the first 60% for the illustration, we have these first six elements, and we're left with the other four. Let's go ahead and see how to get those remaining elements: to obtain them, all we need to do is use skip. So we define our validation starting from there: instead of taking the first six elements, we skip those first six so we can start getting at these other elements. So here we have skip(6), we print this out, and that's it: we have the remaining elements. But recall that our validation set is in fact only the first two of those, so after skipping and getting this last part, which is essentially made up of the validation and test sets together, we now take the first two elements, which correspond to the validation set. So right here we have val_dataset equal to, and notice how, instead of the full dataset as previously, we're now operating on this skipped dataset: we call take on it. And what are we taking? The first two elements, which we obtain by taking the int of the validation
Naming that skipped portion val_test, we have val_dataset = val_test_dataset.take(int(val_ratio * dataset_size)). Running this, we now get elements six and seven. To obtain the test set, all we need to do is skip instead of take: we copy the same expression but with skip, so test_dataset = val_test_dataset.skip(int(val_ratio * dataset_size)). We run this and, after fixing a small mix-up in the names, we have our test set. So that's it: we've got the train set, the validation set, and the test set. We can play around with these values — if we set the validation ratio to zero, the validation comes back as an empty list and everything left over goes to the test set. Now let's wrap all this in a function, which we'll call splits. It takes in the dataset and the ratios — the train ratio, the validation ratio, and the test ratio — and does exactly what we've just done: we compute the dataset size from the dataset, define the train dataset, the validation dataset, and the test dataset, and return all three. With that method built, we call it: train_dataset, val_dataset, test_dataset = splits(dataset, train_ratio, val_ratio, test_ratio), and then we print the train dataset, the validation dataset, and the test dataset. Running it, we're told split is not defined — it's actually splits — so we fix that and run again, and now we have our three datasets: the train, the validation, and the test. We can always wrap each one in list(... .as_numpy_iterator()) to see its elements, and we can do the same for the validation and the test sets.
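Putting those pieces together, a minimal sketch of the splits helper might look like this; the names splits, train_ratio, val_ratio, and test_ratio follow the video, and len() works here because the dataset's cardinality is known.

```python
import tensorflow as tf

def splits(dataset, train_ratio, val_ratio, test_ratio):
    dataset_size = len(dataset)

    # First train_ratio of the elements go to the training set.
    train_dataset = dataset.take(int(train_ratio * dataset_size))

    # Skip past the training elements; what remains is validation + test.
    val_test_dataset = dataset.skip(int(train_ratio * dataset_size))
    val_dataset = val_test_dataset.take(int(val_ratio * dataset_size))
    test_dataset = val_test_dataset.skip(int(val_ratio * dataset_size))

    return train_dataset, val_dataset, test_dataset

# Quick check on a toy dataset of ten elements.
toy = tf.data.Dataset.range(10)
train, val, test = splits(toy, 0.8, 0.1, 0.1)
print(list(train.as_numpy_iterator()))  # [0, 1, 2, 3, 4, 5, 6, 7]
print(list(val.as_numpy_iterator()))    # [8]
print(list(test.as_numpy_iterator()))   # [9]
```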
We can now test the splitting on our own dataset. We comment out the toy example, load the malaria dataset, and pass it into splits. Rerunning this, we get the error: list object has no attribute take. To better understand it, let's run the cell with the dataset: it is actually a list, made up of the dataset itself and the types, so when we call take, we're calling it on that list. The fix is to pick out just the dataset, so we index out the dataset element. With that done, we can run it again — but before we do, let's take just one element, because iterating over the full dataset would be very time-consuming. Running it and waiting a little, we get an image and its label. Notice how we have an empty list for the validation, because we set the validation ratio to zero: with that at zero, the validation set is empty. We can change this to 0.1, 0.1, and 0.8 for the train ratio, run it, and now the train, validation, and test sets are all non-empty. Next, let's visualize some elements of our dataset. We write for i, (image, label) in enumerate(...) over the train dataset — you could just as well use the validation or test dataset — and take, say, 16 elements. For these we create subplots, 4 by 4 since we're plotting 16 different images, and call plt.imshow on each image. Running this, we get exactly what we expect. Now, next to every image we can put its label as a title. We use dataset_info because we're interested in the label names: for label 0 we want its name, and for label 1 its name. To understand this better, check the dataset_info printout: it has a features field, from which we pick out the label. We convert the label to its string name and pass in the label we got for the corresponding image. We run it, and now we have the images with their corresponding labels — you can see this one is parasitized, for example, though it isn't very clear yet. So let's turn the axes off with plt.axis('off') and run again; that looks better. If you want to know which of these is label 0 and which is label 1, we can print them out: label 0 is given to us as parasitized and label 1 as uninfected.
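A sketch of that visualization loop, assuming train_dataset still yields individual (image, label) pairs and dataset_info is the object returned by tfds.load; int2str maps the integer label to its class name.

```python
import matplotlib.pyplot as plt

plt.figure(figsize=(8, 8))
for i, (image, label) in enumerate(train_dataset.take(16)):
    plt.subplot(4, 4, i + 1)
    plt.imshow(image)
    # 0 -> 'parasitized', 1 -> 'uninfected' for this dataset.
    plt.title(dataset_info.features["label"].int2str(int(label)))
    plt.axis("off")
plt.show()
```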
At this point, we'll dive into data preprocessing. Our preprocessing pipeline will be made of two parts. The first part is resizing: if we have an input image of, say, 102 by 102 as width and height, we transform it into an image of a fixed width and height, so that all our images, irrespective of their original dimensions, end up with one width and one height. In this case we'll use an image size of 224, so width and height are converted to 224, and we'll see later why we pick this particular size. After resizing, the next part is normalization: we take the input image and scale it so that all the values fall in a given range. There are two common options, standardization and normalization, and in this case we're going to use normalization — let's explain why. In the previous example on car price prediction we actually worked with standardization, where each value has the mean subtracted from it and is then divided by the standard deviation; each value in every column of X is standardized using that column's mean and standard deviation. That works because those values are roughly normally distributed: there's a mean, or average, value and a standard deviation describing the range where most values fall, so the distribution looks bell-shaped, as we've seen before. That's why, looking at a column like X2, you'd see values like 5,600 and 7,100 but not suddenly a value like 12 — most values fall within a given range around an average value, and the same holds for the other columns. That is the situation where standardization is appropriate, and that's why we used it previously. In the case of image data, the choice between standardizing and normalizing depends on the data we're dealing with: if most pixel values revolve around a particular mean, we'd want to standardize; if the pixel values are spread out and mostly different from one another, we'd rather normalize. In our case, as we continue in this section, we'll go with normalization: we compute X minus X_min, which is 0, divided by X_max, which is 255, minus X_min, which is 0 — so we simply do X divided by 255. We'll normalize the inputs before passing them into our model. Note that some other datasets, like ImageNet or the ADE20K image segmentation dataset, come with known mean and standard deviation values. Still, for whatever problem you're dealing with, you may work with standardization or normalization — experiment for yourself and see which one works better. That said, since we're working with the TensorFlow data API, we'll use the map method for this preprocessing. So we have train_dataset = train_dataset.map(resizing), calling a resizing method which we'll now define. This method takes in the image and the label.
So we have our image and our label, but note that we're doing processing on only the image. We pass it in and resize it using the resize method that comes with tf.image. In the documentation you can see the arguments it takes; for now we'll focus on the image and the size, though note that you can also change the resizing method — for now we use the default values. So we return tf.image.resize, which takes our image and the image size, IM_SIZE by IM_SIZE, where we define IM_SIZE = 224. With this resizing, the image comes back as a 224 by 224 shaped image, and we also return the label. So again, we're basically taking the image, resizing it, and leaving the label unchanged. We run this, and our train dataset has now been resized. We can check with for image, label in train_dataset.take(1) and print the image and the label: you'll notice that the image shape is 224 by 224 regardless of which image we pick, and the label remains unchanged. Also notice that the dtype is now float32, unlike previously where we had an unsigned int; you can always cast here to change this, depending on what you're working on. With resizing done, let's add the rescaling and rename the function resize_rescale: after resizing, we rescale by dividing all the values by 255, and the label again stays the same. We update the map call to resize_rescale, run it, and that's fine. Just as we did before, in the previous section, we then shuffle our data, put it in batches, and prefetch — all of this was explained in the previous section.
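Putting the two preprocessing steps together, here is a sketch of the resize_rescale function and its mapping over the training set; IM_SIZE follows the value chosen above.

```python
import tensorflow as tf

IM_SIZE = 224

def resize_rescale(image, label):
    # Resize to IM_SIZE x IM_SIZE and scale pixel values to [0, 1];
    # the label passes through unchanged.
    image = tf.image.resize(image, (IM_SIZE, IM_SIZE))
    return image / 255.0, label

train_dataset = train_dataset.map(resize_rescale)

for image, label in train_dataset.take(1):
    print(image.shape, label)   # (224, 224, 3), label unchanged
```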
We are now ready to build our model. Up to now, plain neural networks have performed quite well. We saw previously that if we have three neurons in the input — one, two, three — and three neurons in the output, there are nine different connections and hence nine different weights, plus biases; for this example let's consider only the weights, so nine parameters for three inputs and three outputs. With five output neurons, still counting only weights, we'd have three times five, that's 15 different parameters. Now suppose we're dealing with an image like this one, 224 by 224 by 3 — three for the red, green, and blue channels. Unlike the previous examples, where the input features were few, if we count the number of pixels we get 150,528 different input features to take into consideration. Hence, instead of a total of three values in the input, we now have 150,528, and if we do the same computation that gave us the parameter count before, we get 150,528 times 3, roughly 450,000 different parameters. Now, what if we change the number of output neurons and take, say, 1,000? We'd go from 450,000 to about 150 million parameters, each of which has to be trained and optimized. It becomes clear that deep networks built purely from fully connected layers aren't scalable: as the number of input features grows, the total number of parameters grows considerably. Hence we need a type of layer, unlike this one, where each neuron isn't connected to all the neurons of the previous layer — and that layer is the convolutional layer. To better visualize this, we'll use an interactive demo from Ryerson University. We put in a figure, say a four, and see exactly how the convnet works: we have the input, some weights, and the output features. Notice that to obtain a particular output pixel, only a few of the inputs are used — we call this region the receptive field. Taking another example, you can see it has its own receptive field: to get this value, only these four input values below play a role. So unlike a dense layer, where obtaining an output value requires linking it to each and every previous neuron, here only the neurons in the output's receptive field contribute to that value. Another great tool for understanding the convolutional layer is the CNN Explainer — CNN stands for convolutional neural networks — created by Jay Wang, Robert Turko, Omar Shaikh, Haekyu Park, Nilaksh Das, Fred Hohman, Minsuk Kahng, and Polo Chau; we're really grateful for this tool. Before getting back to the explainer, take this example: a four by four image, so 16 different pixels. If we flatten those pixels out into an input vector and have four neurons in the output, we'd have 16 by 4 connections, that is 64 different parameters, excluding biases. With a conv layer, we can go from this four by four to a two by two output with just nine parameters: we have what we call a kernel, or filter, of size three by three, which corresponds to the weights we've seen already. This kernel of size three produces a two by two output which, when flattened, gives the output vector — so instead of working with 64 parameters, we're working with nine. Replicating the same example in the CNN Explainer, here's what we get: this input produces this output. Notice that we specify a kernel size of three, and because the kernel size equals three, we are able to get this output.
But how is this output obtained? At the top left corner, we place our kernel, which is of size three by three, and take each value of the kernel and multiply it with the corresponding value in the input. To obtain the first value, the kernel sits at this top left position of the input. To get the next output value, the kernel is passed over the next part of the input, and the way we get that next part is by simply sliding across the image. When we reach the end of a row, we move down to the next position, get the next value, slide again to the end, and finally get the last value. That's how we get all the outputs from the input and the kernel, or filter. We can now increase the input size — let's take seven, a seven by seven input image — and the output is five by five. The reason it is five by five is that the kernel size is three: if we increase the kernel size to five, the output shrinks, and when the kernel size gets to six, it shrinks further. Taking it back to three, the seven by seven input gives a five by five output: we get one output value, slide, slide, slide, move down to the next row, slide again, and so on, right up to the end. From this we also see that the size of the receptive field of each output equals three, which is exactly our kernel size. The next thing to notice is that reducing the kernel size permits us to extract more output features from the input: with a seven by seven input and a kernel size of three, the output is five by five, so we extract many more features than when we pick a kernel size of six, where the output is much smaller. So while a smaller kernel lets us extract more, finer features from the input, a larger kernel lets each output cover a larger portion of the input. One logical question that may come to your mind is: how do I get the size of this output feature map? We use the formula where the output width equals the input width minus the filter size plus one. In this example, an input width of seven minus a filter size of three plus one gives five, which is how we obtain the five by five output. If we take the kernel size to be four, we have seven minus four, which is three, plus one, giving an output of four. And in a case where you're designing a convolutional layer and want a particular feature map size — say an output size of three here — all you need to do is set the kernel size to five and you get that output.
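In symbols, for a convolution with no padding and a stride of one, this is:

$$W_{\text{out}} = W_{\text{in}} - F + 1, \qquad \text{e.g. } 7 - 3 + 1 = 5.$$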
Now, in some cases you may want a particular feature map shape that you can't reach using the kernel size alone, and to match that output shape we include padding. Let's look at padding: we have a seven by seven input, and when we set the padding to one, we go from seven by seven to nine by nine — notice it's written here, after padding, nine by nine. So we go from the seven by seven, the internal box, to the padded nine by nine. After padding, if we play around with the kernel size, we can get outputs ranging from one by one all the way up to eight by eight, whereas with zero padding and an input size of seven the possible outputs stop at six by six — we cannot get an eight by eight output when the padding is zero. Once we increase the padding by one, an eight by eight output becomes possible. So that's how padding works: basically we take our input and add these surrounding elements around the input image. The padding we generally use is zero padding; there are other padding methods, but zero padding is one of the most common since it's easy to use and computationally cheap. So that's our padding, and it has another advantage: it ensures that the corner pixels have an influence on the output features that are generated. Let's set the padding back to zero and look at this input image. If we're dealing with an image in which most of the relevant information is centered, there's actually no issue, since the filter will pass over every pixel belonging to the person in the image. Now, if we modify the image so that the person's face sits right at the corner, we get a different scenario from the centered case, where the filter was able to pass over every part of the subject. If we count how many times the filter covers the person's head when the head is near the centre, we get one pass at the first position, a second pass after sliding, then more passes after further sliding — about four passes in total. But in the corner case, the filter covers the head only once, and that's practically all. So when the important content sits on the borders, it influences the outputs less — it exerts less influence on the values in the feature maps that are generated.
So in a case where the important information lies at the border, it would have been better for the filter to pass over the head region several times, just as it did in the centered example, where it passed over the head region four times and was able to extract very useful information because the filter visits it many more times. To remedy this situation, we have padding. When we increase the padding — let's set it to one — and retake the example, we see that if we keep the same filter size, the filter now at least touches the head, if only slightly, on its first pass, and the second pass covers it again. So the filter touches the head twice compared to only once before, and hence this useful information has more influence on the output features being generated, which matters because that's exactly what we're trying to do: extract information from the input and pass it to the output. Another hyperparameter we can look at is the stride. So far we've looked at the input size, the kernel size, and the padding — these are what we call hyperparameters — and now we have the stride, which we'll understand shortly. Up to now we've been using a stride of one: we simply slide by one step each time we move to the next position. If we set the stride to two, we start from the first position, and notice that as we set the stride to one the indicator turns blue, but at two it turns red. It turns red because there is no whole-number output size that a stride of two can produce for this input. So let's modify the input size, and now it works: we can go from this input of six to an output using a stride of two. Let's now understand how strides work: we start at the first position, and now we skip two steps instead of one as we slide. Even moving downward, instead of moving one row at a time as with a stride of one, with a stride of two we move two rows down. Now, increasing the stride reduces the size of the output and hence the amount of information we extract from the inputs, so in general we get better results by working with smaller kernels and smaller stride values, since we're able to extract more information. In practice, a kernel size of three and a stride of one are generally used, and we may decide to use padding or not. The new formula, when we include the padding and the stride, is: the output size equals the input size minus the filter size plus two times the padding, all divided by the stride, plus one.
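Written out, with input size $W_{\text{in}}$, filter size $F$, padding $P$, and stride $S$:

$$W_{\text{out}} = \left\lfloor \frac{W_{\text{in}} - F + 2P}{S} \right\rfloor + 1$$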
In this case, if we have an input size of six, we get six minus the filter size of three, plus two times the padding — our padding is one, so plus two — divided by the stride, which is one, plus one. That gives three plus two, which is five, plus one, which is six, and that's how we obtain this output size. Also note that one nice thing about working with a library like TensorFlow is that when you don't know the exact padding to use, you can set the padding to 'same': TensorFlow then works out the padding for you automatically so that the output dimensions match the input. Up to this point we've been assuming our input image is two-dimensional, with just a height and a width. What if we use the kind of images we have in real life, that is 3D images with a red channel, a green channel, and a blue channel? Given this RGB image, let's see how we get the output. The way it's done is quite straightforward: in this case we include zero padding, and then we have our kernel, which passes over the top left corner. We multiply each value of the kernel with the corresponding value of the input — negative one times zero, plus negative one times zero, plus negative one times zero, and so on, up to negative one times four — basically a dot product, taking all the elements, multiplying them with the corresponding elements, and adding everything up. The same is done with the kernels for the other channels: we compute a value from each channel and add them all up, together with the bias, to get 41 — you can take this as a simple exercise and verify that you get 41. Then we slide to the next step — the stride here is one — and repeat the same process: negative one times zero, and so on, all added up, gives 12. We repeat this right up to the end, and that's how we obtain the output. You can also check out a more visually appealing version of this on the CNN Explainer website: double-click on a unit and you see exactly what's going on — the outputs are formed by sliding the kernels over the inputs, and as you click through, all the values are multiplied and then added together to form the output. From this point, we move to the Image Kernels page of the Explained Visually project by Setosa. Filtering is of course part of image processing, and since we're dealing with image data here, it's important to better understand how it works. We choose a kernel, or filter — let's pick this sharpen kernel.
Notice how, once we pick a kernel, the values — that is, the values of the filter — change. Let's take the outline kernel and notice the change. Keeping this outline kernel, every time we hover over the input image we see the corresponding values to the right. Here we have negative one times the value at this position, which is 255, so 255 times negative one, plus 255 times negative one, plus an unknown value — unknown because we're at the border, so there's no value there — then 249 times negative one, 255 times eight, 233 times negative one, 255 times negative one, and all of this sums up to a given value. Since we have unknown values at the border, there's no output there, but if we change position and move around the eye region, we get a value of negative 533. So no matter the image we put in, we'll always get an output where the outlines are highlighted. If you check the image of Elon Musk we've just imported, the output is an image that highlights the edges. As explained on the page, outline kernels highlight large differences in pixel values, which generally occur at edges; around those regions the large differences are outlined, whereas in a flat zone, where there's no difference, we just get a black region. Here again we have the filter — the exact same values as before — and the example shows how the output is obtained. Now, the major difference between what we're doing here and a convolutional neural network is that here we know these kernel values: we know that this matrix of negative one, negative one, negative one, negative one, eight, negative one, negative one, negative one, negative one is an edge detector, and we know what output to expect. With a convolutional neural network — or a convolution layer, to be specific — we just initialize these values and let the model learn them automatically during training.
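To make that difference concrete, here is a small sketch that applies the hand-crafted outline kernel with tf.nn.conv2d to a stand-in random grayscale image (the image and its size are purely illustrative); in a convolutional layer, these nine values would instead be initialized randomly and learned during training.

```python
import tensorflow as tf

# The 3x3 outline kernel: large responses where neighbouring pixels differ
# (edges), near zero in flat regions.
outline = tf.constant([[-1., -1., -1.],
                       [-1.,  8., -1.],
                       [-1., -1., -1.]])
kernel = tf.reshape(outline, (3, 3, 1, 1))          # (height, width, in_ch, out_ch)

image = tf.random.uniform((1, 64, 64, 1)) * 255.0   # stand-in grayscale image
edges = tf.nn.conv2d(image, kernel, strides=1, padding="VALID")
print(edges.shape)                                  # (1, 62, 62, 1)
```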
One of the very first convolutional neural networks, or convnets, was built by Yann LeCun in 1989, and here we have the structure of that convnet, known as LeNet. LeNet takes in an image — here a 28 by 28 by 1 image, with just one channel, a black-and-white image — and passes it to a convolutional layer, which we've just seen, followed by a sigmoid. Note that the convolution uses a five by five kernel with a padding of two, so with zero padding we add two rings of zeros around the input, and from there we get an output of 28 by 28 by 6; shortly we'll see exactly how that output is obtained. After the sigmoid activation we have a pooling layer, which is a subsampling layer we'll also explain: a two by two average pooling kernel with a stride of two. Then comes another convolutional layer with a five by five kernel and no padding, giving the next output, followed by another activation, another pooling, and then flattening: with flattening, the features are reshaped, so we go from a 3D tensor to a 1D tensor. Then we have a dense, fully connected layer followed by a sigmoid, another dense layer followed by a sigmoid, and finally a dense layer with 10 output neurons. The exact reason there are 10 output neurons, or 10 classes, is that the inputs are images of handwritten digits, and we want to predict whether the digit is a one, a two, a three, up to nine, plus zero — zero to nine gives 10 possibilities, hence 10 different classes. The AlexNet, by contrast, was built to classify whether an input image belongs to one of a thousand classes of the ImageNet dataset, and it has a different architecture, as you can see right here. We'll now go ahead and understand how all those outputs are obtained, and so we'll rebuild the LeNet architecture, but this time considering a 64 by 64 by 3 input, with R, G, and B channels. If we pass this input through the first convolutional layer, we get a certain output — and how do we get it? We have to take into account the parameters given for the convolutional layer: the filter size is five, there's no padding, the stride is one, and the number of filters is six; we'll calculate the number of parameters shortly. These are the four most important parameters. So here we have what we call a filter, and the dot products are computed, as usual, to get the outputs: we take the R channel and compute its dot product with the corresponding slice of the filter, do the same with the G and B channels, and then add everything up to obtain each value of this very first feature map. So to obtain this feature map, we use this one filter. This also makes clear that specifying the number of filters as six doesn't mean we have six flat kernels stacked up, as you might imagine. What actually happens is that the input has three channels, and each of our filters also has three channels; what we call a filter here is this whole five by five by three kernel. Having done the computation with the first filter, we repeat the same process with the next filter to obtain the next feature map, and so on for all six, which is why with six filters we always get six output channels. Then, if you compute five by five by three, that's 75, and 75 times 6 gives a total of 450. And since for each filter we add a bias — one extra bias per filter — we have six biases in total.
Adding those six, we have 456 parameters to be trained: 450 weights and 6 biases. That said, we've understood why we have the six channels. Now, how do we obtain the 60 by 60? By applying the formula we've already seen: the output equals 64 minus the filter size of five — there's no padding, and the stride is one — plus one. So 64 minus 5 plus 1, which is 60, and that's how we get this output. From here we move to the subsampling, or pooling, layer. For the pooling layer we have two parameters: the filter size and the stride. To obtain the output dimension for pooling, the formula is slightly different — obviously, because there's no padding — so we have (X minus F) divided by S, plus one, and we go from the previous feature map to this smaller one. Notice that we keep the same number of channels, but the input feature map has been subsampled. For the particular case of max pooling, to understand how it works, take a position with some values: as we hover over it, we see the max is 0.08, and so on. Basically, we're simply sliding over the whole feature map, and since our kernel size is two — a two by two kernel — for every four values we pick one to represent them, in this case the max. In some cases we instead take the average of the four, which is known as average pooling, but for max pooling, which is the most commonly used, we take the max of the four values. To compute the output size here, X equals 60, F equals 2, and the stride is 2, so 60 minus 2 is 58, divided by 2 is 29, plus 1 gives 30 — and we still keep the same number of channels. Here's another example showing even more clearly how max pooling, or pooling in general, works: take just one of the channels, apply a two by two kernel, and the first output is zero, since the max of the four values is zero. With a stride of two we shift by two positions; the max there is two; we move again, the max is two; then one, then zero; move to the next row, the max is three, and so on and so forth. That's how we obtain the new feature map, and that's why we go from 10 by 10 down to 5 by 5. After this subsampling layer we have the activation; we already looked at activations in the previous section — the sigmoid, the ReLU, the tanh, and the leaky ReLU. So we understand that, and now we move to the next convolutional layer: filter size five, no padding, stride one, sixteen filters, and 2,416 parameters — you can take this as an exercise and verify that the total number of parameters is indeed 2,416.
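To verify those counts, here is a tiny helper — the formula is simply kernel height times kernel width times input channels times number of filters, plus one bias per filter:

```python
def conv_params(kernel_size, in_channels, num_filters):
    # Each filter spans all input channels; one bias per filter.
    return kernel_size * kernel_size * in_channels * num_filters + num_filters

print(conv_params(5, 3, 6))    # 456   -> first conv layer (RGB input, 6 filters)
print(conv_params(5, 6, 16))   # 2416  -> second conv layer (6 input channels, 16 filters)
```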
That said, we're going to have this output: we apply the ReLU and get an output of 26 by 26 by 16. Recall that the number of filters dictates the number of output channels — not the filter size — and the 26 by 26 is obtained using the same formula we've seen already. From here we have another subsampling step, which we now understand, giving 13 by 13 by 16, and then we flatten everything out: multiplying 13 by 13 by 16 gives 2,704. This is what we call the flatten layer: it takes each and every value in the feature map and simply places it in a one-dimensional output. From there we have the dense layers, which we're used to working with already — one with a thousand neurons, then another dense layer — and finally the output layer. In our case the output should have two neurons, since we're predicting whether the cell is parasitized or uninfected. We understand all of this now and should be ready to dive into the code. Before moving on to the code, let's see what the feature maps of a trained convolutional neural network look like. As you can see in this example from the Stanford course website, a car image is passed into a convolutional neural network and its output predicts a car. Notice that the first layers act as filters for low-level features like edges, so the feature maps from the first layers produce visually interpretable outputs, whereas as we go farther, or deeper, the outputs become less visually interpretable and focus on high-level features like car parts, which permit the network to correctly classify the image as containing a car. So the first layers do feature extraction and the last layers do classification, and both parts need each other: if we extract features but don't have layers that let us classify the image correctly, we won't achieve our goal, and in the same sense, a good classifier fed raw inputs, from which no useful features have been extracted, won't perform well either. You can also check the ConvNetJS MNIST demo by Andrej Karpathy, where he trains a model on the MNIST dataset: the model is input, conv, pool, conv, pool, and softmax. Again, notice how the feature maps of the first layers contain much more visual content than those of the final layers, whose feature maps instead contain content that lets the network decide whether a particular input belongs to a particular class. If you want to build convolutional layers with TensorFlow, you use the Keras layers: we go to the TensorFlow Keras documentation, pick layers, and find the Conv2D layer — double-click on it, and here we go.
Here we have the arguments we pass to this layer. The filters argument corresponds to the number of filters, which we've seen already, and kernel_size corresponds to the filter size. From there we move to strides, which takes a tuple: what's important to note is that if you set the strides to, say, (1, 2), then in the height dimension the stride is one and in the width dimension the stride is two. So if we pass a kernel over this feature map, we'd slide skipping two steps in the horizontal direction, while going downwards we'd skip only one step. In the case where we slide by exactly the same number of steps horizontally and vertically, the stride can just be given as a single number, for example 1. From there we have the padding, which is 'valid' by default. The documentation tells us that 'valid' means there is actually no padding, that is zero padding, while 'same' means the input is padded with zeros so that the output dimension equals the input dimension — note that this equality holds when the stride is one. So if we have a 60 by 60 feature map and pass it to a conv layer, and we want the output feature map to have the same dimensions as the input, 60 by 60, all we need to do is set the padding to 'same'. From here we have the data format. There are generally two conventions: for an image we can order the dimensions as height, width, channel — so a 224 by 224 by 3 image uses this format — or start with the channel, giving channel, height, width, that is 3 by 224 by 224. By default the channel comes last, so we use the channels-last convention, but if you want the other convention, all you need to do is specify channels_first. From here we have the dilation rate. To see what it does, we'll look at a GitHub repository by Vincent Dumoulin which uses animations to explain how convolutions work: scrolling down, you'll find the dilated convolution animation, and clicking on it we get an example where the dilation rate equals two. The way dilation works is that the kernel, which initially had no spaces between its values — a plain three by three kernel — now has holes inserted between its values, so what we get is effectively a five by five kernel. If we work with a dilation rate of r equals three, then instead of a single hole we fit in two holes between each pair of values: one, two, then a value, one, two, then a value, and so on.
Counting the positions — one, two, three, four, five, six, seven — we now effectively have a seven by seven filter. It should be noted that dilated convolutions are used in problems where we want to keep increasing the receptive field as we go deeper in the network while keeping the number of parameters the same. The next argument is groups: according to the documentation, this is a positive integer specifying the number of groups into which the input is split along the channel axis. This means that, in this case for example, we could break the channels into three groups — one, two, three — where each group has its own set of filters, and the output is the concatenation of all the group results along the channel axis. From here we have the activation, which we've seen already, whether or not to use a bias, the kernel initializer (that is, the weight initialization) and the bias initializer, the regularizers, which we'll see subsequently, and the kernel and bias constraints. To create our convolutional neural network, we'll use the Sequential API we worked with previously, and define the input layer with input shape IM_SIZE by IM_SIZE by 3 — we had defined IM_SIZE earlier; we could also define a variable for the number of channels, but let's just leave it at 3. This time there's no normalization layer, and next comes the Conv2D layer. Its remaining arguments keep their default values, so we won't bother with them and take them off. So we have our Conv2D, which we cut and place right after the input layer. We want six filters with a kernel size of five, a stride of one, padding 'valid', which is what we expect, and then the activation: we'll use sigmoid, since we're replicating the LeNet architecture. So that's our Conv2D; after it comes a max pooling layer, and we take the dense layers off for now. We can simply import everything we need: Conv2D, MaxPool2D, and the Dense layer. Coming back to the documentation, we check MaxPool2D: we specify the pool size, which is two, the strides, the padding 'valid', and we won't specify the data format. So we have our MaxPool2D with its pooling size; we actually want a stride of two, so we set that. The next step is another Conv2D, similar to what we had previously, with the difference that we now have 16 filters — so we copy the previous one, paste it here, change the number of filters to 16, keep padding 'valid' and activation sigmoid — followed by another MaxPool2D, still the same as before. And then we move on to the flatten layer.
We run that, and while it's running we add the flatten layer here, which is in charge of flattening — converting the feature maps into a 1D output. After the flatten we have a dense layer, which we've seen already, with a sigmoid activation — again respecting the activations used in the LeNet paper — and a thousand neurons; then we add another dense layer. We make sure that the final dense layer has two output neurons, since we're dealing with a binary classification problem. We take this and run our model, and we're told the input layer isn't defined yet, so we add the input layer, run again, and everything is fine. We end up with about 45 million parameters and no non-trainable parameters. Notice that this dense layer is responsible for a huge percentage of the parameters, so let's reduce it: take it down to a hundred, and the next one to ten, and run again — we now have a smaller model this time around. A point to note is the parameter counts we pre-calculated earlier: 456 and 2,416, and you'll see we get exactly these numbers here. For the dense layers it's quite obvious: a hundred times ten plus the ten biases gives 1,010, and ten times two is 20, plus two biases, giving 22. It should also be noted that there was a slight error earlier: the second layer's kernels are not five by five by three but five by five by six, since the number of input channels there is six — the first layer's kernels are five by five by three and the second layer's are five by five by six, and five by five by six multiplied by 16 filters, plus the 16 biases, gives 2,416. Keep in mind that here we're replicating the LeNet architecture, which is in no way the state of the art among the models we use today; in a later section on corrective measures we'll use even better models, but for now let's work with this. With that, our model is built.
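Putting the whole thing together, here is a sketch of the model roughly as it stands at this point; the layer sizes follow the reduced values above, the name lenet_model is the one the model is given a little later, and the two-neuron output is changed to a single neuron shortly.

```python
from tensorflow.keras.layers import InputLayer, Conv2D, MaxPool2D, Flatten, Dense
from tensorflow.keras.models import Sequential

IM_SIZE = 224

lenet_model = Sequential([
    InputLayer(input_shape=(IM_SIZE, IM_SIZE, 3)),

    Conv2D(filters=6, kernel_size=5, strides=1, padding="valid", activation="sigmoid"),
    MaxPool2D(pool_size=2, strides=2),

    Conv2D(filters=16, kernel_size=5, strides=1, padding="valid", activation="sigmoid"),
    MaxPool2D(pool_size=2, strides=2),

    Flatten(),

    Dense(100, activation="sigmoid"),
    Dense(10, activation="sigmoid"),
    Dense(2, activation="sigmoid"),   # reduced to a single sigmoid neuron shortly
])

lenet_model.summary()
```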
Let's now move on to the error sanctioning section. For error sanctioning in binary classification problems we generally use the binary cross-entropy loss; let's look at its formula on the ML cheat sheet website. As with most error-sanctioning functions, with the binary cross-entropy we're trying to penalize the model when the prediction is different from the actual value. In a binary classification problem, if the actual value is meant to be a one and our model predicts a zero, plugging this in gives one times log of zero for the first term, and for the second term one minus one, which is zero, times log of one minus the prediction — so that term vanishes. If we plot the curve of y equals log of x, we see that at x equals one the log is zero, and as x tends to zero the log goes towards negative infinity, so the log of a value approaching zero is a very large number in magnitude. With the second term gone, this means that when the actual value is one and the predicted value is zero, the loss becomes a very large number. Now let's flip it: the actual value is zero and the prediction is one. We get a similar scenario: the first term has y equals zero in front, so zero times whatever follows is zero, while in the second term one minus y gives one, times log of one minus p with p equal to one — so log of zero again, a very large value, times one, giving a big number. So the model is sanctioned because it hasn't correctly predicted the expected output. Next, suppose the actual value is one and the model actually predicts one: we have one times log of one, and the second term has one minus one, which is zero, in front, so it cancels out; and since log of one equals zero, the final answer is zero — the model has done its job correctly. If you do the same for an actual value of zero and a prediction of zero, you'll again get zero. Note that the zero times log zero term is handled as a standard limit — we won't get into that here; you can check our calculus course to understand it better — for now, just note that it gives zero. So our model makes use of the binary cross-entropy loss to update its weights. If we take a y_true of zero and a y_pred of, say, 0.8 and compute BCE(y_true, y_pred), we get a value of about 1.6. If we change the prediction to 0.02, the loss drops from 1.6 to about 0.02, and with 0.2 we get something in between. You can also stack up several outputs, as we had in the beginning: passing the full y_true and y_pred into BCE gives a single loss value, and if we flip one of the labels — say a zero becomes a one — you see the loss increases.
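With the loss $-\frac{1}{N}\sum_i \big[y_i\log p_i + (1-y_i)\log(1-p_i)\big]$ in mind, here is a quick numeric sketch (the probabilities below are made up for illustration):

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()

y_true = [0., 0., 1., 1.]
y_pred = [0.02, 0.2, 0.8, 0.95]          # predictions close to the labels
print(bce(y_true, y_pred).numpy())       # small loss

y_pred_bad = [0.9, 0.8, 0.1, 0.2]        # predictions far from the labels
print(bce(y_true, y_pred_bad).numpy())   # much larger loss
```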
If you recall, the sigmoid is a function which looks like this: as we increase the value of x, the output goes towards one, and as x takes a very large negative value, the output goes towards zero. So the sigmoid always ensures that the output lies between zero and one; that's why it's very important to have the sigmoid here. In the case where we don't have the sigmoid, that is, our output doesn't necessarily lie between zero and one, what we do is set from_logits equal to True. Specifying from_logits=True simply means you're saying you're not sure that your model output will always fall in the range zero to one. That said, if you run this, you see we get a totally different response, so you have to be very careful when working with this: if your values range between zero and one, make sure you use the default, from_logits=False.

From here, let's go ahead and compile our model. We have our optimizer, and the loss is the binary cross-entropy loss, so here we have BinaryCrossentropy. As for the metrics, let's take them off for now; we comment that part out and won't take it into consideration yet. We run this and we're told the optimizer, Adam, isn't defined, so let's go ahead and define it; Adam is now defined, the metrics are off, we have the binary cross-entropy, and we run that. Everything works fine, so we go ahead and train our model in a similar way as previously: here we have our train data and our validation data. Let's also reduce the learning rate, run that, and start with the training.

We get this error, which reads: logits and labels must have the same shape, this shape is given versus this one. Let's try to understand together why we're having this error, so let's go right up to the model creation. When processing the dataset, we had these kinds of inputs and outputs: for the inputs we had 224 by 224 by 3, and for the output we had just one value, because our output could take either a zero or a one. But the way we defined our model is slightly different: with this model definition, we actually have an output shape of two, meaning that we could produce two outputs, whereas the way our dataset was constructed was such that we could only have one output. When our output is zero, we suppose that the cell is parasitized, and when our output is one, it is uninfected. And so because of this, we modify our code such that this output here is one. Let's run that again, and there we go: we modify our model, recompile it, and get to training. We still get the same error, but this comes from the fact that we changed this name from model to lenet_model, so that when working with other models, we can always find our way around. So we have lenet_model here and lenet_model there; that's it, we run that again and that should be fine.
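Here is a consolidated sketch of the compile-and-fit step after the single-unit output fix; the learning rate and epoch count are illustrative, and lenet_model, train_dataset and val_dataset are the names assumed from the earlier cells:

```python
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import BinaryCrossentropy

lenet_model.compile(
    optimizer=Adam(learning_rate=0.01),      # reduced learning rate
    loss=BinaryCrossentropy(),               # from_logits=False: model ends in a sigmoid
)

history = lenet_model.fit(
    train_dataset,
    validation_data=val_dataset,
    epochs=5,
    verbose=1,
)
```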
We've now started with our training process. You see we're training, but this looks very slow, so let's check the runtime: we click on 'Change runtime type', look at the hardware accelerator, and see None. We're actually using a CPU here, whereas we should be using a GPU. So let's select GPU and save. We run that again, and we're told the model is not defined; this is because every time you change the runtime, you have to rerun everything. So let's rerun this and get back to the training. The training process is now much faster than previously, as you can notice, because we're now using a GPU.

After training, we get this error. The fact that the model trains and gets right up to the end of the epoch before throwing this error should give us a hint that the error is most probably coming from the validation set, because for the training everything went well, and it's at the point where the validation loss has to be computed that we have an error. This means the validation dataset has a problem. Here we have the train dataset and the validation dataset, and what we notice is that these two are slightly different: we've not done the resizing, we've not yet done the preprocessing of the validation set as we did with the train dataset. So let's get back and check on that.

Right here, we have this preprocessing code; let's add it and repeat it for the validation and the test set. Everything we do with the train set, we do the same with the validation set, so we need to make sure that the train and validation have the same preprocessing. We have the train, then the validation, and we repeat the same for the test: test_dataset equals test_dataset.map(resize_rescale). That's fine, so we run that and move on to the next step. Here we have the batch size defined, then the train shuffling, batching, and prefetching. Let's remove the duplicate for the train since it's already defined, and do the same steps for the validation, so the validation is now processed as well. We then get to the test set, but when we check this out, we see that we're trying to shuffle the test set, which is a useless operation since we do not want to shuffle our test set; we also don't want this batching here, and the prefetching is useless too. So practically we don't need to do any of this for the test set.

Then we run the train and validation datasets, and here we go: we now see None and None here, because we had already applied the batching the first time for the training and then we've applied it again, so it's like a batch on top of a batch. That's why we have this. So we have to get back and start over from here: we rerun this so that the train dataset is computed again from the original dataset, and everything should be fine. Note that the resize and rescale should still be applied to the test set, so we leave that in, because we do need to resize and rescale our test set. So we have that; we run this right here, run the validation and the train, and you see, everything is now okay.
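Below is a minimal sketch of the preprocessing pipeline being fixed here, assuming train_dataset, val_dataset and test_dataset were created earlier; the shuffle buffer and batch size are illustrative values, not the exact ones from the lesson:

```python
import tensorflow as tf

IM_SIZE = 224

def resize_rescale(image, label):
    """Resize to the model's input size and scale pixels into [0, 1]."""
    image = tf.image.resize(image, (IM_SIZE, IM_SIZE)) / 255.0
    return image, label

BATCH_SIZE = 32  # assumed batch size

# Identical preprocessing for train and validation; shuffling only for training.
train_dataset = (train_dataset
                 .map(resize_rescale)
                 .shuffle(buffer_size=8, reshuffle_each_iteration=True)
                 .batch(BATCH_SIZE)
                 .prefetch(tf.data.AUTOTUNE))

val_dataset = (val_dataset
               .map(resize_rescale)
               .batch(BATCH_SIZE)
               .prefetch(tf.data.AUTOTUNE))

# The test set only needs resize/rescale; it is batched (batch of 1) later,
# just before evaluation.
test_dataset = test_dataset.map(resize_rescale)
```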
So we have that, that's fine. We have our model; it's actually needless to reinitialize it since we've already started training, so we could just continue from here, but for demonstration purposes let's rerun it. We've been training for a while and we're getting very poor results: notice how the loss isn't changing at all, and for the validation it's a similar situation, the values just oscillate around five and four and that's it. Another thing we can do to make the debugging faster is to take off the validation for now, so let's remove it and stop the training.

We now make some changes to our model: instead of these sigmoid activations, we use ReLU here, here, and here, and only this last one stays a sigmoid. We also want to reduce the kernel size, so we take the kernel size down to three here and here. Let's rerun this and see what we get. Without the validation the training goes faster, but we see again that nothing has really changed with the training, so let's interrupt it and get back to our model.

We're now going to include batch normalization. In batch normalization, all values belonging to the same batch are standardized: x becomes (x − μ) divided by the standard deviation. Let's add this BatchNormalization layer right here; after this second Conv2D we again add BatchNormalization; and with each Dense layer we add BatchNormalization as well. Let's not forget the commas. We copy this and paste it here, and that's fine, we've included batch normalization. We run the model and BatchNormalization is not defined, so let's import it; we rerun our model, and that should be fine. We recompile our model and then fit it. And what do we notice? We have a loss which now drops normally, so we see how important it is to work with the batch normalization layer. Apart from this, the batch normalization layer also serves as a regularizer; we'll see that subsequently.

Given that our model is now training properly, we can halt this training process and include performance measurement. So here we have the metrics, and we add accuracy: our performance metric here is accuracy, so when training we'll be able to see how both the loss and the accuracy evolve. There we go: we recompile the model and go back to training.
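For reference, here is a sketch of the revised model at this stage, with ReLU activations, 3-by-3 kernels, batch normalization after the conv and dense layers, and the accuracy metric added at compile time; the filter and unit counts are the ones assumed from the LeNet-style setup above:

```python
import tensorflow as tf
from tensorflow.keras.layers import (InputLayer, Conv2D, BatchNormalization,
                                     MaxPool2D, Flatten, Dense)

IM_SIZE = 224  # assumed input resolution

lenet_model = tf.keras.Sequential([
    InputLayer(input_shape=(IM_SIZE, IM_SIZE, 3)),

    Conv2D(filters=6, kernel_size=3, strides=1, padding='valid', activation='relu'),
    BatchNormalization(),
    MaxPool2D(pool_size=2, strides=2),

    Conv2D(filters=16, kernel_size=3, strides=1, padding='valid', activation='relu'),
    BatchNormalization(),
    MaxPool2D(pool_size=2, strides=2),

    Flatten(),
    Dense(100, activation='relu'),
    BatchNormalization(),
    Dense(10, activation='relu'),
    BatchNormalization(),
    Dense(1, activation='sigmoid'),   # single sigmoid output for the binary label
])

lenet_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
    loss=tf.keras.losses.BinaryCrossentropy(),
    metrics=['accuracy'],
)
```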
A model's accuracy equals the total number of times the model predicted an output correctly divided by the total number of predictions. This means that if we have a model A and another model B, and we let each of them carry out, say, 1000 predictions, then if model A makes 800 correct predictions, its accuracy is 800 divided by 1000; you could express this as a fraction or as a percentage, so here we have 80% accuracy. Now if model B makes 980 correct predictions, it has an accuracy of 98%, and in this case model B outperforms model A. It should be noted that accuracy isn't always the best choice of performance metric when it comes to classification problems, as others like the precision, the recall, the F1 score, and many more exist. For now we'll use the accuracy, and later on we'll look at the other metrics we could use when dealing with classification problems.

After training for 20 epochs, we have these results here. Let's plot this out: we plot the loss and then the accuracy, and we have these two plots. We see that the training and validation losses both keep dropping, and the accuracies both keep increasing, though the training accuracy is slightly greater than the validation accuracy.

Our next move is to evaluate our model, so we call lenet_model.evaluate. We evaluate this model, and we receive an error telling us that there is an incompatibility between the expected shape and the shape of this test dataset. Recall that when building the test dataset, we didn't include the batching, so let's do that straight away. Let's add a code cell; but before that, let's print out the test dataset and the train dataset. You'll notice that with the train dataset we have this batch dimension, whereas here we don't. So to include it, we have test_dataset equals test_dataset.batch(1), since we just test on single elements. We run it, and there we go: if we print the test dataset this time around, we now have the batch dimension. Now let's go ahead and evaluate our model. There we go: on data this model has never seen, it has 94.16% accuracy and a loss of 0.2. This sounds interesting. Note that we could continue with this training, so you could train for more epochs than this; many data scientists have made the remark that sometimes they forget to stop the training, then come back and notice that they've got an even better-performing model because they allowed it to train for longer.

After evaluating our model, let's look at how to do model predictions. The whole idea of model predictions makes sense since we have trained our model on inputs and outputs, and now we want to pass in an input and let the model automatically come up with the output, that is to say whether the image contains a parasitized cell or an uninfected cell. That said, all we need right here is lenet_model.predict: we have this predict method, we pass in our data, in this case the test dataset, and we take one value. We run that; the model name isn't defined, so we use lenet_model and run that again. So that's it: we're told that this is an uninfected cell. Now we'll define this method, parasite_or_not, such that if we have an input x, then if x is less than 0.5 we consider that we have a parasitized cell, and if it's greater than or equal to 0.5, it's an uninfected cell. Recall that the way the data was created was such that parasitized was zero and uninfected was one.
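Here is a small sketch of the evaluation and prediction steps described above; the function name parasite_or_not and the 'P'/'U' labels follow the lesson, while the indexing assumes the single-unit sigmoid output and a test batch size of 1:

```python
def parasite_or_not(x):
    """Map the sigmoid output to a class using a 0.5 threshold.
    Labels were built with parasitized = 0 and uninfected = 1."""
    return 'P' if x < 0.5 else 'U'

test_dataset = test_dataset.batch(1)            # add the missing batch dimension
print(lenet_model.evaluate(test_dataset))       # [loss, accuracy], e.g. ~[0.2, 0.94]

pred = lenet_model.predict(test_dataset.take(1))[0][0]
print(parasite_or_not(pred))
```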
So we're using a threshold value of 0.5; this threshold is defined such that every value less than it is considered parasitized, and every value greater than it is considered uninfected. If we now replace this with parasite_or_not and run again, we're told that this is uninfected. We're going to do a test on nine different elements, so right here we take nine of them and do the subplots. First we index into the image, because we don't want the batch dimension, and then for the title we show the actual output and the model's predicted output. We run this, and here's what we get: for this one we have U, U, so the actual is uninfected and the prediction is uninfected; then U, U; U, U; here we have U, P, meaning the actual is uninfected but the model predicts parasitized; here we have P, P, which is correct; then U, U; P, P; U, U; and P, P.

Thank you for getting to this point. In our next section, we'll look at corrective measures: how to load and save our model, how to build other types of models using the different APIs, how to use different kinds of metrics, visualizing what our model sees, using callbacks, data augmentation, dropout regularization, early stopping, batch normalization, instance normalization, layer normalization, weight initialization, learning rate scheduling, custom losses and metrics, sampling methods, custom training, TensorBoard and hyperparameter tuning, Weights & Biases logs, Weights & Biases artifacts, and finally Weights & Biases sweeps. That said, see you next time.

In our previous section, we built a deep learning model based on convolutional neural networks to help detect the presence of malaria in blood cells. Nonetheless, in the real world we're not always going to be using our models in a Colab notebook like this one; hence we need to be able to save this model so it can be used externally. In this section we'll learn how to save and load a model, and also do the same with Google Drive: that is, we'll save our model in our Google Drive, and later on, when we want to use it, we'll just load it from there. That said, don't forget to subscribe and hit that notification button so you don't miss amazing content like this.

We've built this fairly performant model, though we could still improve on it. But once we close this notebook, the model's current state isn't kept, and so if we come back next time, the model will have randomly initialized weights, which will be different from the weights we've got now after training on this dataset. Another issue is that if we want to use this model in another scenario or environment, say in a browser or on a mobile phone, we need a way to export it from here. TensorFlow allows us to save our model, but we first have to differentiate between a model's configuration and a model's weights. Suppose we have a model defined as such: we have the input, which we pass into a conv layer, then we have batch normalization, we have pooling for subsampling, then we flatten, and after flattening we pass through a dense layer and we have our output. All the parameters for the creation of this model are known as the model's configuration.
In the model's configuration, we may have it that the model, like in this case, starts with a conv layer with six filters and kernel size three, then a batch norm, and so on. These are the model's configurations, but the configuration is different from the model's weights: the weights are those filters we have, for example in the case of the Conv2D. So we have the model's weights and the model's configuration, and upon summarizing the model, we see clearly this Conv2D and its number of parameters. Whenever we want to save a model, we have to take into consideration both the configuration and the weights, because for the same configuration, we could have different weights.

There are actually two main options. The first option is to save the full model, that is, the model configuration and the model weights. The other option is to save only the model weights. The first option is used when, for example, we don't have, or don't even know, the model configuration upfront. Here we've defined the model's configuration, trained it, obtained new weights, and this is the current model state; but if we take this to another environment where we don't have this configuration, then as long as we've saved the configuration and weights together, packaged as the full model, all we need to do is load them. In the other case, where we already have access to the configuration and all we need is the weights, we just save the weights and reload them. Either way, we always need both the configuration and the weights.

Nonetheless, it's important to note that the most valuable part of this is actually the weights, since working with randomly initialized weights after we've trained our model isn't very useful. Sometimes we may take many days to train a model; imagine you've trained your model for, say, ten days, and then you want to reuse it and the weights have been randomly re-initialized. Those ten days have been wasted, both time-wise and money-wise. So you have to ensure that you save your weights properly so you can reuse them. And the great thing with TensorFlow is that you can also continue training from a saved state: at this point, where we have a model performance of 94%, we could keep training from here so that we get to, say, 99%.

Before getting into that, one last point: with the first method, apart from the model configuration, we also save information like the metrics you used, such as the accuracy, the loss you used, and the optimizer configuration. So this kind of hyperparameter information is saved too, and next time all you need to do is load your model and make use of it, whereas with the second method all you're saving is just the weights.

That's it, let's save our model. In this case we have lenet_model.save, and we give it a name, say 'lenet_saved_model', for example. We run that and check out the files. And what do we see?
You have this lenet save model folder. In this folder, you have the assets, which in this case is empty. You have the variables, which actually contains the weights. So you could download this from here. You could download this and then upload it next time. So from here, you see click on download. That's fine. We have the variables, which contains the weights. That's it. And then we have this saved model that put about file here, file here, which actually contains our configuration. We've had our configuration saved and our weights saved. Now let's load this. So we've saved this and now we can now load it. Note that you could always download this. So you could download the weights right here, download this. So let's click on download, download all this. And then next time, all you need to do is just to load it. Now let's go ahead and load this. The loading is quite simple. Here we will define a new model, lenet load that will have loaded model equals tf.keras.models.load model. There we go. And now we specify this exact same name. We have lenet saved model. Now what we'll do is we're going to do a lenet loaded model model and then summary. So that's it. We're going to run that and we're getting this error here, which is unusual. Changing this name actually makes this work. So lenet and then here. So we have this lenet and let's run that again. We save that and then we load this and we have our model right here. So this means if you have to come back to this notebook, all you need is to load this model, which has been saved right here in this lenet folder. And so just like with this, we'll replace this lenet model by lenet loaded model. So let's load this. Let's use this loaded model and do some predictions. There we go. Yes, we'll get UU, UPP, PP, PP. Here we have one error, UU, UU and PP. So it's kind of similar to what we have with the original model. From here, we could also evaluate this model. We have the net loaded model. We evaluate that and let's see what we get. Recall previously we had 94.16%. So now we expect to have something around that value. There we go. Exactly the same output as previously. Now we are going to look at how to load and save with the HDF5 format. Now this HDF5 format is a lightweight version of this TensorFlow model saving method. Here there's only this slight difference. All we need to do is to say, include this file extension. So we have your HDF5 and then we save that. Now you check this out. You should have the HDF5 appearance. So here we have the net HDF5 and then you could see its width by 53 megabytes. Now let's load this model. To load it, what we have here is the same code we had previously. And then here we specify HDF5. So there we go. We run that and we have exactly the same summary. So that's it. Now we're yet to work with custom layers but you have to note that in the case where you built custom layers, then those configurations aren't stored when you're dealing with this HDF5 format. And so that's why generally it's preferable for you to use this first formatting which we presented. That said, we're done with this first method where we save the configurations and the weights. Now let's look at this next method where we save only the weights. So in this case, for example, where we're having this notebook, what we could do is simply just save the weights given that we already have the model's configuration defined in here. So let's get straight away into looking at the save weights method which comes with TensorFlow. So we'll take that off. 
Now let's look at the next method, where we save only the weights. In this case, for example, where we have this notebook, we could simply save the weights, given that the model's configuration is already defined in here. So let's go straight to the save_weights method, which comes with TensorFlow. Right here we have the lenet_model which we defined already, and we save its weights, say as lenet_weights; let's put them in a weights folder and run that again. We can now see our weights clearly: click on that, and there we go, we have the checkpoint file and the weight files. Notice how these weight files resemble what's inside the variables folder from before: this data file is the same kind of file as that one, and this index file is the same as that one, because we said the variables contain the weights. We also have the checkpoint; subsequently we'll look at checkpoints with TensorFlow, but for now just know that this is how we save the weights. Then, upon defining your model, you can load just these weights rather than the whole model. By loading only the weights, though, you're saying you don't want the optimizer configuration, you don't want the metrics, and you don't want the loss configuration.

That said, let's look at how to load these weights. Here's all we need: we call lenet_model.load_weights and pass in the weights. Let's do this so you can see clearly that the loading actually works. The first thing we'll do is re-initialize our model: we rerun this, compile the model, and run the evaluation right here. You see that when the model's weights are randomly initialized, we get very poor results; so after random initialization, we have this. Now we take lenet_model and load the weights, passing in weights/lenet_weights. Let's run the evaluation again and get the new performance. There we go: our model is back to the 94.16% accuracy we had after training.

At this point, we've been able to save and load our model right here on Google Colab, but as we know, at the end of the session, or after closing the notebook, all this information will be lost. So let's see how to save this information on Google Drive. We start by importing the drive module: from google.colab we import drive, and we run that. The next thing is to mount the drive: we have drive.mount, and we specify the location, /content/drive. Running that, you'll be asked to put in an authorization code; to get it, you click on the link given here, select your account once it pops up, go through the connection step, and copy your code. You paste it in here and press enter. Once that's done, you see 'Mounted at /content/drive': it's mounted at this location, which tells us we're in the directory content, and in this directory content we now have this other directory, drive. We open this, and from it I can get access to my own Google Drive.
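Below is a minimal sketch of the weights-only workflow and the Drive mount, assuming a Colab environment; the 'weights/lenet_weights' path is just an example name:

```python
from google.colab import drive

# Save only the weights (no optimizer, loss or metric configuration).
lenet_model.save_weights('weights/lenet_weights')

# Re-create and re-compile the model elsewhere, then restore the trained state.
lenet_model.load_weights('weights/lenet_weights')
lenet_model.evaluate(test_dataset)   # back to the trained performance

# Mount Google Drive so saved files survive the end of the Colab session.
drive.mount('/content/drive')
```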
If I now want to copy this LeNet model folder into my Google Drive, so that next time I can just load it from there, I make use of the cp command: we have cp, some option, then the source and the destination. Here we use the -r option, for recursive, so that directories are copied recursively. We run the command: we specify the folder's path, /content and then the LeNet folder, and the destination in MyDrive, which we'll call, say, lenet_colab. From here, I go into MyDrive, search for lenet_colab, and that's what I have now: this lenet_colab right here. The next step is to copy from MyDrive back to the Colab, so that next time, in case we haven't saved anything here, we can quickly get that information from the Drive back onto the Colab. Also note that this is really useful for datasets: we could have a dataset in our Colab and transfer it to our Drive, and vice versa. Now let's do the same thing in the other direction: we copy this back, but this time into a new folder here, lenet_colab, so we create this folder and copy the information from our Drive into it. We run that, click on this, and guess what we see? We have our lenet_colab right here. Thank you very much for following up to this point, and see you next time.

Hello everyone, and welcome to this new section, in which we'll look at different ways of creating models other than the Sequential API, which we've seen so far. In this section, we'll look at the Functional API, we'll look at building callable models, we'll look at building models via subclassing, and we'll also look at building our own custom layers. Previously in this course, we said that there are three ways in which models are built in TensorFlow: the Sequential API, the Functional API, and finally model subclassing. Up to this point, we have been using the Sequential API, as you can see right here. Now you may ask yourself: why do we need a different method for creating TensorFlow models when so far we've achieved close to 99% train accuracy and around 95% test accuracy? As you may have noticed, all the models we've been building so far have taken on this kind of structure, where we have an input, a first layer, the next layer, stacked in this sequential manner right up to the very last layer, and then we have the output. So the question we could ask ourselves is: what if we have a model which takes in, say, two inputs and has three outputs? These kinds of models are very popular in deep learning, and we shall look at them subsequently. But before getting there, you could imagine a problem where, instead of just classifying whether we have an uninfected or a parasitized cell, we want to know the exact position of that cell in the image. You would find that you would have one output which classifies whether it's parasitized or not.
So we have this first output, parasitized or uninfected, and then this other output, which gives us the position of the cell in the image. Here we already see how we could get two outputs; let's take the third output off, so we have this one-input, two-output model, and with the Sequential API we can't really do this. That's why working with the Functional API is very important. The next point is that we'll be able to create more complex models with the Functional API. There is a model known as ResNet which is very popular in deep learning for computer vision, and a ResNet-like structure looks like this: one layer's outputs are passed to the next layer, and the outputs of that layer are then combined with the earlier outputs through a skip connection before being passed to the following layer. Those kinds of structures cannot be built with the Sequential API, hence the need for the Functional API. And the last reason why we'll use the Functional API is that we can use shared layers. With shared layers, we could have a particular layer in our model which already has a predefined way of encoding information: when we pass in an input I1, this layer, or encoder, produces an output which is different from when we pass in another input I2, but the way it produces these outputs is consistent. So we could have I1, I2, I3, which all share this layer, and then other layers of the model which follow.

That said, let's look at how to create a model with the Functional API. Here we have the Sequential version, and just below we're going to create the functional one. Before starting the creation, we import some classes: we start by importing the Input class, which is a layer, and then from tensorflow.keras.models we import Model. We run this, and that should be fine. We now have func_input, since we're using the Functional API, and func_input equals Input, which takes in the shape. Here we copy the exact shape we used in the Sequential API and reuse it to create this input layer. From this point, we can start stacking up all the different layers we had stacked up with the Sequential API: we started with this Conv2D, right up to the dense layers, so we copy that and paste it right here. Now, first things first: we have this Conv2D which we've defined, and then we pass in the output from the input layer.
So here we have this func_input; we copy it and pass it into this conv layer right here. Once it passes into this conv layer, we get an output, and that output is x; and as you might guess, we pass this x into the batch norm layer, and we get an output x again. From here, we pass x into the MaxPool2D layer, and we just repeat this same process right up to the end. You see that we haven't made many changes compared with the Sequential API: we pass in the input, we get x, and we keep passing it through, right up to the end. Once we get to the end, we create the LeNet model from this: we have lenet_model equal to Model, which we imported, and we pass it the func_input; let's call this last output func_output, so we have the input and the output, and we can give it a name, name='lenet_model'. If we look up here, this would be the input image, and we've created the model lenet_model. From here you could simply call lenet_model.summary, and you'll notice that we get exactly the same summary as with the Sequential API. Let's run that and see: we have 4,668,297 parameters, exactly the same number of parameters, and the same number of trainable and non-trainable parameters. So basically we've recreated the model we built with the Sequential API.

We'll see that we have to change absolutely nothing in the rest of our code: we compile our model without changing anything, we have the same lenet_model. We could also rename it, say lenet_model_func, so you can see clearly that we're actually using the functional model; we run that, recompile without changing any other part, and train the model. We get an error because of the rename, so let's put lenet_model back; that's fine, we recompile and run. We train our model, and here are the results we get.

Coming back to our model, we see that we have this feature extraction unit right here: these conv layers are responsible for extracting useful features from the images, and these last layers are responsible for correctly classifying whether the image is parasitized or not. That said, we could build a model known as a feature extractor. Our feature extractor model is similar in construction to what we've done so far, so we copy that, but the difference is that we don't include these final layers; we stop at this point, and we take this as the output. So here we have our functional input and this output, and this is our feature extractor model right here.
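Here is a consolidated sketch of the functional version assembled above, including the feature extractor sub-model; the layer sizes follow the revised LeNet-style setup from earlier in the notebook:

```python
import tensorflow as tf
from tensorflow.keras.layers import (Input, Conv2D, BatchNormalization,
                                     MaxPool2D, Flatten, Dense)
from tensorflow.keras.models import Model

IM_SIZE = 224  # assumed input resolution

func_input = Input(shape=(IM_SIZE, IM_SIZE, 3), name='input_image')

x = Conv2D(6, 3, strides=1, padding='valid', activation='relu')(func_input)
x = BatchNormalization()(x)
x = MaxPool2D(pool_size=2, strides=2)(x)

x = Conv2D(16, 3, strides=1, padding='valid', activation='relu')(x)
x = BatchNormalization()(x)
output = MaxPool2D(pool_size=2, strides=2)(x)

# The convolutional front end on its own is already a usable model.
feature_extractor_model = Model(func_input, output, name='feature_extractor_model')

x = Flatten()(output)
x = Dense(100, activation='relu')(x)
x = BatchNormalization()(x)
x = Dense(10, activation='relu')(x)
x = BatchNormalization()(x)
func_output = Dense(1, activation='sigmoid')(x)

lenet_model = Model(func_input, func_output, name='lenet_model')
lenet_model.summary()
```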
We can call the feature extractor's summary and run it to see what we get: we have our input and then this output right here. At this point, instead of writing out all those layers again, we're just going to call the model we named feature_extractor_model. We take all those layers off, and in here we pass in our input. Notice how we're making this model look like a function: TensorFlow models are callable, just like layers, and this feature extractor model can be seen as a layer, just like the Dense layer, the BatchNorm layer, and all the other layers. We've got this x from the input, which has been passed into our sub-model, and from here we pass this x into the Flatten and the rest. Let's rerun this so you can see the output: as you can see, we get exactly what we expected, the same number of parameters, and there's this difference in the summary where we now have a single feature_extractor entry. So unlike before, where the Conv2D, BatchNorm, and MaxPooling were listed individually, they've now been replaced with this feature_extractor entry. That said, we've just built this model using the Functional API, and in subsequent sections we'll build even more complex models with it: models with shared layers, multiple inputs, multiple outputs, and more complicated configurations.

It's important to note that you can mix the Functional API model-creation style with that of the Sequential API. So instead of creating the feature extractor with the Functional API, let's create it using the Sequential API. We copy out the full Sequential model, paste it here, and keep only the feature extraction part; let's call this the feature extractor sequential model. We run that, that's okay; then we make sure we use it in exactly the same place here, paste it in, and rerun. You see, we get exactly the same output, and here, instead of the feature_extractor entry, we have a sequential entry. This shows us that we can mix up these different ways of creating models.

From this point, we'll look at model subclassing. It's important to note that model subclassing permits us to create recursively composable layers and models. What does that mean? It means I can create a layer whose attributes are other layers, and this layer tracks the weights and biases of those sub-layers. Before taking an example, let's make this import: from tensorflow.keras.layers we import Layer.
We run that and move on to creating our model using model subclassing. We have our FeatureExtractor right here, and it inherits from Layer, the TensorFlow layer class. We then have an __init__ method, followed by a call method. Let's use the right syntax: here's the class, here's the __init__ method; you can always check our free course on Python programming if you're not versed with this syntax. Now, just as when we created the feature extractor with the Functional API, let's go back and copy those layers, then return to our subclassing code. We put this down here, and we call super().__init__() for the FeatureExtractor. With that defined, we can use these layers as attributes of this FeatureExtractor layer: we have self.conv_1, which is this Conv2D right here, so we take all of it and place it here; then self.batch_1, which is this batch norm layer; then the MaxPool2D, which becomes self.pool_1. And we repeat the process: we have self.conv_2, and we take its parameters from here and place them right here; the batch norm and MaxPool2D stay the same. That's it for our __init__ method.

We now build our call method. The call method takes an input x, and what it does is call each and every layer defined in the __init__ method. So we have x equals self.conv_1 of x; the value of x keeps changing as we pass it through, which looks similar to the Functional API. Then x equals self.batch_1 of x, x equals self.pool_1 of x, x equals self.conv_2 of x, x equals self.batch_2 of x, and finally x equals self.pool_2 of x. From there we just return x. Also note that we could pass in an argument like training, which tells a given layer whether to behave in training mode or not; for now, though, everything is used as in training. We take the old code off and run this. It runs correctly, so we should now be able to build our model: here we have the LeNet model, which took the feature extractor model from here, so let's copy this. But before copying it, we have to instantiate the layer here, so we have feature_sub_classed, our FeatureExtractor built right here.

Now, you can always pass parameters like the number of filters and the kernel size via the constructor, so let's do that: say it takes filters, kernel_size, strides, padding, and activation. If we pass all of these here, then when we get to this point, we no longer need to hard-code them, so we simply take this off.
We don't need to specify all of this anymore. Now the activation is specified, and we could also include the pool size and the pool strides: let's say we use two times the strides for the pooling, because we always set the conv strides to one. So we take this off, we've defined all that, and now we're ready to pass in the values: the number of filters, let's take eight; the kernel size, three; the strides, one; the padding, valid; the activation, relu; and the pool size, two. At this point, let's make sure we have the times-two factor, and let's run this again. We create our feature extractor and we get an error; let's try to understand why and how to solve it. Scrolling through, we have this. The reason for the error is the order in which the positional arguments are passed: if we look at the documentation, we see filters, kernel_size, strides, padding, data_format, then the dilation rate and groups before the activation. So it's important that we pass these as keyword arguments, filters=filters and kernel_size=kernel_size; if not, the next value will be taken as the data_format. Then strides=strides, padding=padding, and activation=activation. This should be fine now.

We run that again; we get the same error, but this time for the second conv layer, so let's redo the same fix there, and notice the filter count is times two, so here we have filters times two. We do the same for the MaxPool2D: pool_size=pool_size, and the strides are fine. Now everything should work; we run that, and there we go. So we've created our feature_sub_classed layer, and we can use it in this model right here. We copy that, get back to this, take the old layers off, and run it, and we get an error. Checking back, we see there's a mistake at the level of batch_2: these indices should both be two. That's fine, we run again, and everything works well: it gives us exactly the same output we expect from the feature extractor right here.
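Here is a sketch consolidating the subclassed feature extractor built so far; the argument names and the 8/3/1/'valid'/'relu'/2 values are the ones discussed above, and the exact class name is an assumption:

```python
from tensorflow.keras.layers import Layer, Conv2D, BatchNormalization, MaxPool2D

class FeatureExtractor(Layer):
    """Convolutional feature extractor built by subclassing Layer."""

    def __init__(self, filters, kernel_size, strides, padding, activation, pool_size):
        super(FeatureExtractor, self).__init__(name='feature_extractor')

        # Keyword arguments avoid the positional-argument error hit above.
        self.conv_1 = Conv2D(filters=filters, kernel_size=kernel_size,
                             strides=strides, padding=padding, activation=activation)
        self.batch_1 = BatchNormalization()
        self.pool_1 = MaxPool2D(pool_size=pool_size, strides=2 * strides)

        self.conv_2 = Conv2D(filters=filters * 2, kernel_size=kernel_size,
                             strides=strides, padding=padding, activation=activation)
        self.batch_2 = BatchNormalization()
        self.pool_2 = MaxPool2D(pool_size=pool_size, strides=2 * strides)

    def call(self, x):
        x = self.conv_1(x)
        x = self.batch_1(x)
        x = self.pool_1(x)
        x = self.conv_2(x)
        x = self.batch_2(x)
        x = self.pool_2(x)
        return x

# Instantiated with the values used above: 8 filters, 3x3 kernels, stride 1.
feature_sub_classed = FeatureExtractor(8, 3, 1, 'valid', 'relu', 2)
```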
One last thing we can do, instead of stopping here, is to create the model itself using this subclassing method; so we insert some code, copy this out, and in here, instead of inheriting from Layer, we now inherit from Model, and the feature extractor of this new model is going to be the FeatureExtractor we've just defined. We take all the individual conv layers off; the FeatureExtractor is now our feature extraction part. Once we're done with the feature extraction, we get the other parts which make up the model: the Flatten, the Dense layers, and the batch norms. We take them from here, copy them, and paste them out here. Note that we're calling this our LeNet model, so we modify the name accordingly. Now we check out self.flatten, which equals Flatten; next we have self.dense_1, which we copy from here, our first Dense layer; then self.batch_1, which is our BatchNormalization; we move on to dense_2, with 10 units, and batch_2; and finally the last Dense layer, with one output and a sigmoid activation, which we call dense_3. Everything seems okay.

Now we get to our call method, and this call method basically calls all these different layers after the feature extraction. Notice how this class makes use of the FeatureExtractor, which was itself created using the same subclassing method; so we're using it here, and we're actually calling it here. We make the calls: x equals self.flatten of x, then x equals self.dense_1 of x, x equals self.batch_1 of x, and so on; finally, we return x just as before, and we have our model, our LeNet subclassed model.

When we try to print a summary by calling summary on this subclassed model, what do we obtain? We're told the model has not yet been built: build the model first by calling build or by calling the model on a batch of data. So we call this model on a batch of data: we pass in tf.zeros with shape 1, 224, 224, 3. Let's run this and see: that's fine, we have our summary, and we're ready to compile this model. We compile the subclassed model and then fit it, and everything should work fine. Let's take this to just five epochs. You can see that we're getting similar results when we compare with the Functional API and the Sequential API.
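For reference, here is a sketch of the subclassed model described above, reusing the FeatureExtractor layer from the previous sketch; the class name and unit counts follow the lesson's LeNet-style setup:

```python
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Flatten, Dense, BatchNormalization

class LeNetModel(Model):
    """LeNet-style model via subclassing: a recursively composable model that
    uses the subclassed FeatureExtractor layer as one of its attributes."""

    def __init__(self):
        super(LeNetModel, self).__init__(name='lenet_model')

        self.feature_extractor = FeatureExtractor(8, 3, 1, 'valid', 'relu', 2)

        self.flatten = Flatten()
        self.dense_1 = Dense(100, activation='relu')
        self.batch_1 = BatchNormalization()
        self.dense_2 = Dense(10, activation='relu')
        self.batch_2 = BatchNormalization()
        self.dense_3 = Dense(1, activation='sigmoid')

    def call(self, x):
        x = self.feature_extractor(x)
        x = self.flatten(x)
        x = self.dense_1(x)
        x = self.batch_1(x)
        x = self.dense_2(x)
        x = self.batch_2(x)
        return self.dense_3(x)

lenet_sub_classed = LeNetModel()
# Calling the model on a batch of data builds it so summary() can be printed.
lenet_sub_classed(tf.zeros([1, 224, 224, 3]))
lenet_sub_classed.summary()
```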
So we now move on to creating custom layers. If you recall from the previous sections, the way the dense layer works is that if we have an input x, then we compute a certain m times x plus c, where m is the weights and c is the bias, which we'll also call b; this gives us the output y = mx + c. This means that if we want to recreate the dense layer, we have to take this definition into consideration from scratch. That said, we could define a NeuralearnDense layer, our own custom dense layer. It inherits from Layer, so we have this class which inherits from Layer; then we have our __init__ method, with super().__init__() for NeuralearnDense. If you had noticed, whenever we create a dense layer, we generally have to specify at least the number of output units, since that has no default, so we take that into consideration when building our NeuralearnDense layer: it takes output_units, and we set self.output_units equal to output_units.

From this point, we're going to build this layer, and to build it we have to take this definition into consideration; but the way it's written isn't very clear yet, so let's break it up. Suppose we have an input of shape batch size B by number of features F. This input is going to be multiplied by the weights, which happen to be a matrix, and for the multiplication to be valid, the number of columns of the input has to match the number of rows of this matrix. But what are the dimensions of this matrix? Its first dimension must be F, the number of input features, and its second dimension is the number of output units. So if you want the number of output units to be one, for example, then the second dimension is one. That's also why, when defining a dense layer, we don't need to specify the number of input features: this value is obtained automatically from the number of columns of the input, because if it didn't match, we would get an error. TensorFlow picks it up automatically from the input you pass into the dense layer. So when you write a dense layer and pass in, say, one output unit and maybe some activation, the input you pass into the dense layer determines the number of rows of the weight matrix, while the argument you pass in determines its number of columns. That said, the weights are F by one, and when we multiply a B-by-F input by an F-by-one weight matrix, we get B by one, so we now understand how we get this output. Then we have plus B by one,
where this extra B-by-one term comes from the bias. Once we add this up, we have an output of shape B by one, and that's the shape of our y. If that's understood, we move on to building the layer. Here we have our build method, which takes self, and for now let's keep it that way. We're going to define our weights: we have self.weights, and here we use the self.add_weight method, which comes with the Layer class; we're able to call it because we inherit from Layer. We specify the shape: the number of rows, which comes from the inputs, and the number of columns, which comes from the output_units we pass in when calling the NeuralearnDense layer, so here we have self.output_units. How we obtain the number of input units, we'll look at shortly. Next we have the bias: self.biases is also a weight, created with add_weight, and we specify just the number of output units, since it's one-dimensional. So at this point we've defined our weights and our bias.

Now let's get to call. We have our call method, which actually gets the job done, and it takes the input; let's call it input_features. What we do here is simply return the matrix multiplication, as we've seen already, of the input features and the weights. Note that it's the input features times the weights, not the weights times the input: if we go back to the shapes, weights times input would be (F by 1) times (B by F), where the inner dimensions don't match, so it would throw an error. So we simply take the matrix product of input_features and the weights, and then we add up the bias, self.biases.

Now, getting the number of rows of the weight matrix is easy: all we need to do is specify that the build method receives the input features' shape, which comes in automatically, and then we take its last dimension. Going back to the shapes, we have (B by F) times (F by output units), so the weights need to be F by the output units; to get this F, we just take the input shape and grab its last element, and that gives us the number of rows we've been looking for. So we index into the input features' shape with its last entry, and that's how we obtain this value automatically. Now that we've got this, everything seems fine; the next thing to do is to specify that the weights are trainable.
We have to specify that these weights are trainable, because in some cases we may not want the weights to be trainable. So let's set trainable equal to True. There's no self here; it's simply one of the arguments we pass to this add_weight method, so we pass trainable=True for the weights and again for the bias. Apart from that, we can randomly initialize our weights and biases, so here we set the initializer to random normal for both. We now run this, and there's our NeuralLearnDense layer. After running it, let's integrate it. To keep things simple we'll just use our Sequential API, so let's get back to the Sequential model we built initially, copy that, and then integrate this new custom dense layer. Instead of the Dense layer, here we have our NeuralLearnDense layer, so you see that you can create your own layers with TensorFlow. Now, this should throw an error, because we haven't taken the activation into consideration. So what we can do is go back into the call method and say: if the activation equals relu, return this; elif the activation equals sigmoid, return that. Let's look at the modifications we make. In case it's relu, we wrap the output in tf.nn.relu, and if the activation is sigmoid, we use tf.math.sigmoid. Otherwise, we just return the output as is. So we've modified our code such that we now integrate the activations. Let's run this and fix the errors one at a time. Here the comparison needs a double equals sign, and the return below seems fine. Next, we're supposed to take activation as an argument in the init method, store it as self.activation, and then use self.activation in the call method. We run that again, and then we simply run the model. We get another error, and to solve it we need to pass the shape as a keyword argument to add_weight, so shape equals this for the weights and shape equals that for the bias. We rerun and now get: can't set attribute 'weights', likely because it conflicts with an existing read-only property of the object. So instead of self.weights we'll just call it self.w, and update the call method accordingly. And since we don't want to repeat this expression over and over, let's define a pre_output, which is the matrix multiplication of the input with w, plus b. If the activation is relu, we pass the pre_output through the relu; we do the same for sigmoid; and otherwise we just return the pre_output directly.
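Putting all of those pieces together, a minimal sketch of the custom layer could look like this. It assumes the class name NeuralLearnDense and the attribute names w, b and pre_output used in the walkthrough; the exact code on screen may differ slightly.

import tensorflow as tf
from tensorflow.keras.layers import Layer

class NeuralLearnDense(Layer):
    def __init__(self, output_units, activation=None):
        super(NeuralLearnDense, self).__init__()
        self.output_units = output_units
        self.activation = activation

    def build(self, input_features_shape):
        # weight matrix of shape (F, output_units), where F is the last dimension of the input shape
        self.w = self.add_weight(shape=(input_features_shape[-1], self.output_units),
                                 initializer="random_normal",
                                 trainable=True)
        # bias vector of shape (output_units,)
        self.b = self.add_weight(shape=(self.output_units,),
                                 initializer="random_normal",
                                 trainable=True)

    def call(self, input_features):
        # (B, F) x (F, output_units) + (output_units,) -> (B, output_units)
        pre_output = tf.matmul(input_features, self.w) + self.b
        if self.activation == "relu":
            return tf.nn.relu(pre_output)
        elif self.activation == "sigmoid":
            return tf.math.sigmoid(pre_output)
        return pre_output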
Now running this should be fine. We run that, and there we go: we have our model, the exact same model we've been building from the start, but this time around using a custom dense layer, our NeuralLearnDense layer. Let's go ahead and compile this model and train it. We've actually kept the same model name here, so let's change it to something like the LeNet custom model. We copy that, use it in the compile and fit calls, and run everything. As you can see during training, there's a slight difference in the loss values and accuracy we're getting for this first epoch, and most probably this comes from the random normal initialization we've chosen here, whereas the standard Dense layer, as you can see in the documentation, uses the Glorot uniform initializer for the kernel, that is the weights, and zeros for the biases. So there the biases start at zero and the weights use the Glorot uniform method, and as you can see our version isn't performing as well as what we had previously. But scrolling up, there's an error: the activation was meant to be sigmoid. So let's interrupt this training, fix that, get back, rerun, compile, and start the training again. This time around it looks better; it looks more like what we should expect. So even though we're not using the Glorot initialization method, this training process looks quite similar to what we would have had with the Glorot initialization used by the standard dense layer. And here's what we get after training for five epochs. Thank you for getting up to this point, and see you next time. Hello, everyone, and welcome to this new section, in which we'll look at methods of evaluating our model other than the binary accuracy we've seen so far. In this section we'll look at how to compute the true positives, false positives, true negatives, and false negatives, the precision, the recall, and the area under the curve, how to come up with a confusion matrix like this one, and finally how to plot an ROC curve like this one, which permits us to select the threshold more efficiently. Don't forget to subscribe and hit that notification button so you never miss amazing content like this. Let's now look at other ways of evaluating our model apart from the accuracy. To better understand why working with only the accuracy isn't always a great idea, consider the fact that our model has 94% accuracy on the test set. This means that six out of every 100 predictions are actually wrong. Now, what if I get to the hospital and I'm told that I don't have malaria when in fact I do have the disease? In that case the model predicts uninfected while I actually have the parasite in my bloodstream. This particular situation is very dangerous, because the patient goes back home thinking he or she doesn't need any treatment, whereas that patient actually carries the parasite. You see that even with a 94% accuracy, we wouldn't be able to protect ourselves from such chaotic model predictions. Now, in another example, we have a situation where the actual label is different; let's put it out here.
So in this other example, the actual label is uninfected: you do not have the parasite, but the model predicts that you do. In this case, although we have a wrong prediction, the situation is actually less chaotic than the previous one, since at least you really are uninfected. Let's define negative to mean uninfected and positive to mean parasitized. With that convention, the first situation, where you actually have the parasite but the model predicts uninfected, is known as a false negative. This is because the model predicts negative when the truth isn't negative; since we have this wrong prediction of negative, we call it a false negative. And the case where the model predicts parasitized, that is positive, when you are actually negative, is called a false positive. So here we have FP and here we have FN. There are two other scenarios, the TN and the TP: the true negatives and the true positives. For a true negative, the model predicts negative when the actual is negative, that is, the model says uninfected when the sample really is uninfected. And for a true positive, the model predicts positive, that is parasitized, when the sample really is parasitized. Hopefully you've now understood the concepts of true negatives, true positives, false negatives, and false positives. We can summarize all this information in a matrix known as the confusion matrix, which contains the number of true negatives, the number of true positives, the number of false negatives, and the number of false positives. This means that if we have a test set of, say, 2,750 data points and we evaluate our model on it, we can get these four counts and hence better evaluate the model. Now take this example, where we've evaluated two models on the test set: model A produces one confusion matrix and model B produces another. For both models the true negatives and true positives are 1,000 and 1,000, so each gets 2,000 correctly predicted data points. But model A has 700 false positives and 50 false negatives, whereas model B has 50 false positives and 700 false negatives. Recall that we defined negative to be uninfected and positive to be parasitized. Hence, if we have to choose between A and B, we'll choose the model which minimizes the number of false negatives. That doesn't mean we shouldn't also minimize the false positives, since we want to reduce all wrong predictions, but with a false negative we're telling a sick person that he or she isn't sick, which is worse than telling a healthy person that he or she is sick.
So we'll prioritize the number of false negatives, and based on this prioritization we prefer model A, since it has the smaller number of false negatives. Hence we choose model A over model B. As a quick note, you could also decide that negative stands for parasitized and positive for uninfected; it isn't mandatory to tie these together the way we did, but for clarity it's better to look at it this way, since testing negative usually means you're uninfected and testing positive means you're parasitized. You should also note that, depending on the kind of problem you want to solve, in some cases you'll want to prioritize minimizing the number of false positives over the number of false negatives. It really depends on the problem you're trying to solve, but in our case we're prioritizing the false negatives. Based on what we've seen so far, we can now introduce several new performance metrics, given by the formulas you see right here. The precision is the number of true positives divided by the number of true positives plus the number of false positives. The recall is the number of true positives divided by the number of true positives plus the number of false negatives. The accuracy is the number of true negatives plus the number of true positives, divided by the number of true negatives plus true positives plus false negatives plus false positives. Let's stop at these first three for now. What do you notice? In both the precision and the recall, the numerator is the true positives; what differentiates them is that the precision has the false positives in the denominator, while the recall has the false negatives. This means that if the number of false negatives is high, then we have a constant K divided by a large value, which gives a small result. So a high number of false negatives gives a low recall, and likewise a high number of false positives gives a low precision. In our case we're trying to minimize the number of false negatives, and that means we're trying to maximize the recall, since minimizing that denominator maximizes TP over TP plus FN. So here we're prioritizing the recall over the precision. Now, if you look at the accuracy, you'll notice that it has TN plus TP in the numerator and TN plus TP plus FN plus FP in the denominator. If you're keen enough, you'll see that the accuracy doesn't give any priority to whether the errors are false negatives or false positives; it treats the two as the same. But as we've seen, when solving real-world problems we often have to prioritize one over the other, so the accuracy may not always be the best metric for our problem. In our case, using the recall is even better than using the accuracy, since with the recall we get to see whether our model does well at minimizing the number of false negatives.
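To make that concrete, here is a tiny worked example, using the model A and model B counts from above, which shows that the two models have the same accuracy while the recall clearly separates them; it's just an illustrative sketch, not code from the course notebook.

def classification_metrics(tp, tn, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, accuracy

# model A: 700 false positives, 50 false negatives
print(classification_metrics(tp=1000, tn=1000, fp=700, fn=50))   # approx (0.588, 0.952, 0.727)

# model B: 50 false positives, 700 false negatives
print(classification_metrics(tp=1000, tn=1000, fp=50, fn=700))   # approx (0.952, 0.588, 0.727)

Both models sit at roughly 72.7% accuracy on the 2,750-point test set, but model A's recall of about 0.95 versus model B's 0.59 is what tells us model A makes far fewer false negatives.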
We also have the F1 score, which is two times the precision times the recall, divided by the precision plus the recall, and the specificity, which is the number of true negatives divided by the number of true negatives plus the number of false positives. Then we have this ROC plot right here; ROC stands for receiver operating characteristic. On it we have the true positive rate and the false positive rate. The true positive rate is the number of true positives divided by the number of true positives plus the number of false negatives, which happens to be the recall. The false positive rate is the number of false positives divided by the number of false positives plus the number of true negatives, which, if you look carefully, you'll find equals one minus the specificity defined right here. Before getting to understand this ROC plot, let's recall the two models we described previously, model A and model B, where model A had a smaller number of false negatives than model B. Now suppose we pick model B and we're interested in reducing its number of false negatives. One solution could be to modify the threshold. With a threshold of 0.5, outputs below 0.5 are considered negative, that is uninfected, and outputs above 0.5 are considered positive, that is parasitized. What I could do is reduce the threshold, say to 0.2. You'll see that for most of the predictions, our model is now going to say the output is parasitized, since the threshold has been reduced: a model prediction of 0.3, which initially would have been uninfected, is now seen as parasitized. This makes it much harder for the model to produce false negatives, since it now has this tendency of predicting that a given input image is parasitized. That said, we now need a way to automate this process, that is, to choose this threshold correctly. Because if we take a threshold of, say, 0.001, it means that any prediction below 0.001 is uninfected and anything above 0.001 is parasitized, which would be very dangerous for the overall model performance, as the model would now almost always predict that the input is parasitized. So our aim is to pick the threshold such that the numbers of true positives and true negatives we had don't get reduced. The way we can do this is with the ROC plot. On the ROC plot, what we actually have are the different true positive rates and false positive rates obtained at each threshold, so a point on the curve corresponds to a given threshold. Let's suppose the threshold 0.5 sits about here; we could pick another point for, say, 0.2, and so on and so forth, and another for 0.1. We could have another model with a different ROC curve, and yet another with another shape, but note that overall our aim is to ensure that the false positive rate is minimized and the true positive rate is maximized.
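As a tiny aside before we look at the shape of the ROC curve, here is a made-up illustration of how lowering the threshold flips borderline outputs into positive, that is parasitized, predictions under the convention used in this discussion; the probability values are invented just for the example.

import numpy as np

probs = np.array([0.3, 0.55, 0.1, 0.45])   # hypothetical model outputs for four images

print((probs > 0.5).astype(int))   # [0 1 0 0] -> only one positive prediction at threshold 0.5
print((probs > 0.2).astype(int))   # [1 1 0 1] -> lowering the threshold to 0.2 produces more positives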
So if we have an ROC curve which, let's redraw it, goes straight up like this and then across like that, then we'd be able to pick out the threshold at this corner point, some value zero point something, let's call it X. We'd pick that threshold because, at that point, the true positive rate is at its highest possible value, which is one, and the false positive rate is at its lowest possible value, which is zero. So there we go: we pick out that threshold, and the actual value of X could be 0.5, 0.4, whatever value leads us to that point. Nonetheless, most of the time we won't have this kind of plot; instead we'll have curves which look like this one. Once given a plot like this, the aim is to ask yourself the right question. If I want to make sure my recall is always maximized, which is our case, it's only natural to pick threshold values which take us towards the top of the curve, since it's in that region that the recall is maximized. Let me clean this up so you can see it more clearly. Now, the problem with picking a point up there is that when you do, the false positive rate increases as well. So you need to find a balance between the false positive rate and the true positive rate, and it's more logical to pick a point around this region here instead of the previous one. You can see that at this point the false positive rate is smaller while the recall stays high, though it isn't the best possible recall we could have. Trying to force a recall of one would get us into trouble, since getting a recall of one in this case would increase our false positive rate. Conversely, if we're dealing with a problem where we're trying to maximize the precision, then we want to ensure the false positive rate is minimized, and in that kind of problem we'd pick a point around here, since at least the false positive rate is small. But if you go and pick a point all the way down here, you'd have a false positive rate of zero while getting yourself into trouble, because the true positive rate would be very small, so again you need to find that balance. Finally, if you're dealing with a problem where it doesn't really matter, that is, you aren't trying to prioritize the precision or the recall and working with the accuracy is just fine, then you could pick out this middle point right here. So this point is for when you don't have to prioritize either, this one is for when you're prioritizing the recall, and this one is for when you're prioritizing the precision. The great thing with this tool, the ROC plot, is that we're able to pick out such a point and then automatically obtain the threshold we need to work with.
So when doing predictions, we may not use 0.5, but rather a threshold which suits the objectives we set initially. We'll now move on to the area under the curve. We generally use the area under the curve when comparing two models. Here we have one model, and to avoid confusion with the earlier model A, let's call it model alpha, and then another model, which we'll call model beta. It's clear that model beta is better, because it gives us better options: if I find myself on its curve, I get a better true positive rate versus false positive rate balance than at this position on the other curve. So if we're comparing these two models, we can make use of the area under the curve, popularly known as the AUC, by computing the area covered under each curve. For alpha we have this area under the curve, and for beta we have that area plus this extra region right here. So in general, if we have two models and we want to compare them, we can use the area under the curve, since it shows us how much freedom we have in playing around with the thresholds. Let's now get back to the code and see how to implement the new metrics we've just talked about. Right here in the metrics, we add the false positives, the false negatives, the true positives, the true negatives, the precision, the recall, and the AUC, and since we're dealing with a binary classification problem, the binary accuracy. We run that, and then instead of the single metric we had before, we define this metrics list containing the different metrics we've just talked about, from the true positives right up to the AUC. Let's run this, compile, and then fit our model. You'll see that as we train, we get the true positives, false positives, true negatives, false negatives, accuracy, precision, recall, and AUC scores printed out. With those results obtained, we'll go ahead and evaluate our model, so let's get to the model evaluation, run this on the test data, and there we go, our model has been evaluated. As you can see, we have a loss of 0.35, 1,323 true positives, and then the number of false positives, true negatives, false negatives, the accuracy, precision, recall, and AUC. For now we've only been printing these values, from the binary accuracy and false positives up to the AUC; what if we plot out the confusion matrix and also the ROC curve? To do that, we'll import scikit-learn and seaborn: from sklearn.metrics we import confusion_matrix, and then we import seaborn as sns. We'll use both of these to plot the confusion matrix. Let's run this, and then we'll visualize our confusion matrix. To do that, we have to get the labels, that is the true values of the outputs, and also the predicted values.
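Before moving on to collecting the labels, here is a minimal sketch of what that metrics list passed to compile could look like. It assumes the model variable name lenet_model and the binary cross-entropy loss with the Adam optimizer used earlier in the course; the exact notebook code may differ.

from tensorflow.keras.metrics import (TruePositives, FalsePositives, TrueNegatives,
                                      FalseNegatives, Precision, Recall, AUC, BinaryAccuracy)

metrics = [TruePositives(name="tp"), FalsePositives(name="fp"),
           TrueNegatives(name="tn"), FalseNegatives(name="fn"),
           BinaryAccuracy(name="accuracy"), Precision(name="precision"),
           Recall(name="recall"), AUC(name="auc")]

# lenet_model, the loss, and the optimizer are placeholders for what was defined earlier
lenet_model.compile(optimizer="adam",
                    loss="binary_crossentropy",
                    metrics=metrics)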
Let's start with the labels. We create an empty list for the labels, and then for x, y in the test dataset, taken as a numpy iterator, we append every output y to this labels list. We run that, and that's fine; we can now print out the labels, and there we go, you see we have them. Now let's convert them into a simpler form, where we just have the values themselves instead of these one-element arrays. To get that, we set labels equal to an array built from a list comprehension: for every element i of this labels list, we take i at the zeroth index, so we always pick out the inner element. Let's print the labels after this transformation, and there we go, here are our labels now. If we rerun the cell we get an error, because we've already converted the labels and we're trying to reconvert them, so let's comment out that line, run it again, and we still have all the labels. Now that we have the labels, we can go ahead and get the predicted values. To get them, we call our LeNet model's predict method, and what we pass into predict is the input. So just as with the labels, we create an input list and append x, which is the input, for each element of the test dataset. We run this, add a code cell, and preprocess the input: let's first print the shape of the numpy array of the inputs to see what we're dealing with. Here's the shape we get, and we have to chop off this extra dimension of one. To do that, we take the inputs, keep everything along the first axis, and then take only index zero along the next axis, keeping the rest. We run this again, print the shape, and there we go, this is what we expect. Now we've got the labels and we're ready to get the predictions, so we pass this input into predict and get our predictions. Let's print the predicted values, and also their shape, to better understand what we have. As you can see, there's again an extra dimension, so how do we make it look like the labels? We simply keep everything along the first axis and take index zero along the second. We run that again, take out the shape, and now it looks like the labels we had. Our next step is to use the confusion_matrix function from sklearn.metrics. Let's comment out these sections first. Here we pass in the labels, we pass in the predicted values, and then we specify the threshold.
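Here is a rough sketch of those steps, from collecting the labels and inputs to computing the confusion matrix. It assumes the names test_dataset and lenet_model from earlier sections, and the exact indexing depends on how the test dataset was batched, so treat this as an outline rather than the exact notebook code.

import numpy as np
from sklearn.metrics import confusion_matrix

labels, inputs = [], []
for x, y in test_dataset.as_numpy_iterator():
    labels.append(y)
    inputs.append(x)

labels = np.array([lab[0] for lab in labels])   # keep just the scalar label from each batch of one
inputs = np.array(inputs)[:, 0, ...]            # drop the extra batch dimension of one

predicted = lenet_model.predict(inputs)[:, 0]   # shape (N,), same as the labels

threshold = 0.5
cm = confusion_matrix(labels, predicted > threshold)   # outputs above the threshold count as class 1
print(cm)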
What we're saying here is that all predicted values greater than the threshold are considered uninfected, and all values less than or equal to the threshold are considered to contain the parasite, and we pass this into the confusion_matrix function. We run that, and as you can see, we have our confusion matrix, which shows us the number of true positives, true negatives, false positives, and false negatives. Getting back to our model evaluation, we can see exactly which of those values are the true positives, false positives, true negatives, or false negatives. Let's copy the evaluation output and place it below the confusion matrix we've just computed. The number of true positives matches, and the number of true negatives, 1,298, is close. So what scikit-learn gives us is quite similar to what we had with TensorFlow, though not exactly the same. From this we can tell that this cell holds the true negatives, this one the true positives, here the false positives, and here the false negatives. We'll now see how modifying the threshold either reduces the false negatives or reduces the false positives. Let's set the threshold to, say, 0.25 and run that again: we see that the number of false negatives is reduced when we reduce the threshold. Now let's take it to 0.75 and run again: you see that this is now reduced, so we have an increased number of false negatives and a reduced number of false positives when the threshold is increased to 0.75. So that's it. Let's now plot a more elegant-looking figure with the seaborn library. We run that, and there we have the same confusion matrix, just more elegantly plotted. Let's set the threshold back to 0.5. Now, you'll notice that when we're trying to reduce the false negatives or the false positives, what we're doing is just picking some values, like 0.2, seeing the effect, then trying 0.25 and seeing how this takes us to 58 here and 130 there; we could keep doing this until we get our best results. But this isn't very efficient, since we're just trying out different values. A more efficient way to do this is by working with ROC plots, where we'll be able to choose a threshold in a more principled manner using the plot, and so reduce the number of false negatives or false positives without having to try out all these threshold values manually. The very first thing we'll do is import roc_curve, which is part of sklearn.metrics. We run this, and just below we make use of this method: it outputs the false positive rates, the true positive rates, and the thresholds, which we'll use to come up with the ROC plot. So we call roc_curve, and it takes in the labels and then the predicted values. Now, if you print out the lengths of the false positive rates, the true positive rates, and the thresholds,
you'll see that they all have exactly the same number of values, 330 in this case. Note that the reason we need all of this is that when coming up with our ROC plot, for each and every point we want the corresponding true positive rate, false positive rate, and the threshold which leads to that pair. That said, we'll now make use of this data to plot the ROC curve. We call the plot function and pass in the false positive rates and the true positive rates, just like x and y, so the false positive rate goes on the x axis and the true positive rate on the y axis. Then we set the x label to the false positive rate and the y label to the true positive rate, as we've seen already, include the grid, and show the plot. We run that, and here's what we get: our ROC plot, based on the fp and tp values we obtained from roc_curve. Now, how do we include the thresholds? To do that, we'll make use of matplotlib's text method: it takes the fp and tp coordinates of a point, and the actual text we place there will be the threshold. But note that we can't do this for each and every point, because there would be too many texts on the plot and it would be cluttered up. So what we can do is skip some values: we loop for i in range, starting from zero right up to the length of the thresholds, which is 330, stepping by a skip value, and let's start with a skip of 20. For each step we pick the corresponding false positive rate, the corresponding true positive rate, and the corresponding threshold. Now we run this, and here's what we get: we can now see the ROC plot with the different thresholds. I think we can leave it like this; let's just increase the figure size. Then we'll try to focus on the portion which actually matters the most, because we wouldn't want to get into the regions on the right, where the false positive rate would be too high, and the regions below would have a very small true positive rate, so generally we'll focus on this zone right here. Now, depending on the problem you're trying to solve, if the false positive rate is what matters the most, that is, if you cannot afford a high false positive rate, then you'd tend to pick values on one side. Let's break the curve up like this, drawing a line at some sort of midpoint, around thresholds like 0.5, 0.46, 0.62, and so on. If you want to minimize the false positive rate as much as possible without reducing the true positive rate too much, then you'd pick values around this side; but if you want the true positive rate to remain quite high, even at the detriment of the false positive rate, then you'd pick values in the other zone.
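Before we continue with how to pick a zone, here is a minimal sketch tying together the plotting steps just described; it assumes the labels and predicted arrays from the previous step, and the skip of 20 used above.

from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt

fp, tp, thresholds = roc_curve(labels, predicted)

plt.figure(figsize=(10, 10))
plt.plot(fp, tp)
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.grid()

skip = 20
for i in range(0, len(thresholds), skip):
    # annotate every 20th point with its threshold so the plot stays readable
    plt.text(fp[i], tp[i], str(round(thresholds[i], 4)))

plt.show()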
Coming back to the plot, you have these two zones to pick your threshold from: the main zone here, and then this other zone right there, so zone one and zone two. One other quick note: in the problem we're solving, parasitized is zero and uninfected is one. That's how the dataset was created, and our model was built on top of it, so parasitized samples are treated as the negative class while uninfected samples are treated as the positive class, whereas in the real world we'd tend to look at uninfected as negative and parasitized as positive. So you have to be very careful with these terms and know exactly how your data and model were built. That's why, in our case, where we're trying to avoid situations where the model produces a fake "uninfected" output, that is, a patient who actually has the parasite but the model predicts uninfected, that error is actually a false positive under our encoding, and so we'll tend to minimize the number of false positives. If your model had been built such that parasitized is one and uninfected is zero, then you'd instead try to minimize the number of false negatives, since uninfected would be the negative class. That said, coming back to our problem: since our dataset was constructed this way, we try to minimize the number of false positives in our case, but while doing this we have to ensure the true positive rate remains at a reasonable level. So we could pick out a threshold like the 0.6265 given right here on the plot. Getting back to the code, we set the threshold to 0.6265 and run that: we get 87 false positives, which is smaller than what we had with a threshold of 0.5. Run that again and you'll see that 87 is indeed smaller than the 99 we're getting now. And so that's it for this section on metrics. Thank you for following up to this point, and see you next time. What's up, everyone, and welcome to this new section, in which we'll build callbacks with TensorFlow. In this section we'll look at how to build a callback from scratch by inheriting from the Callback class, and then we'll use other callbacks made available by TensorFlow, like the CSV logger, early stopping, the learning rate scheduler, model checkpointing, and finally the reduce-learning-rate-on-plateau callback. Don't forget to subscribe and hit that notification button so you never miss amazing content like this. Callbacks are methods we call during training, evaluation, or prediction. Callbacks permit us to extract useful information from those processes or even carry out changes on them. Here in the TensorFlow documentation, under tf.keras, we have the callbacks. We could go through each and every one of them, but for the sake of this course we're going to look at the key ones. Right here we have the Callback class, but before getting to it, we'll look at the History callback. The reason we're looking at History first is that we've actually used it already: we've been using this callback without really knowing that we were using callbacks. As the documentation says, this is a callback that records events into a History object. Remember the times we trained and stored information in this history.
And then after training, we were able to come up with plots like this one, because we had that information stored in the history object. So, as you can see in this example, which is similar to what we've been doing so far, we have this history, and it collects all the values recorded during training; we can print out history.params to get the params, and we can also get the keys. That's how this works, and we've seen it already. Next, let's look at the Callback class itself. The reason this is the most important of all these different callback classes is that it's the mother class, the abstract base class which is used to build new callbacks. So if you want a callback which isn't listed here, you can always come back to this class and build that callback from scratch. The Callback class has attributes, as you can see, params and model, and it also has these methods, which we'll see how to implement very easily. If you look at on_batch_begin and on_batch_end, you'll see that they take similar arguments: the batch and the logs. Then on_epoch_begin and on_epoch_end take the epoch and the logs. That said, let's import Callback: from tensorflow.keras.callbacks we import Callback, the class we'll be using to create our callbacks, and we run that. Then we define our callback class, which we'll call LossCallback; it inherits from the Callback class we've just imported, and we'll make use of the different methods given to us in the documentation. Let's start with on_epoch_end, which takes the epoch. What we're going to do is print out the loss value at the end of each epoch, similar to what's already printed during training. So we print something like "for epoch number this, the model has a loss of this", and format it with the epoch and the logs. Just as in the documentation, where the method receives the epoch and the logs, we pass in the epoch, and since we want just the loss, we pick the loss out of this logs dictionary. Let's run this, and then see how to include a callback in the training process. In the fit call, we have the callbacks argument, which takes a list, and in that list we insert the callback we've just created, our LossCallback. Now let's rerun this, train for, say, three epochs, and see what we get. We get an error; we click search Stack Overflow, click on the result, and you'd see we have a solution right there: what you're supposed to pass is the object, not the class itself. So instead of passing LossCallback, as we just did, we pass in an instance, that is, we add the brackets. Let's also modify the print to add a space and go to the next line, and then rerun.
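Putting that together, a minimal sketch of this custom callback could look like the following. The class name LossCallback, the message wording, and the dataset and model names are assumptions matching the walkthrough, not the exact notebook code.

from tensorflow.keras.callbacks import Callback

class LossCallback(Callback):
    def on_epoch_end(self, epoch, logs=None):
        # print the loss recorded in the logs dictionary at the end of every epoch
        print("\n For epoch number {}, the model has a loss of {} \n".format(epoch + 1, logs["loss"]))

    def on_batch_end(self, batch, logs=None):
        # print the whole logs dictionary at the end of every batch
        print("\n For batch number {}, the model has a loss of {} \n".format(batch + 1, logs))

# note: pass an instance of the callback, not the class itself
history = lenet_model.fit(train_dataset, validation_data=val_dataset,
                          epochs=3, callbacks=[LossCallback()])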
So let's run that, and that's fine. After training for all three epochs, here's what we get: unlike before, where we just had the standard output, we now have this printed message, that is, for epoch number zero the model has a loss of this, then for epoch number one, and then for epoch number two. One thing we can do is format this so we don't start from zero but from one, by adding plus one to the epoch right here. Another thing we can do is implement on_batch_end: it takes the batch and the logs, and this time we print something similar to what we've seen already, "for this batch number, the model has a loss of this", passing in the batch, and for now let's just print out the whole logs dictionary. So unlike on_epoch_end, where we pass in the epoch, here we pass in the batch. We run this again and retrain, and you'll notice that this time around much more information gets logged. Scrolling right to the top, what we have is: for batch number one, the model has this loss. So this is printed after each and every batch: the first batch, then the next batch, and so on. And since our batch size equals 32, this simply means that after working on every 32 data points we get this printed out, then the next 32, and so on, right up to the end. For the epochs, you'll notice that as we go on, right up to batch 689, at the end of the epoch we get the epoch message printed, whereas previously we had the batch messages for all the different batches. In case you want to understand where this 689 comes from, take the total dataset size and divide it by the batch size of 32, and you should get the 689 given right here. From this, we can look at the CSVLogger. With the CSVLogger, we're logging this information out into a CSV file, or some file which we define. This time around we don't actually have to create a class as we did with the custom callback, since there's a ready-made way of creating these callbacks. With the CSVLogger we create the callback more easily: we just call CSVLogger and specify the file name, so let's have csv_callback equal to CSVLogger with the file name logs.csv. Up at the top we also have to import CSVLogger, we run that, get back to our callback, and run this cell. Note that this append argument, as described in the documentation, is a boolean which tells us whether the logs we're currently writing to the CSV file, or to the file in general, are going to be appended to previously logged content or not. When it's False, we're assuming the file is empty, and so we're writing to this file for the very first time.
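Here is a minimal sketch of that CSVLogger setup; the file name logs.csv matches the walkthrough, while the model and dataset names are the same assumed placeholders as before.

from tensorflow.keras.callbacks import CSVLogger

# append=False overwrites the file on each run; set append=True to keep logs from previous runs
csv_callback = CSVLogger("logs.csv", append=False)

history = lenet_model.fit(train_dataset, validation_data=val_dataset,
                          epochs=3, callbacks=[csv_callback])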
So we've run this, and now all we need to do to take this new callback into consideration is to add the csv_callback to the callbacks list; let's also take the LossCallback out, so we're no longer printing the per-batch logs. We run this, and after three epochs we open up the file browser and there's this logs.csv file which has been created. As you can see, we have this CSV file, which we can now download and view later: it contains the accuracy, the AUC, and all the other metrics and loss values we wanted to store. The next thing we can do is go back and set append to True: if we've already trained once and we want to redo the training process, we don't want to erase all the values we had previously. So we set append to True, run this again, and after three epochs we open the logs.csv file, and what do we have? You see, we have the new information appended onto the previous information. From the CSVLogger we can now move on to the early stopping callback. To better understand early stopping, let's get back to the plots we had previously. Take this plot of the model's accuracy, where we see how the training accuracy keeps increasing while, after a certain point, say this point here, the model's validation accuracy doesn't increase any further. So what we have is something like this: the training accuracy keeps increasing towards one, and the validation accuracy flattens out like this. In some other cases you would even have situations where the validation accuracy starts to drop; in our case it just kind of stabilizes and doesn't increase any further. This kind of situation is known as overfitting. When overfitting, the model starts to fit the training data too closely. Because the model is trained on the training data and not on the validation data, at a certain point the model stops generalizing, whereas the aim of the training process isn't to come up with a model which only performs well on the training data; we're trying to come up with a model which performs well on any data, be it the training, the validation, or the test data. If we're able to have a model with the same performance on the training, the validation, and the test sets, then that model is an ideal one. But here we see that as we keep on training, the model's parameters are being modified to suit only the training data, and this is very dangerous, because at a certain point you may feel that, since you're getting a high training accuracy, your model is performing well, whereas this isn't the case: when the model is shown new data, like the validation data here, or the test data later on, it won't perform as well as it does with the training data. To avoid this kind of false measurement, we tend to stop the training once this overfitting starts to occur. This means that if you're training, let's suppose we're at this point here,
and your validation accuracy seems to be constant while the training accuracy keeps increasing, then it's better for you to stop training at this point. Because after this point, the model's parameters are just being modified to suit the training data and the model doesn't really generalize, which is the whole point here, since we're trying to extract information from this data and make the model intelligent. The model doesn't become intelligent by only modifying its weights or parameters to fit the data it's been trained on; it's intelligent because, after being trained, it can perform well on data it has never, ever seen. So here we have early stopping: after we notice that the validation accuracy doesn't seem to increase any further, we just stop the training and use the model parameters from that number of epochs. In this case, we can see that after, say, 12 epochs, we just stop the training. Now let's take this off and replicate something similar for the loss. If we plot the loss against the number of epochs, you could have a situation where your training loss keeps reducing, whereas the validation loss looks like this. This is a typical plot for overfitting. Nonetheless, in our case the model overfits, but not by that much; in some cases you'd have a situation where the validation loss even starts to increase after a certain point. Here is the validation loss and here is the training loss; obviously the training loss will always keep reducing, because we're training on this training data. What we're saying is that at the point where the validation loss stops reducing, it's important to just stop the training, and this is known as early stopping. Now recall that the aim of callbacks is to be able to modify the training, evaluation, or prediction process in an automatic manner. So we'll make use of this EarlyStopping callback right here, which permits us to stop training automatically once we notice that a monitored quantity, like the validation loss, no longer improves. Here we just copy this and apply it the same way we did with the CSV callback: we add the text cell and the code cell, paste this in, define this es_callback, that is the early stopping callback, and then look at the significance of each of its arguments. Coming back to the documentation, we have the monitor, the quantity to be monitored, which by default is the validation loss. This means that this callback will simply check the validation loss, and once it stops reducing, like at this point, we're going to stop the training. Whereas if we change this to, say, the validation precision or the validation accuracy, then what we'll be monitoring will be that accuracy or precision value, and we'll stop based on it, to ensure we don't go on and overfit the training data. The next argument is the min_delta. With the min_delta argument, we're defining a minimum change below which any change is considered as no improvement.
So if we have a loss like this and our min_delta is, say, a value of 0.1, then even if the loss reduces by 0.05, this callback will consider that there has been no decrease in the loss, because the change is below the min_delta of 0.1. By default, min_delta is set to zero, which means that any slight change is considered a drop: even a decrease of 0.0005 counts as a drop in the loss. This matters because the callback also makes use of the patience. With the patience, we're defining the number of epochs after which, if we don't see a decrease in the monitored quantity, like the validation loss in this case, we consider that we can stop the training process. For the accuracy, it's the number of epochs after which, if we don't see an increase in the validation accuracy, assuming we picked validation accuracy as the monitor, we stop the training. We define all this beforehand so that it can run automatically. For the mode, by default we have auto, but we can specify min or max. Notice that when speaking about the loss, we spoke of the number of epochs after which the loss doesn't decrease, so we're supposing the loss is meant to be decreasing, and in that case we have a mode of min. For the accuracy, we spoke of the number of epochs after which the accuracy doesn't increase, so in that case we have max. What TensorFlow permits us to do is use auto, and with auto, TensorFlow automatically infers whether it's dealing with a min or a max: if you pass in, for example, the validation precision, auto should be able to understand that a precision should be increasing, and so it will use max. Then we move to the baseline, where the training stops if the model doesn't show improvement over the baseline. And finally we have restore_best_weights. With restore_best_weights, which by default is False, we're simply saying that the model is going to keep its final state. This means that if the lowest validation loss occurs at this point here, and we have a patience of five, we're going to train for another five epochs before stopping; so if after those five epochs we find ourselves at this point, with this higher loss, it's clear that the model with this loss is less performant than the one at the best point. If restore_best_weights is set to False, we just keep the model weights from the end, the ones which give this final loss value, whereas if it's set to True, we take the best weights we've had throughout the training process, which happen to be the ones which produced the lowest loss right here. That said, here we go: we set the patience, verbose to one, mode to auto, baseline to None, and restore_best_weights to False. We run that, and then all we need to do is include this in the callbacks list, so here we add the es_callback. We could take off the csv_callback, but you can always put all of these together, so let's leave it so we can see how everything works together; we just have this list with the es_callback and the csv_callback, and then we run the training.
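Here is a minimal sketch of that setup, using the patience of two that the run below ends up with; as before, the model and dataset names are placeholders for what was defined earlier.

from tensorflow.keras.callbacks import EarlyStopping

es_callback = EarlyStopping(monitor="val_loss",       # quantity being watched
                            min_delta=0,               # any change at all counts as an improvement or a worsening
                            patience=2,                # stop after 2 epochs without improvement
                            verbose=1,
                            mode="auto",
                            baseline=None,
                            restore_best_weights=False)

history = lenet_model.fit(train_dataset, validation_data=val_dataset,
                          epochs=10, callbacks=[es_callback, csv_callback])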
We didn't train long enough to observe any callback changes, so let's take this to 10 epochs and then reduce the patience to one or two; let's take it to two. We run that again, that's fine, and then we fit our model. After training for eight epochs, we see clearly how the early stopping callback stops the training process. Now let's understand why the training was stopped. If you take a look at the validation loss right here, you'll find that there was a drop, then an increase, but after this increase there was another drop. And since the patience we defined is equal to two, we have to get two successive increases, or two successive epochs without improvement, before the training process gets stopped. So after this first increase we have a drop, and the training process continues; then we have this increase and then this drop, so it continues; then here we have an increase, and then again another increase. Because we've now had two successive increases, as you can see in the plot right here, increase, drop, increase, and then increase, the training process gets stopped, as you can see here. That said, we now move on to learning rate scheduling. Up to this point, we've been used to training our models with one fixed learning rate throughout the process: we fix the learning rate, say at 0.01, and use that same value throughout the whole training. But it happens that if the learning rate is too large, we risk diverging, and if the learning rate is too small, it takes too long for our model to converge. Let's consider this plot right here; it's actually a very simplified plot, since what really happens is way more complex than this. On one axis we have the loss and on the other the weights, or parameters, and recall that our aim is to modify these weights such that the loss is minimized, that is, to get to this minimum position right here. Let's start with the case of a high learning rate. With a high learning rate, it would be easier for our model to find its way towards this minimum position, or some position close to it, say this point here. But the problem is that once it gets to this position, it can also very easily diverge from it: it could very easily jump back to another point around here and then repeat this kind of process again, where the model just keeps bouncing around. So it could come down here and then get back to some point around there, and so on and so forth, and we may end up in a case where the model never really converges because the learning rate is too high. Then for small learning rates, we may start training and find ourselves at this point here, that is, we've modified the weights so that the loss value happens to be at this local minimum, and it becomes difficult for us to get to the ultimate, global minimum, since the learning rate is too small and we're making very small changes. So we may find ourselves just staying around this local minimum instead of moving towards the global minimum, which is this one.
So to strike a balance, what we could do is start the training with a relatively high learning rate, so that the model approaches the global minimum faster, and then after a certain number of epochs reduce the learning rate to much smaller values; since we now take very small steps, we can move towards the global minimum without risking divergence. One way of doing this manually is to decide, say, that for the first 10 epochs you train at a learning rate of 0.1, then from epoch 10 to 20 at 0.01 (dividing by a factor of 10 each time), and, if we're training for 30 epochs in total, at 0.001 for the last 10. You would train, stop after 10 epochs, modify the learning rate in your optimizer, restart, and repeat. The problem is that you always have to be there to make the change manually. What if we could do this automatically? As usual, this is made possible by a TensorFlow callback: the LearningRateScheduler. With this callback, we define a function which takes in the current epoch and modifies the learning rate based on that epoch and some predefined rule. As you can see, LearningRateScheduler takes in a schedule function, and we can also specify the verbosity. Here is the example schedule function given in the documentation: if the epoch number is less than 10, it returns the current learning rate unchanged, and once the epoch is greater than or equal to 10, it starts reducing the learning rate exponentially. So before 10 epochs we keep the fixed learning rate, and after that the learning rate keeps dropping as training continues. We no longer need to monitor the training manually, because this callback modifies the learning rate for us. We can simply copy the example from the documentation and use it in our training process. So here we paste in the scheduler function, and then we need our learning rate scheduling callback, so we do the import: from the Keras callbacks we just need LearningRateScheduler. We run that and get back to the fit call. Notice the order here: under callbacks we have the CSVLogger, the EarlyStopping callback, and the LearningRateScheduler. We've pasted the scheduler method, but we're yet to define the callback itself, so we write scheduler_callback = LearningRateScheduler(...).
It takes in the scheduler method we just defined. Notice how the scheduler method receives the current epoch number and the current learning rate. We'll modify the threshold to, say, three, so that after three epochs we start changing the learning rate. We could also print out the current learning rate ourselves, but instead let's take that off and just set the verbosity to one, so the callback reports it for us. You could also check the CSV log file; you'll notice the logger keeps appending the new values onto the ones we had already logged. For now, let's take off the other callbacks and focus on just the scheduler, so we have the scheduler_callback. Let's make sure the cells have been run already. Now we get this error: unexpected keyword argument 'verbosity'. Getting back to the documentation, the argument is actually called verbose, so let's change it to verbose=1 and run again; that should be fine now. We can get back to our training. We expect that for the first three epochs we have a fixed learning rate, and above that a learning rate which decreases exponentially. And that's what we see: as the training process starts, we don't need to print anything ourselves, because with verbose set to one the callback outputs it, for example "LearningRateScheduler setting learning rate to 0.00999", which is practically the 0.01 we had set. As training goes on, the value keeps decreasing, and the callback always outputs the current learning rate. Recall that the point of working with these kinds of learning rate schedulers is to get the best of both worlds. We want speed, because a very small learning rate is slow, and we also want stability when training. Because we want both, we modify the learning rate so that we get them throughout the training process: when we start, we use high learning rates, which give us speed, and once we've trained for a given number of epochs and are approaching the global minimum, we seek stability by reducing the learning rate, so that we don't overshoot and diverge. So if you want a more stable training process, reduce the learning rate. After training is complete, here's what we get: you can see the learning rate being modified after the given number of epochs, decreasing as we go on with training. And that's how we implement learning rate scheduling. But before moving on, let's check out this tutorial provided by the MXNet project developers, which covers other learning rate scheduling techniques, starting with learning rate scheduling with warmup, also known as the slanted triangular schedule.
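Here is a small sketch of what that scheduler and callback could look like with the three-epoch threshold we just chose (the decay factor and variable names are assumptions based on the documentation example):

```python
# Minimal sketch of an exponential-decay schedule after a fixed number of epochs.
import math
import tensorflow as tf
from tensorflow.keras.callbacks import LearningRateScheduler

def scheduler(epoch, lr):
    # Keep the initial learning rate for the first 3 epochs,
    # then decay it exponentially.
    if epoch < 3:
        return lr
    return lr * math.exp(-0.1)

scheduler_callback = LearningRateScheduler(scheduler, verbose=1)

# history = model.fit(train_dataset, validation_data=val_dataset,
#                     epochs=10, callbacks=[scheduler_callback])
```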
As you can see, the learning rate falls between two values: you define a maximum learning rate and a minimum learning rate. With the warmup, we increase the learning rate linearly up to the max for a given number of epochs, and once we get there, we start decreasing it; here the decrease is linear. Once it reaches the minimum learning rate, we just maintain that constant value right up to the end. Another scheduling technique is a linear increase (the warmup) followed by an exponential decrease right down to the minimum value. This warmup technique was explored in a paper by Priya Goyal et al., who found that a smooth linear warmup of the learning rate at the start of training improved the stability of the optimizer and led to better solutions. From here, we move on to the cosine annealing schedule, where there is a smooth decrease in the learning rate which resembles the cosine function: if you look at what a cosine function actually looks like, you'll find that its first portion looks just like the shape of this schedule. After a given number of epochs, we simply hold the minimum: we define the min and the max, and once the schedule reaches the min, it stays there. Then we have stepwise decay scheduling, which again starts with a warmup (a technique used very often in practice) and then proceeds in steps: we keep a fixed learning rate for a given number of epochs, then drop it, keep it fixed again, drop it again, and so on. That's the stepwise schedule with warmup. There is also a cooldown variant, where we follow the stepwise method and then, after a given number of epochs, reduce the learning rate linearly, hence the term cooldown. We also have the one-cycle scheduling technique proposed by Leslie Smith and Nicholay Topin: we have the warmup, then a linear decrease, and once we get back to the initial value, another linear decrease with a smaller slope; once we reach the final minimum, we just maintain that learning rate. Then we have the cyclical scheduling methods originally proposed by Leslie Smith. The idea of cyclically increasing and decreasing the learning rate has been shown to give faster convergence and more optimal solutions: as you can see, the learning rate keeps bouncing between the minimum and the maximum, with a linear increase to the max, then a linear drop, another increase, and so on. Finally, we have cyclical cosine annealing, where each cycle follows the cosine annealing curve from a high value down to the minimum: the first cycle goes from the max (here 2) down to the min, the next cycle starts from the midpoint between the min and the max and decays to the min, and the final cycle starts from a quarter of the max (0.5, which is a quarter of 2) and again decays to the min. Notice also that as we go through the cycles, the cycle lengths keep increasing.
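None of these schedules is built into the callback, but any of them can be written as a plain Python function and passed to the same LearningRateScheduler. As a rough illustration (this is not code from the MXNet tutorial; the max/min learning rates and epoch counts are made-up values), a warmup-plus-cosine-decay schedule could look like this:

```python
# Rough sketch of a linear warmup followed by cosine decay.
import math
from tensorflow.keras.callbacks import LearningRateScheduler

MAX_LR, MIN_LR = 0.01, 0.001          # illustrative values
WARMUP_EPOCHS, TOTAL_EPOCHS = 5, 30   # illustrative values

def warmup_cosine(epoch, lr):
    if epoch < WARMUP_EPOCHS:
        # linear warmup from MIN_LR up to MAX_LR
        return MIN_LR + (MAX_LR - MIN_LR) * (epoch + 1) / WARMUP_EPOCHS
    # cosine decay from MAX_LR down to MIN_LR over the remaining epochs
    progress = (epoch - WARMUP_EPOCHS) / max(1, TOTAL_EPOCHS - WARMUP_EPOCHS)
    return MIN_LR + 0.5 * (MAX_LR - MIN_LR) * (1 + math.cos(math.pi * progress))

warmup_cosine_callback = LearningRateScheduler(warmup_cosine, verbose=1)
```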
Based on the scheduler you want to implement, all you need to do is modify the scheduler method along these lines. The next callback we'll be looking at is model checkpointing. With the ModelCheckpoint callback, we're able to save the model's weights at some frequency. So unlike before, where we saved the model or its weights only after training was done, we can now do this saving during the training process, thanks to this callback. As usual, we copy the example and check the documentation for the significance of each argument. We have the filepath, which is the file where we'll be saving our model. So we create our checkpoint_callback from ModelCheckpoint, define the filepath, and then decide what we'll be monitoring: the validation loss, which is the default. Just like with EarlyStopping and its monitor argument, we look at the validation loss rather than the training loss, and we save the model's weights when we hit the smallest validation loss. If save_best_only is set to true, we save only those best weights, whereas if it's false, we also keep saving the latest weights. For save_weights_only, if it's set to true we save only the weights, whereas if it's false we save the full model with its parameters. Then we have the mode, which is auto: since we're monitoring the validation loss, this resolves to min, whereas for an accuracy or precision it would be max. Let's leave it on auto with the validation loss. The save_freq, when set to 'epoch', simply means we do the saving after each and every epoch, but we could modify this to, say, every three epochs. Whenever a save is due (and save_best_only is true), we verify whether the current model is the best of all the previous ones; if it is, we overwrite the weight file at the filepath, and if it isn't, we just continue training. That said, let's set our filepath and call it checkpoints. We run this, add the checkpoint_callback to the callbacks, run the training, and see what we get. After training for 10 epochs, there are two logs you can notice here, one of them telling us "Assets written to: checkpoints/assets".
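Here is a minimal sketch of the ModelCheckpoint configuration just described (the filepath and variable name follow this walkthrough; the commented fit call is an assumption):

```python
# Minimal sketch of the ModelCheckpoint callback discussed above.
from tensorflow.keras.callbacks import ModelCheckpoint

checkpoint_callback = ModelCheckpoint(
    filepath="checkpoints",     # where the model/weights are written
    monitor="val_loss",         # quantity used to decide what "best" means
    verbose=1,
    save_best_only=True,        # only overwrite when val_loss improves
    save_weights_only=False,    # False saves the full model, not just the weights
    mode="auto",                # resolves to "min" for a loss
    save_freq="epoch",          # check and save after every epoch
)

# history = model.fit(train_dataset, validation_data=val_dataset,
#                     epochs=10, callbacks=[checkpoint_callback])
```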
You can open up this checkpoints folder and see that we're actually saving the model there. We also have this other log after a later epoch. Let's understand why the saving happens after the first epoch and then again later. When training starts, we have a validation loss of 0.47, and since this is the first epoch, the model takes this as its best state and saves it. The following epochs have higher validation losses, so they don't beat that record and nothing is saved. Then we reach an epoch with a validation loss of 0.24, which beats the best so far, and that is why the model is saved again at that point. After model checkpointing, we'll look at ReduceLROnPlateau, and later on we'll have a dedicated section for TensorBoard. The way ReduceLROnPlateau works is this: suppose you start training with a fixed learning rate, you've trained for 100 epochs, and by epoch 110 the model's performance still hasn't improved. Since for the past 10 epochs there has been no improvement, we reduce the learning rate by a given factor. So here is the factor, and the patience of 10 means we'll wait for 10 epochs before deciding whether to reduce the learning rate or not. Then there's the verbosity, the quantity to be monitored (the default is the validation loss), and the mode, which is auto: you could specify min or max, but setting it to auto lets TensorFlow decide automatically. In the case of the validation loss it's obviously min, since our aim is to reduce the validation loss as much as we can. Then we have min_delta, which defaults to 0.0001: if after the 10 epochs the validation loss has changed, but by less than this min_delta, we consider that there was no real change, and we still reduce the learning rate by the factor. We also have the cooldown, which, as described here, is the number of epochs to wait before resuming normal operation after the learning rate has been reduced, and the minimum learning rate, below which we don't want the learning rate to drop. We'll create this callback very easily, and this time around we'll monitor the validation accuracy: if the validation accuracy doesn't increase after two epochs, we reduce the learning rate by a factor of 0.1. That means that if we had a learning rate of, say, 0.01, and the validation accuracy hasn't increased after those two epochs, the learning rate becomes 0.01 times 0.1. Just as before, we add this plateau callback and train our model. After training for five epochs, we can observe the learning rate being reduced after the third epoch, and this is because after the first two epochs there was no improvement over the initial validation accuracy; that's why the learning rate drops from 0.01 to 0.001.
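A minimal sketch of that ReduceLROnPlateau configuration, monitoring the validation accuracy with a patience of two and a factor of 0.1 as described (the variable name and commented fit call are assumptions):

```python
# Minimal sketch of the ReduceLROnPlateau callback discussed above.
from tensorflow.keras.callbacks import ReduceLROnPlateau

plateau_callback = ReduceLROnPlateau(
    monitor="val_accuracy",  # watch the validation accuracy
    factor=0.1,              # new_lr = old_lr * 0.1 when triggered
    patience=2,              # wait 2 epochs without improvement
    verbose=1,
    min_delta=0.0001,        # changes smaller than this don't count as improvement
    min_lr=0.0,              # never drop below this learning rate
)

# history = model.fit(train_dataset, validation_data=val_dataset,
#                     epochs=5, callbacks=[plateau_callback])
```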
And that's it for this section; thank you for getting up to this point, and see you next time. Hello everyone, and welcome to this new section, in which we're going to look at different strategies for combating overfitting and underfitting. These strategies include data augmentation, dropout, regularization, early stopping, using smaller networks, hyperparameter tuning, and normalization. So far in this course, we've mentioned the terms overfitting and underfitting without really getting into what these two actually mean. For overfitting, we'll consider a plot of the loss versus the number of epochs, and a plot of precision versus the number of epochs. With the loss versus epochs plot, what we'll generally see when a model overfits is something like this: here we have the validation loss and here the training loss. This is just a general way of looking at it, as it may take different forms. As you can see, we have the two curves on the loss-versus-epochs plot, and clearly the validation and training sets initially follow a similar pattern; sometimes the validation may even start out performing better than the training set. But generally, when a model overfits, at some point it keeps doing well on the training set while doing worse and worse on the validation set. We can replicate this with the precision-versus-epochs plot (this could equally be accuracy, recall, or some other metric we've chosen), and we'd see something similar; in practice it isn't always exactly like this, but it's in this spirit. The model keeps doing very well on the training data, while on the validation data, at some point, it starts doing very poorly. The danger here is that if you only look at the training set, you may feel that training for more epochs is a good idea, because you keep getting great results: suppose your training precision is 99%, and you think this model will do very well in the real world. That generally isn't the case, because the model has been overfitted on your training data. Instead of extracting useful information, some real intelligence, from the data it was trained on, the model has simply adjusted its weights to fit the training data. The main cause of overfitting is having a small dataset and a large, complex model which contains many parameters. If a model like a deep neural network has many parameters and you give it a small dataset, it's going to adjust those parameters so that, on this small dataset, it performs exceptionally well. So you may come across a model with, say, 99.9% precision or accuracy just because the data used to train it was very small. In another case, you may have a moderately sized dataset but a very large model. At the end of the day, we notice that there always has to be a balance between the dataset and the model size.
This means that even if you increase the data size, if you also increase the model size, the model may still risk overfitting. To better understand this concept of overfitting, let's take a simple example. Suppose there are three subjects to master at school, say math, English, and sports, but when kids get to this school, they are taught only one of them; let's pick mathematics. So from the first year to the last year, these kids are taught only mathematics. It's clear that when you pick a kid at random and evaluate them in mathematics, that kid will tend to have an above-average result compared to kids from other schools. But when tested on subjects like English and sports, that may not be the case, simply because English and sports were never taught at that school. If you're evaluated only on what you were taught, you'll tend to have very high scores; but as soon as we start evaluating the kids on things they were not taught, they start having poorer results, because they never took the time to master English and sports. You may end up with a kid who can show, for example, that sin²(x) + cos²(x) = 1, but can't tell you the past tense of a common verb. Now it's clear that to fix this situation, the kids have to be taught all the subjects, so that there is a balance. And because part of the time they used to spend on maths is now allocated to English and sports, they may perform slightly worse than before in maths; but what's important is that this time around they can express themselves better in English and practice some sports. Let's take another example. Suppose we're training a model which predicts the presence of a car in an image, and we feed the model with one kind of data, but after training, the model gets tested on a different kind of real-world data. It's clear that even if you had very good results on the previous data, when testing on this new dataset you won't get those amazing results. So it's important not only that your data be large enough, that you get as much useful data as you can, but also that your training data represents, or looks like, your test data and what you'll be seeing in production, as that's really what matters. What matters is having a model in production which works well and has high accuracy, not a model in your notebook with, say, 100% accuracy on the training data which doesn't perform well in production. Moving on to underfitting: here the model is far too simple to even be able to extract information from the data. So we may have validation and training curves like this, with a huge gap between the current loss and the minimum possible loss. We could also see this at the level of, say, accuracy: we have the validation and training accuracy, and the maximum possible accuracy of 100%.
Let's say the maximum is one, right here, and our model is still so simple that we just end up at 0.6, or 60% accuracy. In this kind of situation, the size of our data relative to the model may be too large. You could even have a dataset smaller than the one before, but if your model is far too simple, you may still face this problem of underfitting. It also turns out that sometimes you may have a very complex model which still underfits, because that model wasn't built in a way that lets it extract useful information from the data. If you recall, in the section where we were predicting car prices, we had a situation where we used a single dense layer with just two parameters on a fixed dataset, but once we stacked up more dense layers, we found that we got better training and validation mean absolute error values. There are several ways we can mitigate the problem of overfitting. The very first one we'll look at is collecting more data. It's important to lay hands on as much data as you can; this data has to be representative of what the model will see in real life, and it should be as diverse as possible. Even after collecting more data, to mitigate overfitting we can use data augmentation. What is data augmentation all about? Suppose we have this cell image right here, which happens to be parasitized. Instead of having just this one image in our dataset, we can modify it so that we have more data to train on. This means that in the case where initially we had, say, 20,000 images, after augmenting (after modifying each and every image), we end up with a much larger dataset. Here, each image is flipped and rotated into four additional versions: we take the image, rotate it to get another image (with the same label, still parasitized), then flip it to get another, flip again, and so on, until for this one image we have four others. That means we're multiplying the dataset by five, so we end up with 100,000 examples. It should also be noted that there are many other data augmentation strategies for this kind of image data: apart from flipping, we could crop just a portion, add some noise, modify the contrast or the brightness, and carry out many other operations. And there is no particular data augmentation strategy which works for all problems; when you have a particular problem, you'll have to try different augmentation strategies and select the one which works for your data. That said, we have dropout. To better understand the notion of dropout, we'll consider this simple neural network. If you recall, the reason we end up with models which overfit is that we're working with very complex models with many parameters.
Now, in order to reduce the complexity of this neural network, what we could do is remove, for example, the interaction between this neuron and all the previous neurons. This means that when training the model, we consider that this hidden layer has just these two remaining neurons, and all the connections into the dropped neuron become useless. This has the effect of simplifying our network: after carrying out the dropout operation, the connections look like this. This particular case is an example of dropout with a rate of about 0.333, since we're dropping exactly one third of the neurons in that layer. And if the rate were two thirds, what we'd be left with is this even smaller network to train. So we see that we can go from a very complex model to a simplified model via this dropout operation, and this has the overall effect of mitigating overfitting. The next step we could take is regularization. To better understand regularization, suppose we have a model with weights w_j, say n weights in total, and these weights are free to take up any value. As we've seen previously, the fact that the weights can take up just any value may lead to overfitting, since they can be adjusted to fit the training data perfectly. So we could have a model whose curve picks out each and every training point, like this one. Whereas if we restrain the weights to stay in a given range, we may end up with a simpler-looking model, because it doesn't have as much freedom as the other one. Now, the problem with the overfitted model is that when you bring in new data, the curve swings off and the prediction lands somewhere unreasonable. So if, for example, we have horsepower on the x-axis and we want to predict the price of a car, and we get a car with a very high horsepower, the model which overfitted on this training data will predict a very low price, whereas the model which generalizes will tend to predict a more reasonable price. Note that because the overfitted model passes through each and every point, its training loss is almost zero, yet it performs poorly on new data; with the simpler model, we don't get a training loss of zero, but when given new data we at least get reasonable predictions. Coming back to regularization: the model can be represented as a function made of these weights, and since during training our aim is to minimize the loss, we can include the weights in the computation of the loss. This means our loss becomes the loss we would normally have, plus a regularization constant times the sum of the squares of each and every weight. This is known as L2 regularization.
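For reference, these two penalties are commonly written as follows, with L_0 the unregularized loss, lambda the regularization constant, and w_j the weights (this notation is mine, not from the lesson):

$$
L_{\text{L2}} = L_0 + \lambda \sum_{j=1}^{n} w_j^{2},
\qquad
L_{\text{L1}} = L_0 + \lambda \sum_{j=1}^{n} \lvert w_j \rvert
$$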
With L1 regularization, the second form above, we sum up the absolute value of each weight instead. For now, let's just explain how regularization helps mitigate overfitting by restraining the weights to a given range. Write the loss as the initial loss L plus a regularization term R. If our aim is to minimize the total loss, then both L and R get minimized, and minimizing R has the overall effect of restraining the weights to a given range, especially since squaring very large values makes them even larger. To avoid that penalty, the weights tend to take up smaller values which fall in a smaller range. This L2 regularization is also known as weight decay. The main difference between L2 and L1 regularization is that, in trying to restrain the range of values the weights can take, L1 regularization tends to push many of the weights to values around zero, or exactly zero, which leads to sparse models compared to L2 regularization; that's why in practice we generally use L2 regularization. That said, we move on to early stopping, which we've seen already. With early stopping, if we plot the validation precision and the training precision against the epochs, with the limit of one, or 100%, then after a while the validation curve starts dropping, simply because the model is starting to overfit on the data it's being trained on. So we stop training once we notice that the validation performance isn't improving any longer; after a certain number of epochs, we stop. We've seen this in the previous section, both the idea and the practice. Another thing to do is to reduce the size of the network, that is, use a less complex network. Our next step is to properly tune our hyperparameters. Hyperparameters like the batch size, dropout rate, regularization rate, and learning rate affect our model and can dictate whether it overfits or not. If we look at the batch size, training with a larger batch size may speed up the training process, but working with smaller batch sizes has a regularization effect which helps reduce overfitting; as Yann LeCun put it, "Friends don't let friends use mini-batches larger than 32." For the dropout rate, we've seen already that increasing it makes the model simpler, and increasing the regularization rate reduces the effect of overfitting. And finally, picking too small a learning rate may lead to overfitting. So in general we have some hyperparameters to tune, and they're not limited to these, as you may have many other hyperparameters depending on your problem. Also, the fact that batch normalization introduces extra statistics, mu and sigma, which bring some noise into the model, gives it a regularization effect which helps reduce overfitting. So if you're including batch norm in your model, you can feel free to reduce the dropout rate,
since the normalization layer already brings in that regularization effect. From here, let's look at ways of mitigating the problem of underfitting. With underfitting, we could use more complex models. We could also collect more data; this solution applies to both overfitting and underfitting, as it's always a good thing to collect more clean, representative data. We could also increase the training time. Note that you could have a model where, after training for, say, a thousand epochs, you feel it won't perform any better, yet several researchers have reported giving up on a model, forgetting to stop the training, and coming back later to find that it had kept on improving. So sometimes you shouldn't give up on your model too early; you can simply train for longer. Then again, we have hyperparameter tuning, which can help make your model more performant, and normalization, which stabilizes the training process and leads to better model performance. We now see practically how dropout can be implemented with TensorFlow. Here we have the Dropout layer, which takes as arguments the rate (the dropout rate), the noise shape, and the seed. To understand why we may need the seed: it's simply for the case where we want a reproducible experiment. If we apply dropout to this layer with a dropout rate of 0.2, we'll be taking off one neuron out of these five neurons to make our model simpler and avoid overfitting. In doing so, we may take off any one of the five; it's a random choice. Now, if we want to fix this choice so that the experiment can be reproduced, we set the seed so that each time we run the experiment, it's exactly the same neuron which is taken off. Getting back to the code, the way we use this is by importing the Dropout layer; we run that, and then we include the dropout in the model. So we define dropout_rate = 0.2 (you can always increase this rate), and then add Dropout with rate equal to dropout_rate. We could place it here as well, but we won't add dropout at this level for now; you can always include dropout at different levels, depending on how the model responds to it, add more dropout layers, and increase or reduce the rate. We run our model, and as you can see right here, the Dropout layer has no parameters. From here, we look at regularizers. We'll see how to implement the L2 regularizer and the L1 regularizer. Under tf.keras we have the regularizers module; if you select L2, you land on its page, where you see tf.keras.regularizers.L2 and the regularization rate you pass in. Let's copy this out and get back to our model.
At the level of our model, let's come back to the definition of the Conv2D. Here we have the kernel_regularizer argument, which is what we use to carry out regularization: in the Conv2D, we simply set kernel_regularizer equal to the regularizer we copied. You could also write the import explicitly: from tensorflow.keras.regularizers import L2 (and you could import L1 as well). We run that, correct the small typo in "regularizers", and that's fine. We get back to our model and simply pass L2 there; you could also move it around, so after the 'valid' padding and the relu activation, we add our regularizer, and that's fine. You can do the same for the dense layers: just pass a kernel_regularizer there too. So this is how we implement weight decay with TensorFlow. You can always modify these parameters, so let's set the regularization rate to 0.01. We run our model, that should be fine, and we could go ahead and retrain this model.
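Putting these two ideas together, here is a rough sketch of a small convolutional model with a Dropout layer and L2 weight decay (the architecture, input shape, and rates are illustrative assumptions, not the exact model from this course):

```python
# Illustrative sketch: Dropout + L2 weight decay in a small CNN.
import tensorflow as tf
from tensorflow.keras.layers import (InputLayer, Conv2D, MaxPooling2D,
                                     Flatten, Dense, Dropout)
from tensorflow.keras.regularizers import L2

dropout_rate = 0.2
regularization_rate = 0.01

model = tf.keras.Sequential([
    InputLayer(input_shape=(224, 224, 3)),      # assumed input size
    Conv2D(16, kernel_size=3, strides=1, padding="valid", activation="relu",
           kernel_regularizer=L2(regularization_rate)),
    MaxPooling2D(pool_size=2),
    Dropout(rate=dropout_rate),                  # randomly zeroes 20% of activations
    Flatten(),
    Dense(100, activation="relu", kernel_regularizer=L2(regularization_rate)),
    Dense(1, activation="sigmoid"),              # binary output, e.g. parasitized vs uninfected
])
model.summary()  # note: the Dropout layer itself has no trainable parameters
```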
That's it for this section; thank you for getting up to this point, and see you in the next one. Hello everyone, and welcome to this new section, in which we'll delve deep into implementing data augmentation with TensorFlow. The first method we'll use for data augmentation is the tf.image module, which is made up of many different functions; this is just a few of them, and you can check out the rest in the documentation. We'll be able to do things like brightness, contrast, gamma, and saturation adjustments, flipping left-right, flipping up-down, and rotation. Using tf.image is known to be a more flexible way of implementing data augmentation, as we can alter an input image with all these different functions. The next method we'll implement in this course is working with the Keras layers: although we're limited by the number of data augmentation layers made available to us, this method lets us carry out data augmentation more efficiently and hence speeds up the training process. To find a balance between these two methods, we can implement our own custom Keras layers, and that's exactly what we're going to do in this section. We're now looking at how to implement data augmentation with TensorFlow, so we'll get into tf.image (we were in tf.keras.layers; let's close that up and open tf.image instead). So we have tf.image with all the methods we can use for data augmentation. Recall that to do data augmentation, we're basically modifying the images while keeping the labels fixed. As you can see, there are several methods here; let's check the overview to get the full list and the categories: image adjustments, working with bounding boxes (that's for object detection), cropping, flipping, and decoding and encoding. Getting back to the top, we start with resize, which we've seen already. Then we have the adjustments: we can adjust the brightness, contrast, gamma, hue, JPEG quality, and saturation, and do per-image standardization. Notice that some of these are random, meaning we adjust the image randomly: for random_brightness, for example, all we pass is a max delta, which defines a range, and a delta is then picked randomly within that range, whereas adjust_brightness adjusts the brightness with a fixed delta which we choose; you can see this by clicking on each of them in the documentation. That's it for the image adjustments. We also have cropping, so we can crop out parts of the image: we can do a central crop, or crop and resize. This means that if we have a 224-by-224 image and we take a center crop, we get a new, smaller image of, say, 150 by 150, and what we can then do is resize it back to the shape we need to pass into the model; hence the crop-and-resize method, though you could just as well crop and then do the resize manually. Then we have flipping and rotating: we can flip an image left-right or up-down, either deterministically or randomly, rotate it, and transpose it. So as you can see, TensorFlow gives us all these methods which we can use to modify our images and hence augment our data. Here we'll define a visualize method so we can see this clearly, showing the original image and the augmented image side by side. We use subplot with one row and two columns; the first position holds imshow of the original, and the second position holds the augmented image. That's our visualize method. Next, we get the original image: we take one element from the train dataset with take(1), which also gives us the label. Then we build the augmented image by modifying this original image using the methods we've just seen. Let's pick flip_left_right: clicking on it, we see that all we need to do is pass in the image, so we copy that, call tf.image.flip_left_right on the original image, and run it.
We can now go ahead and visualize this augmentation: we pass in the original image and the augmented image, run it, and see what we get. You can see that your dataset will now contain not only this original image but also the flipped one, which means our dataset is multiplied by two. Let's check some other augmentation strategies. We have flip_up_down, which is similar to what we've seen already, and random_flip_up_down, for which we can also specify a seed. So we use random_flip_up_down, run that, build the augmented image, and visualize. It happens that the output is exactly the same image; running it again, we instead get the flipped version, so notice the difference between the two runs. Next is rot90. You should always feel free to click through and understand what a method actually does: here we're told it rotates the image counterclockwise by 90 degrees. We know which direction clockwise is, so rotating counterclockwise means rotating in the opposite direction by 90 degrees. We use rot90, run that, and as you can see, the image has been rotated by 90 degrees in the anticlockwise direction. We can also try adjust_brightness and random_saturation. Back in the notebook, we call adjust_brightness, run visualize, and get an error: missing positional argument. Let's get back to the documentation and see how it's used: we need to pass in a delta, which should be in the range -1 to 1, and as you can see, this delta is simply added to each and every pixel value, so a pixel of 1 becomes 1.1, a pixel of 11 becomes 11.1, and so on. We're told the delta is a scalar amount to add to the pixel values. So let's add a delta of 0.1, run again, and visualize: you'll notice some difference, as the augmented image appears brighter than the original. Let's increase the delta to 0.8, and you can see we're able to modify the brightness further. We can also include random_saturation on the original image. Back in the documentation for random_saturation, we see we pass in the image plus a lower and an upper limit. We're told we'll get an error if the upper is less than the lower (which is logical, since the lower should be less than the upper) or if the lower is less than 0, so we have to make sure we're dealing with values greater than or equal to 0. Back here, we set the saturation with lower equal to 2 and upper equal to 12, run again, and visualize: we've added some random saturation, and this is what we get. We could obviously reduce this and make the range fall between 0 and 1, and here is what we get in that case.
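Here is a compact sketch of the experiments above: the visualize helper and a few of the tf.image calls we tried (the dataset name train_dataset and the exact parameter values follow this walkthrough, so treat them as assumptions about your own notebook):

```python
# Sketch of side-by-side visualization of tf.image augmentations.
import matplotlib.pyplot as plt
import tensorflow as tf

def visualize(original, augmented):
    plt.subplot(1, 2, 1)
    plt.imshow(original)
    plt.subplot(1, 2, 2)
    plt.imshow(augmented)
    plt.show()

# Assumes train_dataset yields (image, label) pairs.
original_image, label = next(iter(train_dataset.take(1)))

augmented_image = tf.image.flip_left_right(original_image)          # horizontal flip
# augmented_image = tf.image.random_flip_up_down(original_image)    # random vertical flip
# augmented_image = tf.image.rot90(original_image)                  # 90 degrees counterclockwise
# augmented_image = tf.image.adjust_brightness(original_image, delta=0.1)
# augmented_image = tf.image.random_saturation(original_image, lower=2, upper=12)

visualize(original_image, augmented_image)
```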
Now let's check out cropping, starting with central_crop: we call central_crop on the original image. Let's make sure we understand how it's meant to be used: it takes a central_fraction, a float which lies between 0 and 1. In the documentation example, 0.5 is picked, meaning we take the central 50% of the image; the portion we keep is always centered. Now that we've understood how it works, let's pass 0.3, run it, and see what we get: we get this small central portion of the image. What if we expand the fraction to, say, 0.8? We run that, and in the output we see some black regions around the image, but the crop is still centered; it's cutting out the central portion of the image. So this is what we get when we do a central crop. Now, the way we're going to integrate this augmentation into our data pipeline is similar to the way we did the resize and rescale: just like we used the map method there, we're going to reuse that same method for the augmentation. So let's define our augment method, which takes in the image and the label. What we'll do is take the image and apply rot90 to it, then apply adjust_saturation with a saturation factor of, say, 0.3 (you should always feel free to visualize this so you don't end up poorly augmenting your data), and then also apply flip_left_right; finally, we return the image and the label. We can also simply include the resize_rescale step inside this augment method, so that we resize and rescale before doing the augmentation: image, label equals resize_rescale(image, label). So now we have this augment method defined. Another thing we want to do before the map is shuffling, so we'll modify the order in which we do things: we take the train dataset with .shuffle, .batch, and .prefetch, and then we include the mapping in there, so instead of doing the mapping before the shuffling, we do it after the shuffling.
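A sketch of what that pipeline could look like, assuming a resize_rescale(image, label) function and train/validation/test datasets as in the earlier sections (the buffer and batch sizes are illustrative):

```python
# Sketch of wiring the tf.image augmentation into the tf.data pipeline.
import tensorflow as tf

def augment(image, label):
    image, label = resize_rescale(image, label)       # resize + rescale first
    image = tf.image.rot90(image)                     # 90 degrees counterclockwise
    image = tf.image.adjust_saturation(image, 0.3)    # later removed; hurts this dataset
    image = tf.image.flip_left_right(image)           # horizontal flip
    return image, label

train_dataset = (train_dataset
                 .shuffle(buffer_size=8, reshuffle_each_iteration=True)
                 .map(augment)                        # augment only the training data
                 .batch(32)
                 .prefetch(tf.data.AUTOTUNE))

val_dataset = val_dataset.map(resize_rescale).batch(32).prefetch(tf.data.AUTOTUNE)
test_dataset = test_dataset.map(resize_rescale).batch(1)
```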
And there we go: the training pipeline is basically the same as before, but with the map over the augment method added after the shuffle. We repeat the same process for the validation and the testing sets, except that we do not augment them: for the validation set we include only the map with resize_rescale, and for the test set we likewise only do the resize_rescale map. We take off the older definitions, and that's fine; we now have our test data, train data, and validation data. At this point everything is set up, so we rerun the cells and get to training. We get back to the data, run all the cells, and create the model; here we'll work only with the Sequential API and leave out the other ways of creating models. Once that's done, we get right up to training, compile the model, and, without using any callbacks for now, start training. We're now done with training, and we can see the results right here. As you can see, the model performs very poorly, and we'll try to understand why it doesn't perform as well as it did without the data augmentation. If you recall, what we did for the data augmentation was rotate the image by 90 degrees anticlockwise, adjust the image's saturation, and flip the image left-right. But looking at the kind of dataset we're dealing with, the saturation adjustment shouldn't be a great idea, simply because what differentiates a parasitized cell from an uninfected cell is the patches we see in it: with a parasitized cell we generally have these kinds of patches, whereas an uninfected cell has none. Rotating or flipping left-right doesn't change much about that, so those are acceptable. The problem with modifying the contrast or saturation is that you tend to make the parasitized and uninfected cells less distinguishable, so in this case it isn't a good idea. Let's take an example so we can clearly see why this saturation augmentation strategy isn't a good idea. Right here, we copy out the visualization code and paste it, taking just two elements from our dataset. We build a subplot with one row and four columns, placing each original image at position 2i + 1 with imshow, and then we create the next subplot for the saturated version: tf.image.adjust_saturation, to which we pass the image and the positional argument 0.3, the same saturation rate we used previously. So for each element we have the first subplot and the second subplot; we run this and see what we get.
In this example, both cells happen to be uninfected, so we don't actually see much difference; it's true each was modified, but there isn't much to tell apart. Let's take a parasitized cell and rerun until we get one. OK, as you can see, in this parasitized cell, what clearly shows the model that it is parasitized is this patch right here. But when we adjust the saturation, it turns into this, and you'll notice the parasitized cell now looks more like an uninfected one, so the model can't clearly differentiate between the two. We see clearly that using the adjust_saturation augmentation strategy isn't a great idea. Let's comment this part out, and that's fine. We now modify our strategy: right here, we take off (comment out) the saturation step we had previously, and then we retrain our model with the modified augmentation strategy. We run this again, and after training, as you can see, we get the results. Looking at the loss, this is what we get, and if we look at the accuracy, it looks more like what we had previously, with a maximum validation accuracy of about 94% and a training accuracy of about 99%. Although this data augmentation strategy hasn't closed the gap between the training and validation accuracy, we'll see in other sections how data augmentation is very instrumental in helping reduce the gap between the training and the validation scores. So far, we've been applying data augmentation using the tf.image methods. The other way we can apply data augmentation is by directly making use of Keras layers. If you go to tf.keras.layers, you'll notice that among those layers there are layers which let us modify images, like RandomContrast, which randomly adjusts the contrast during training; here we have the definition of this RandomContrast layer and its attributes. Apart from RandomContrast, we have RandomCrop and RandomFlip, and you can always check their definitions: for RandomFlip, the mode 'horizontal_and_vertical' flips both horizontally and vertically; we also have 'horizontal', which is essentially a left-right flip, and 'vertical', an up-down flip, and the default is horizontal and vertical, as we've just seen. We also have RandomRotation, RandomTranslation, and RandomZoom, so you can randomly zoom the data you're training on, and there is also a Resizing layer. We looked at resizing previously, but using tf.image rather than the documentation for this layer; here you pass the height and the width, similar to what we had with tf.image. Scrolling down, we don't really find any others, so we scroll back up. What's important to note is that you can implement data augmentation in two main ways: using tf.image and using the Keras layers.
Now we're going to look at how to re-implement the data augmentation strategy we had previously, this time with Keras layers. Previously we had a rotation and a left-right flip. We get back to the documentation: we have this RandomFlip layer, and we'll set the mode to "horizontal", since we're doing a left-right flip. We copy that out, so we have that RandomFlip layer, and put it just below. So this first one is augmentation using tf.image, the tf.image augment, and this one is augmentation using Keras layers, tf.keras.layers. Our tf.image augment takes the image and the label, but since we're now building with Keras layers, we won't write it that way; instead we'll have augment_layers, which is just a way of differentiating it from augment, and it's a Sequential model. We use the Sequential API, tf.keras.Sequential, and inside it we specify the different augmentation steps, which we copied out from the documentation and space out here. We'll use this RandomFlip; let's import RandomFlip from the Keras layers, so from tensorflow.keras.layers we import RandomFlip. OK, so RandomFlip is imported. The other one is the rotation, so let's get back to the documentation, scroll up and do a quick search: here we have RandomRotation. And don't forget, we were rotating by 90 degrees in the anti-clockwise direction, so we have to be careful to re-implement that same kind of rotation we had before. In the definition we have a factor, a fill_mode (reflect), an interpolation (bilinear), the seed for reproducibility, and a fill_value. Let's look further down and understand this factor better. The factor is actually a tuple, and this tuple represents a range. To understand this range, note that each value represents a fraction of 2π: π radians is 180 degrees, so 2π is 360 degrees, and we're looking for a fraction of 360 degrees. Recall that we're rotating anti-clockwise by 90 degrees; we're just trying to reproduce exactly what we had with tf.image. So we need to get 90 degrees out of 360, which means dividing by four; in other words we want 25% of 360 degrees, and 25% of 360 is 90 degrees. From what the documentation tells us, if the factor is given as a single float, that value is used for both the upper and the lower bound; the result is an output rotated by a random amount in that range. A positive value means rotating counter-clockwise, while a negative value means rotating clockwise. We're interested in rotating counter-clockwise, that is, anti-clockwise. Recall, this direction is clockwise, so we're not interested in that; we're interested in this anti-clockwise 90-degree rotation. So we take positive values and make sure they lie in the right range.
Now, given that we're picking random numbers here, unlike the rot90 method we saw, which was fixed, what we can do is pick a range between, say, 90 degrees and 90.1 degrees, or any value very close to 90. Recall it has to be positive; if it were clockwise it would be negative. So we pick 0.25, because we want 25% of 360, and go up to, say, 0.2501. We'll work in that range so we stay right around the 90 degrees we're looking for. Since we've taken 0.25 to 0.2501, what we have is 25% of 2π, which is 90 degrees, up to a value very close to 90 degrees, something like 90.0-something degrees. So we pick that range, and we're sure it's anti-clockwise because we've chosen it to be positive. The fill mode, the fill value, and the interpolation will just keep their default values. That said, we get back to the code, copy out RandomRotation from the documentation, and include the RandomRotation import here. We run this cell and get back to the layer we're trying to construct. So here we have the RandomFlip and the RandomRotation. Now, in the tf.image version we started with the rotation, so we'll redo that here: we add RandomRotation first, with the factor being a tuple of 0.25 and 0.2501. For the rest we keep the default values, so we don't need to pass them. That's the first layer done. The next layer is the RandomFlip, so we copy it out and paste it here. We've decided on the mode, which is horizontal, so let's set the mode to horizontal. And there we go, we have our Sequential layer. We run these augment layers, and we get "horizontal is not defined". Let's use lowercase horizontal and run again; still not defined, because it has to be passed as a string, "horizontal". We run again and it works. Then we define a new augment method, let's call it augment_layer, which takes in the image and the labels, and what it does is return this augment_layers model we just created, called on the image with training=True, and it returns the labels alongside. So here we take the image and pass it into this Sequential layer: it's going to rotate the image, and then the next layer flips it, so we get our augmented output, together with the labels. We run this, and then down here where we map our augment, we now call the augment_layer method instead. We run that and everything is OK.
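Here is a minimal sketch of the Keras-layer version of the augmentation just described: a rotation of roughly 90° anti-clockwise followed by a left-right flip. The narrow factor range (0.25, 0.2501) keeps the rotation essentially fixed at a quarter of 2π.

```python
import tensorflow as tf
from tensorflow.keras.layers import RandomRotation, RandomFlip

augment_layers = tf.keras.Sequential([
    RandomRotation(factor=(0.25, 0.2501)),  # ~90° anti-clockwise
    RandomFlip(mode="horizontal"),          # random left-right flip
])

def augment_layer(image, label):
    # training=True ensures the random transformations are applied.
    return augment_layers(image, training=True), label
```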
Now, after running this training, we get an error, and it shows there's a problem with the shapes. Let's go back: the problem is that we forgot to do the resizing before the data augmentation. So we copy the resizing out and put it here: we take the image and the label, we resize and rescale, and only after resizing and rescaling do we do the augmentation. We run this again and then run the training, and as you can see, all is well. We'll now look at another way of doing data augmentation. Before that, note that instead of doing the resize and rescale with tf.image as we had done, that is, using tf.image.resize and then dividing by 255, we could also use the Resizing and Rescaling layers. So right here we include those layers, Resizing and Rescaling. We run that, and then just here we include that resize-rescale; let's call it resize_rescale_layers. It's again the Sequential API, and we always start with the resizing, so we add the Resizing layer and pass in the image size for the height and the width. Then we do the rescaling, so we add the Rescaling layer with 1 divided by 255. So now we have the layers responsible for the resizing and the rescaling, and we'll see how they replace the previous way of doing the resize-rescale. That said, let's get back to this augment_layer: after getting the image, we call resize_rescale_layers on it, so we resize and rescale the image before doing the augmentation, and we take the old call off; this should be fine now. And since we've built this resize_rescale_layers, we can also replace the mapping for the test set with it, and we do the same for the validation: for the validation, we also map resize_rescale_layers. The training is already fine, since augment_layer is what we've just seen. We run the test, run the validation, run the train, and then we train our model and see what we get. So that's it: our model is training, and everything is working well. Now let's pause this and check out another interesting way of doing this data augmentation. So far we've seen how to do data augmentation using tf.image and by building these layers. Now we'll see the importance of building these layers. But before looking at this importance, it's important to note that when you're working with tf.image you have more flexibility, whereas with these layers you're somewhat restricted in what you can do, although the layers already cover many common data augmentation strategies like rotation, zoom, crop, flipping and others. Still, when you're working with tf.image you'll see there are many more operations you can do; if you open up the tf.image documentation you'll see many more methods and you can do many more things with it. Anyway, getting back to our code, the advantage of these layers is that you can embed them into the model itself. Now let's explain: suppose you have your training pipeline here.
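Here is a minimal sketch of the resize/rescale preprocessing expressed as Keras layers, and of the reworked augment_layer that applies it before the random augmentations; the IM_SIZE value is an assumption.

```python
import tensorflow as tf
from tensorflow.keras.layers import Resizing, Rescaling

IM_SIZE = 224  # assumed image size used throughout the section

resize_rescale_layers = tf.keras.Sequential([
    Resizing(IM_SIZE, IM_SIZE),   # resize every image to IM_SIZE x IM_SIZE
    Rescaling(1.0 / 255),         # scale pixel values into [0, 1]
])

def augment_layer(image, label):
    # Resize and rescale first, then apply the random augmentations.
    image = resize_rescale_layers(image)
    return augment_layers(image, training=True), label
```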
So you have your tf.data pipeline: you do the shuffling, the mapping, the prefetching, the caching and everything you need to do with your training data. And then, after doing all this resizing, rescaling and so on, you pass the data into your model, you train your model and you get your results. Now, when you want to test this model, you still need to do the resizing and the rescaling: you have to make sure you resize your image and rescale it before passing it to the model. If you recall, let's go down to the testing section: you'll see that we made use of this test data, and this test data, as we've seen, was already resized and rescaled; if you check here, you see we had done some resizing and rescaling right here. Now, what if you have to take this trained model into another setup? In that new setup you would still need to resize and rescale, because just like here, before testing your model, you had to resize and rescale. So what if we put the resizing and rescaling inside the model? If we embed the resizing and rescaling in the model, it means that no matter where we take this model, we don't need to resize and rescale any more; all we need to do is pass in the data, without resizing and rescaling, and the job is done. So now we're going to see how to do this with the resize_rescale_layers we've just built. We could also do it with the augmentation, but the augmentation isn't as useful there, since at testing time we don't use the augmentation; what matters most here is the resizing and rescaling, although you could also include the augment layers in the model. Anyway, let's go ahead and see how this is done. First, you have the layers you've built: the augment_layers and the resize_rescale_layers we created earlier. All we need to do is get to the model, and right after the input layer we add the resize_rescale_layers, followed by the augment_layers, and that's all. With this, you don't need to do any resizing or rescaling on your input, because it's going to be done by these layers, which are now part of your model. Here, too, you don't really need to specify the image size in the input; you can set it to None. Then you can put in any image, and the resizing and rescaling will be done inside the model, so you can take this model into any environment: all you need to do is pass in the image and you get your corresponding output. Now that we've modified the model, we also need to modify the way we work with our testing, training and validation datasets. Let's get back down here, where we have the augment and then the test dataset. With the test dataset, we no longer need this mapping, so we just comment it out, because we don't need to resize and rescale any more. With the validation, likewise, we no longer need to do this mapping.
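Here is a minimal sketch of what embedding the preprocessing in the model could look like. The convolutional and dense configuration below is purely illustrative, not the exact architecture from the lesson; only the idea of placing resize_rescale_layers and augment_layers right after the input is taken from the text.

```python
import tensorflow as tf
from tensorflow.keras.layers import (InputLayer, Conv2D, MaxPool2D,
                                     Flatten, Dense)

model = tf.keras.Sequential([
    InputLayer(input_shape=(None, None, 3)),  # any input size is accepted
    resize_rescale_layers,                    # resizing + rescaling inside the model
    augment_layers,                           # augmentation (active only during training)
    Conv2D(6, 3, activation="relu"),
    MaxPool2D(),
    Conv2D(16, 3, activation="relu"),
    MaxPool2D(),
    Flatten(),
    Dense(100, activation="relu"),
    Dense(1, activation="sigmoid"),
])
```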
You see, we take this off; we don't need to do the resize and rescaling. And with the training, we no longer need to map the augment, because we've included it in the model already. All we need to do now is shuffle, batch, and do the prefetching. So that's it; we now rerun this and see how the training goes. We run the training, and as you can see, we get this very long traceback with a message that isn't very readable. In general, when you're trying to understand an error and you don't seem to have an idea of what's going on, you can run the model eagerly. So we use eager mode: run_eagerly=True. When we run this again, you'll get a shorter, more readable error message; you can check this out and see the shorter message. It should be noted that TensorFlow operates in two main modes: graph mode and eager mode. As you can see here, when you don't specify anything, training runs in graph mode. Graph mode is the faster and more efficient way of training our model, whereas eager mode is slower but more easily debuggable. So when you have errors like this, you're advised to run in eager mode, and when everything is fine you can take this off or simply set it to False; by default run_eagerly is False, so by default we are not in eager mode while training. That said, let's scroll down and check the error. It reads: we cannot batch tensors with different shapes in component 0; the first element had this shape and element one had this other shape. We understand from this that since we're now doing the resizing and rescaling inside the model, the batching we do here is being applied to elements which don't all have exactly the same shape. To avoid this, we'll just fix the batch size to 1, so we work with batches of one. We run that, and we run our model again; you'll notice that training starts and everything now works well. It's going to be slower, since we're no longer working with batches of 32 but with batches of one. Also, to make it go even faster, let's take the run_eagerly off, pause this, and restart training in graph mode. We run this, and it should now be faster than before. Training is going on; we can pause the training and see how easy it's now going to be to run inference with this model. To test this, we have this cell image, which we'll run inference on. We have the cell image and we'll make use of OpenCV to read it: image = cv2.imread(...). You can start by printing out the image, and you'll get that cv2 is not defined. So let's import cv2 right here: we import cv2, that's OpenCV, we run that, and then we rerun this cell. OpenCV stands for Open Source Computer Vision; it's a very popular image processing library, and you can hardly do image processing or computer vision today without working with it.
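Here is a minimal sketch of the debugging step just described: compiling with run_eagerly=True for readable errors, and batching with a batch size of 1 so that variable-sized images can be batched now that resizing happens inside the model. The optimizer and loss shown are assumptions.

```python
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"],
              run_eagerly=True)   # eager mode: slower, but errors are easier to read

train_dataset = (train_dataset
                 .shuffle(buffer_size=8192, reshuffle_each_iteration=True)
                 .batch(1)                      # unbatched images keep their own shapes
                 .prefetch(tf.data.AUTOTUNE))
```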
So that's how you read an image with OpenCV: cv2.imread, and there we go. We print the shape, and you see that it matches up with what we expect; here's the shape of our image. Now let's add a batch dimension: image = tf.expand_dims(image, axis=0), so we add the batch dimension on axis 0, and we print out this image. That's fine; let's print out the shape too. OK, we have that. And now we can pass this into our model directly: we just call lenet_model.predict(image), and that's it. All you need to do is this: we don't need to divide by 255, we don't need to resize, and the model handles it. Note that we're passing in an input of 265 by 262 by 3, and our model is going to automatically resize it, because we've included resizing in the model. Let's run this now and see what we get. It tells us zero, that is, parasitized, which is actually wrong, but that's normal because we haven't really trained the model for a long time, and from the training output our results show that we should expect very poor predictions. Don't let that disturb you; let's scroll down and then back up. You see we have this poor accuracy, about 48%; if we train for longer, we can get back to the roughly 99% we had already. This was really just to show how easy it's going to be for you to run inference with this model anytime, in any environment you find yourself. So far we've seen two methods of implementing data augmentation with TensorFlow: tf.image and the tf.keras layers. The tf.keras layers are interesting when you want your preprocessing, say the resizing, rescaling, and the augmentation, to be in the form of layers, while tf.image is more flexible to work with. Now we're going to look at a way to build a custom Keras layer based on the tf.image operations. Recall that when we were working with tf.image, we had this rot90, which was right here, and when we wanted to work with the layers, we had to play around with RandomRotation to give us that same rot90 effect. What we'll do now, in order to create this rot90 Keras layer, and not just use RandomRotation, because it's not exactly the same, is define a class which inherits from the Keras Layer class. So right here we have this class, which we'll call Rot90, and it inherits from Layer. We define __init__, call super().__init__(), and that's fine. So we have this Rot90, which inherits from Layer. Then we define the call method, which defines the operation carried out in this Rot90 layer we're about to create: it takes self and the input image, and what we simply want to do is return tf.image.rot90(image). So we're carrying out the same operation, but now we've converted it into a layer. We run this, and then, instead of RandomRotation, we're going to use this Rot90.
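Here is a minimal sketch of the custom layer just described: wrapping the tf.image.rot90 operation in a Keras layer so it can sit alongside the other preprocessing layers.

```python
import tensorflow as tf
from tensorflow.keras.layers import Layer, RandomFlip

class Rot90(Layer):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def call(self, image):
        # Rotate the batch of images by 90° anti-clockwise.
        return tf.image.rot90(image)

# The custom layer then replaces RandomRotation in the augmentation stack.
augment_layers = tf.keras.Sequential([
    Rot90(),
    RandomFlip(mode="horizontal"),
])
```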
So here we have Rot90; we take the RandomRotation off, since it's no longer needed, and that's it. We now have our Rot90 and we're no longer using the RandomRotation layer. So we get back, run the cells, and get to training, and as you can see, everything just works fine. This way we get the best of both worlds: the flexibility of tf.image and the speed and portability of Keras layers. Before we move ahead, there's a point which needs to be clarified. We've already seen that when we have a single sample like this one, with data augmentation we're able to produce different samples: in this case we have a rotated version, in this case a version where we've modified the saturation, and in this case a version where we've flipped it horizontally. That said, in practice we're not going to create, say, these three extra samples. What we do instead, if we suppose we want to implement three augmentation strategies, that is rotation, saturation, and flipping, for example, is that in one epoch we may decide not to carry out the rotation, then decide to adjust the saturation, and then, in the last step, decide not to flip; so at the end, for this input we get this as our output. In another epoch we may have this instead: we may decide we want to rotate, so we rotate by 90 degrees, and again, this is random, we randomly decide whether to rotate by 90 degrees or not. Then we may decide not to adjust the saturation, and finally we flip, so we get this as the output; in this case, this is what we pass to the model. In another case we may decide we still want to rotate, then no saturation, and keep it as is without flipping, and this is instead what the model sees. And in yet another case, the input may pass through with no rotation, no saturation and no flipping, and the model sees the original image. So we can see that the model has now seen four different versions of this same input without us having to create the four different examples separately before training, and this helps us carry out data augmentation much more efficiently. When we get back to the code, you'll see that's what we actually did: we may rotate or not, we may flip or not, and if we flip, it's going to be horizontal, and here we may modify the contrast or not; these are all augmentations which are carried out randomly during training. Now, for the other alternative, which is tf.image, we need to make sure this rotation is carried out in such a way that sometimes the image is rotated and other times it remains intact. Going to the documentation, you'll see we have this rot90, which takes in the image and this k right here, where k is simply a scalar integer tensor.
And this k is actually the number of times the image is rotated by 90 degrees. So here we're going to define k as a random number which can be either zero or one, and that's it for the rotation. Getting back to the documentation, you can check the same for the saturation adjustment: here it's actually stateless_random_saturation. You click on it, and as usual we pass in the image, and then we have the lower and upper bounds of the random saturation factor. So let's take this, it's basically this, and copy it out here: instead of the fixed adjustment we now have this, and we take in the image with, say, 0.3 and 0.5 as the bounds. Then, for the left-right flipping, we get back here again: we have random_flip_left_right, we copy it out, so we're going to randomly decide whether to flip the image or not, and paste it here in place of the old call. And that's it: we now have this augment method which randomly selects which augmentation strategies to apply for a given sample, and this lets us carry out augmentation much more efficiently. Now let's do the same here: we copy this and paste it out right here. Hello everyone and welcome to this new session, in which we'll treat mixed-sample data augmentation. Previously we saw how to do data augmentation on a single image like this or this one, and now we'll learn how to create new samples based on a combination, or mixture, of different images or different samples from our dataset. More specifically, we'll treat the mixup data augmentation strategy, where we pick two samples from the dataset, mix them up, and then include the strategy in our tf.data pipeline. Up to this point, we've implemented data augmentation strategies which involve modifying a single input sample like this one: we could take this input sample and do a zoom, a center crop, a rotation, a translation and many other things in order to augment our existing data. In this section we'll look at another data augmentation strategy, known as mixup. Mixup doesn't involve just one sample as we had previously: with mixup we make use of these two samples instead of one, mix them up, and then use the resulting output sample. If you look very carefully here, you'll notice that this image contains this one: you can see this dog carved out like this, and this other dog right here, which is also carved out like this. So we see a mixture of the two forming one input. We have this output image, which we can define as X prime, and it's a mixture, or combination, of the image X1 and the image X2. But this is actually a weighted addition: we have a certain factor lambda, which is a value between zero and one drawn from the beta distribution, such that X prime equals lambda times X1 plus one minus lambda times X2. This means that if lambda equals 0.5, we have 0.5 X1 plus 0.5 X2; if lambda equals 0.3, for example, we have 0.3 X1 plus 0.7 X2, so some sort of weighted addition.
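Before going further with mixup, here is a minimal sketch pulling together the randomized tf.image augment function described just above. The saturation bounds (0.3, 0.5) follow the values mentioned in the text; the simpler random_saturation is used here instead of the stateless variant, which would also require an explicit seed.

```python
import tensorflow as tf

def augment(image, label):
    # Rotate 0 or 1 times by 90°, i.e. rotate roughly half the time.
    k = tf.random.uniform(shape=[], minval=0, maxval=2, dtype=tf.int32)
    image = tf.image.rot90(image, k=k)

    # Randomly adjust the saturation within the given bounds.
    image = tf.image.random_saturation(image, lower=0.3, upper=0.5)

    # Randomly flip the image left-right.
    image = tf.image.random_flip_left_right(image)

    return image, label
```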
Now that we've created this new input, it's logical that we also need to modify the labels, because this mixed image doesn't belong to either class; you can't really say it belongs to this class or that class, it's a mixture of both. Previously, when we did data augmentation on a single image, we kept the label, because the image still belonged to that particular class. But when we mix images from two different classes together like this, we need to modify the label. So we have a new label Y prime, and you guessed it: Y prime equals lambda times Y1 plus one minus lambda times Y2. From here we can dive into the code and implement this mixup data augmentation strategy. Recall we said that lambda is to be drawn from a beta distribution. But if you go to the documentation we've been using so far, tensorflow.org API docs, you won't really find this beta distribution. Instead, you're advised to look at tensorflow.org/probability; it's there you'll find the distributions. So here we have TensorFlow Probability, and within it the TensorFlow distributions; when you click there, you'll find many probability distributions, including the Beta distribution we want to work with now. Right here you can see the Beta distribution and its definition, and notice that there are two parameters we must pass: the distribution is defined over the interval (0, 1) and parameterized by concentration1, aka alpha, and concentration0, aka beta. Now, if we look at the mixup paper, you'll see that the alpha value they use in their experiments is 0.2; they sometimes use other values, but most of the time they tested with 0.2, so that's the parameter we'll be using. Getting back here, all we need to do now is copy this out and go back to our code, where we have the mixup cell down here. We reduce the documentation window and reduce this part; we won't run those earlier cells since we're not going to use them now. So here we have this mixup; let's test it out. We run it and get this error: name 'concentration' is not defined. We also need to import TensorFlow Probability, so let's import tensorflow_probability as tfp, run that, and get back to our mixup. We take off the keyword-argument names and just pass 0.2, 0.2, the values we got from the paper. Then we need our lambda. If we spell it the usual way, we'd clash with the Python keyword lambda, so let's keep it simple and spell it slightly differently. Now we print lamda.sample(): we take one sample from the beta distribution, run it, and here's what we get. If you run this again, you'll obviously get different outputs, all drawn from the beta distribution and lying in the range (0, 1). Here I'm just going to pick out the zero-th item, and we have this output. We could also call .numpy() on it so we don't have a tensor, but we actually prefer to keep it as a tensor, and we'll explain why it's preferable to work with tensors in the function we're trying to build.
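Here is a minimal sketch of drawing the mixup coefficient lambda from a Beta distribution, with alpha = beta = 0.2 as discussed above; the name "lamda" avoids the Python keyword.

```python
import tensorflow_probability as tfp

beta_dist = tfp.distributions.Beta(0.2, 0.2)   # concentration1 (alpha), concentration0 (beta)
lamda = beta_dist.sample(1)[0]                 # a scalar tensor in (0, 1)
print(lamda)
```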
From here, we simply apply the formula: the output image equals lamda times image one, plus one minus lamda times image two. That's it for the image, and we'll repeat the same process for the label. Then, using OpenCV, we'll test this on the two images we've added here, this dog and this cat image right here. Let's take this off right here; we're going to read the images: image_1 = cv2.imread(...), and we do the same for image_2. Right here we can print out the image shape and the label shape; let's print out the label. We get an error, so let's go back up and correct it: this is actually lamda, so let's fix the lamda definition. We run this again and get "required broadcastable shapes", which happens because we haven't yet resized the images. If we print out the shapes of image one and image two before doing this operation, you'll see they have two different shapes, so we have to make sure they're both the same. Let's resize them with cv2.resize, specifying the shape: we pass in (IM_SIZE, IM_SIZE). We copy this out and paste it here, so we've read and resized both, and that should be fine now. We run it again: "IM_SIZE is not defined". We actually defined this previously, but we haven't run those earlier cells since we restarted the notebook, which is why it isn't recognized; let's define it here. OK, now the labels: let's say label_1 equals zero and label_2 equals one, so we have just these two labels. We run that again, and this is what we get: you see we have this output image, and then we have the final label, which happens to be neither zero nor one. From here we can plot it out with plt.imshow, passing in the image and normalizing it. We run that, and this is what we get: a mixup of the two images. Now that we've succeeded in doing this, let's make it part of our TensorFlow pipeline. So we can take all this out now and define a mixup method; we'll take the test code off. The way this method works is that it takes in our data: a train dataset one and a train dataset two. These are two datasets which contain the same elements but which have been shuffled differently, so that we can get this kind of mixing. Then we make those datasets available to do the mixup on. Let's add some code: here we have our first train dataset, train_dataset_1, which is actually the train dataset we have built already, and we're getting it right from here; we've run this already. Be careful, we're not rerunning this, but we may reuse one or two methods from it. We get back to this: here we have the train dataset, we do some shuffling, we specify the buffer size, and we also specify that we reshuffle after each iteration. We then just copy this and paste it out to get our train_dataset_2.
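Here is a minimal sketch of the two-image mixup demo just described; the file names are placeholders for the dog and cat images used in the lesson, and IM_SIZE is an assumed value.

```python
import cv2
import tensorflow as tf
import tensorflow_probability as tfp
import matplotlib.pyplot as plt

IM_SIZE = 224  # assumed image size

image_1 = cv2.resize(cv2.imread("dog.jpg"), (IM_SIZE, IM_SIZE))
image_2 = cv2.resize(cv2.imread("cat.jpg"), (IM_SIZE, IM_SIZE))
label_1, label_2 = 0, 1

lamda = tfp.distributions.Beta(0.2, 0.2).sample(1)[0]

# Weighted combination of the two images and of their labels.
image = lamda * tf.cast(image_1, tf.float32) + (1 - lamda) * tf.cast(image_2, tf.float32)
label = lamda * tf.cast(label_1, tf.float32) + (1 - lamda) * tf.cast(label_2, tf.float32)

plt.imshow(image / 255.0)   # normalize for display
plt.show()
print(label)
```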
Now, once we have this train_dataset_2, we create our mixed dataset using the zip method: tf.data.Dataset.zip, passing in train_dataset_1 and train_dataset_2. If you remember, we had an error when the two images we passed in had different shapes, so we have to make sure we do some pre-processing before doing the mixup. That said, after the shuffling we can do the pre-processing. Let's scroll back up to where we defined it: at the data augmentation level we had a pre-processing step, although we had folded it into our augment; it's practically this resize_rescale method here. So we can run this, and then we just map resize_rescale; we won't call it pre-process again, we just use resize_rescale as we've done already, with the same mapping for both datasets. So now we shuffle, we resize and rescale, and we have our data, which is a combination of dataset one and dataset two. This GIF was gotten from gifi.com. Now that the mixed dataset is formed, we run the cell, and then in the mixup method, let's take this off; we've run this already. We're going to take in image_1 and label_1, and then image_2 and label_2, which we get from train_dataset_1 and train_dataset_2 respectively; that's how we get image_1, label_1, image_2, label_2. The image_1 we had defined here isn't needed any more. We get lamda, we compute the image, we compute the label, and we return the image and the label. That's all there is to the mixup. Now we run this cell and create a new cell below; we paste in what we had done already, and here we have our mixed dataset: we shuffle again, we do the mapping, but since our augment is no longer the previous augment_layer, we replace it with mixup, and then we do batching and prefetching. That's the train, and we could do the same for the validation. We get an error, a spelling error on train_dataset, so let's fix that and run again. Now we get: input 'y' of 'Mul' op has type int64 that does not match type float32 of argument 'x'. This is where we multiply lamda by the labels: clearly lamda is a float and the labels are ints, so we're just going to cast them. We cast with dtype float32, and right here we do the same casting, specifying the dtype again. Let's run this again and see what we get. You see, it works fine; we just have some warnings. Also note that we had been experimenting with changing the alpha, so let's set it back to 0.2 and run that again. Now we have our training data: as you can see here, we have the batch and then the image, and the batch and then the label. So that's it: we've created this dataset, which happens to be a mixed dataset.
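Here is a minimal sketch of the mixup step wired into the tf.data pipeline as just described; BUFFER_SIZE, BATCH_SIZE and resize_rescale are assumed to exist from earlier in the section.

```python
import tensorflow as tf
import tensorflow_probability as tfp

def mixup(train_dataset_1, train_dataset_2):
    (image_1, label_1), (image_2, label_2) = train_dataset_1, train_dataset_2

    lamda = tfp.distributions.Beta(0.2, 0.2).sample(1)[0]

    image = lamda * image_1 + (1 - lamda) * image_2
    label = lamda * tf.cast(label_1, tf.float32) + (1 - lamda) * tf.cast(label_2, tf.float32)
    return image, label

# Two copies of the training data, shuffled independently, then zipped.
train_dataset_1 = (train_dataset
                   .shuffle(buffer_size=BUFFER_SIZE, reshuffle_each_iteration=True)
                   .map(resize_rescale))
train_dataset_2 = (train_dataset
                   .shuffle(buffer_size=BUFFER_SIZE, reshuffle_each_iteration=True)
                   .map(resize_rescale))

mixed_dataset = tf.data.Dataset.zip((train_dataset_1, train_dataset_2))

training_data = (mixed_dataset
                 .shuffle(buffer_size=BUFFER_SIZE)
                 .map(mixup)
                 .batch(BATCH_SIZE)
                 .prefetch(tf.data.AUTOTUNE))
```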
Then, from here, we prepare our validation dataset, which is actually the same as what we had already, so you can simply rerun the earlier data-loading cell. Instead of mapping the augment_layer, we mean to just map resize_rescale; we only need to resize and rescale the validation data. We don't really need to shuffle, so we take that off, and we can take the prefetching off too. We run the validation and check it out: here's our validation data. The reason it looks like this is that we ran the cell twice, so let's re-initialize: we get back to the splits, recreate the train and validation data, and run this again. Now everything is fine: you have the batch dimension, and that's good. We then get back to training and make sure everything is OK; we rerun this and everything is now fine. So we're set to train our model: we run our Sequential API right here, we compile the model, and we get ready to train. We get quite poor results, the reason being that the mixup data augmentation strategy isn't well adapted to the dataset we're working with. Even if the mixup strategy we've just applied wasn't very helpful for this particular problem, it's important to note that mixup can be useful in many other problems. And with that, we've come to the end of the section. Thank you for getting to this point, and see you next time. Hello everyone, and welcome to this new session, in which we're going to implement the cutmix data augmentation strategy with TensorFlow 2. The cutmix data augmentation strategy, although also based on combining two different samples, is different from mixup: with cutmix, we take a random patch from one of the samples and attach it to the other sample, while modifying the labels accordingly. We've looked at how to implement data augmentation with TensorFlow and also how to implement more advanced data augmentation strategies like mixup; in this section we'll look at the cutmix strategy. If we suppose we have these two images, image one and image two, we're going to randomly crop a part of one image, just as you can see in this output here, where they randomly cropped this section from this image and attached it to this other image, such that the output is this image with this patch. That's how cutmix is implemented. If we take this example, where we have this cat and this dog right here, we'll randomly crop a part of the dog and attach that part at the same position on the cat image. To do this cropping operation, we go to tf.image: we have tf.image.crop_to_bounding_box. We click on it and we have the definition. Looking at the arguments, we pass in the image and then specify the offset_height, offset_width, target_height and target_width. Now let's explain what all this means. If you have an image like this one, then, just as it tells you, the offset height is the vertical coordinate of the top-left corner of the bounding box in the image.
And this means that if we randomly select, for example, this box right here, and suppose we've randomly selected this box, then our offset height is this distance, with the top-left corner of the image as the reference: the offset height is this vertical distance, and the offset width is this other, horizontal distance. That's what the offset height and offset width are. Then the target height is the height of the bounding box, and the target width is the width of the bounding box. Once you provide these, it can automatically crop that zone out of the image. Coming back to the code, we're going to add another subplot. Let's paste this code in first; we copy it out, and we have the subplot in the third position. What we're doing here is: we'll have an image, let's call it image_3, and it will be produced by this crop_to_bounding_box method, for which we specify the offset height, offset width, target height and target width. Let's suppose our offset height is, say, 20, the offset width 15, the target height 100, and the target width, say, 98. We specify those and then pass in the image; the image we'll use here is image_2, this one. We take this off, simply display it, and that's fine. We can now plot this out and see what we get. As you can see, what we obtain is a crop of this zone: we get something around here, we crop out this zone, so it takes more of this part. Now we can modify this: we can keep the height but shift the width so we get more of the dog. Let's do just that and shift the width to, say, 100. We run that, and here's what we get: this time around we get the dog's face. And that's basically how we crop a region out of this image. Now, once we've done this cropping, we want to create another image which is made up of only this crop, while the remaining zones are left empty. To do that, we'll make use of this other method, which is pad_to_bounding_box. Here we have tf.image.pad_to_bounding_box; we copy it out and see how it works. We're going to create another plot, so let's increase the number of subplots: four here, four here and four here, and let's paste this out first. We copy this, paste it, and create this fourth subplot. Now, what we pass in as the image is the cropped image, so let's actually name it: we'll call it crop, take this off, and pass in the crop. So after, oh, we had an error before, so let's fix that so you can see it better.
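Here is a minimal sketch of the cropping and padding steps just described, assuming image_2 is the (IM_SIZE x IM_SIZE) dog image loaded earlier; the offsets and sizes are the illustrative values from the walkthrough.

```python
import tensorflow as tf

# Cut a 100 x 98 patch out of image_2, starting 20 px down and 100 px across.
crop = tf.image.crop_to_bounding_box(image_2,
                                     offset_height=20, offset_width=100,
                                     target_height=100, target_width=98)

# Stamp the crop back onto an all-zero canvas of the original size,
# at the same position it was cut from.
pad = tf.image.pad_to_bounding_box(crop,
                                   offset_height=20, offset_width=100,
                                   target_height=IM_SIZE, target_width=IM_SIZE)
```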
OK, so what we're doing here is this padding: we take the crop and we want to pad it, or stamp it, onto another image which contains only zero pixels. We have this pad_to_bounding_box call: we pass in the crop, we have the offset height and width, and the target height and width have to be given; here we pass in the image size, because we want this to be padded onto an image with those dimensions. We run this and get an error; let's fix that quickly: we take this off and then we have 20, 100, 100. We run that, and, offset width again, OK, we take that off too, and there we go. As you can see, we have exactly what we expect: essentially the same layout, but we've kept only this crop. That's what we want: we want to take only this crop and then add it to this cat image here, so that we can build our data augmentation pipeline. We can call this padded image image_4: image_4 equals this, so let's take this off, write image_4, and paste it in. Once that's done, let's add another subplot; we keep adding subplots, so we copy this and paste it here. We display image_4 plus our initial image, image_1, and plot it out. The plot isn't very clear, so let's increase the figure size; we add figsize here, run that again, and now we can look at this clearly. Here's what we get: as you can see, we have this patch, and it looks like it's almost working, but there is a problem, because we get a sort of mixture of the patch and the initial image. This is logical: when we do this addition, in the black region you just have the crop, but in the patch region you still have the pixels of the cat image. So what we need to do is remove that part, such that when you take this and add it to this, it just fits in like a puzzle. To crop out just that part from the cat image, we use the same process we used for the dog image: we simply copy what we had already, the cropping, paste it here, run it, and here's what we get: we have this crop again, this time from the cat. Now we take this crop and pad it to the bounding box as we did before: we copy that, we're on the seventh subplot, and we pass in this crop_cat. We'll print this out as image_5: here we have image_5, we run that, and this is exactly what we get. Now the aim is to take image_1 and subtract image_5 from it, so that we're left with the full cat minus this portion right here. So if, just here, you do image_1 minus image_5, you get this result: as you can see, you have the whole cat without this portion, and this is exactly what we want.
Now that we have this part, we can add it to what we wanted to add initially, which is image_4: so we have image_1 minus image_5, plus image_4, and we get the result, and there we go. We've completed the process of cutting this portion out of one image and fitting it onto image_1. We now take off the other parts, and we're left with the crop, image_4, crop_cat, image_5, and our final image. We copy out the mixup code we had written previously and get back to it: you can see that we can easily integrate this, since we have image_1 and image_2; we're just left with the labels. Let's copy this out, cut that, paste it out here, and we now have the output image. We then rename the variables so the suffixes match the image each one comes from: the crop taken from image_2 becomes crop_2, and its padded version, which we had called image_4, becomes pad_2; the crop taken from the cat image, crop_cat, becomes crop_1, and its padded version, image_5, becomes pad_1. Everywhere we see image_5 we now have pad_1, and everywhere we see image_4 we have pad_2. OK, so that's done. Now we can focus on how to get these bounding boxes. So far we just picked a bounding box by hand, but how do we get it? To answer that question, we refer to the formulas given in the paper. Right here, we're told that rx is drawn from the uniform distribution with parameters 0 and the width W, and ry is drawn from the uniform distribution with parameters 0 and the height H. If we consider this image and this randomly picked box, then rx and ry locate its center: based on the top-left origin, this vertical distance to the center is ry, and this horizontal distance is rx. That's how we obtain ry and rx, drawn from the uniform distribution. To obtain rw and rh, we have this formula: rw equals W, the width of the image, times the square root of one minus lambda, and rh equals the height of the image times the square root of one minus lambda, too. Lambda here is obtained exactly the same way as in the mixup data augmentation strategy. Note that if you multiply the two, rw times rh, you get W times H times the square root of one minus lambda times the square root of one minus lambda, which is W times H times one minus lambda. And that's how the relation in the paper comes about: since rw times rh equals W times H times one minus lambda, dividing both sides by W times H leaves us with rw times rh over W times H equals one minus lambda. Obviously, if I have the square root of x times the square root of x, that is equal to x.
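Here is a minimal sketch of the cutmix image composition just described, using the renamed variables. The bounding-box values r_y, r_x, r_h, r_w are placeholders for the coordinates computed from lambda later in this section (fixed values like the ones above would also work for a quick test).

```python
import tensorflow as tf

# Patch cut from image_2 and stamped onto a zero canvas.
crop_2 = tf.image.crop_to_bounding_box(image_2, r_y, r_x, r_h, r_w)
pad_2 = tf.image.pad_to_bounding_box(crop_2, r_y, r_x, IM_SIZE, IM_SIZE)

# Same region cut from image_1, used to "hollow out" image_1.
crop_1 = tf.image.crop_to_bounding_box(image_1, r_y, r_x, r_h, r_w)
pad_1 = tf.image.pad_to_bounding_box(crop_1, r_y, r_x, IM_SIZE, IM_SIZE)

# Remove the patch region from image_1, then stamp in the patch from image_2.
output_image = image_1 - pad_1 + pad_2
```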
Since we have the square root of x squared, that gives us x, and in this case our x is one minus lambda, so the expression equals one minus lambda; that's how they obtain this relationship right here. So now we have rw, the width of the box, and rh, its height. But recall that what we have to pass to the method which lets us crop out the zone is actually this point right here, the top-left corner of the bounding box, not the center. So how do we get the top-left corner from the center? To go from the center to the top-left corner, consider the width: we know rx, the horizontal distance from the image's origin to the center of the box, and we know the box width, rw. The horizontal distance from the center to the left edge of the box is half of that width, rw divided by two. So to get the x coordinate of the top-left corner, we take rx minus rw divided by two. Next we find the y coordinate, using the same reasoning we used for x: the vertical distance from the center to the top edge of the box is half of the box height, rh divided by two, and since we already know ry, the y coordinate of the top-left corner is ry minus rh divided by two. Recall that the reason we're going through all this is simply that the methods offered by TensorFlow expect the bounding-box coordinates to be given with respect to this top-left corner. So now we'll define a function, box, which takes in lambda; it uses the uniform distribution to obtain rx and ry, and then uses lambda to obtain rw and rh. Getting back to our uniform distribution, we can copy it right here and put it in our code. For now, let's keep the function aside and just write the sampling: here we have our uniform distribution, and recall our low is zero and our high is the width, so we pass in IM_SIZE. So we've defined this, and we have rx; rather, we want both rx and ry, so we copy it out again. Let's print rx so we see what we get: "IM_SIZE is not defined", because we restarted the notebook, so let's redefine IM_SIZE and run it again, and this time everything looks fine. Now we can draw a sample from this distribution: we draw a single sample and see what we get, and we can take the zero-th element, and there we go.
So now we're able to draw a sample from our distribution, and we do the same for ry. Basically we have this, and then we want to make sure all these coordinates are integers, so we cast them: we cast with dtype int32. We copy this out and paste it here, so ry is cast too. Now, for rx and ry, let's just do the sampling directly: we take a single sample, take the zero-th element, and that's fine; we do the same here and everything looks OK. So we can now print out rx and ry; great, we have our random rx and ry. Run it again so you can see the output change. Next we have to obtain IM_SIZE times the square root of one minus lambda. We'll suppose we have lambda, obtained with the same method we used previously. Then, to obtain rw, we have IM_SIZE times tf.math.sqrt of one minus lambda, and rh, the height, is the same: IM_SIZE times the square root of one minus lambda. So now we have rh, rw, rx and ry. What we have to do next is modify rx: rx becomes rx minus the width divided by two, and we want a whole number. At first we write this as minus IM_SIZE divided by two; if you don't see why we're doing this, go back to the picture: to obtain the x coordinate of the top-left corner, we take the distance to the center minus the width divided by two, and we do the same for the height. But this is actually the width of the box, not the width of the whole image, so we're making an error here. Let's fix it: we'll use the box width, and we'll put this code after the rw and rh computation, because we're going to be making use of rh and rw. So here we have rx equal to rx minus rw divided by two, and likewise ry equal to ry minus rh divided by two. Now we have rx and ry, and we can print them again, along with rw and rh. We get an error: in this computation right here, rw is meant to be an int but was passed as a float, since after this computation it comes out as a float. We can always print rw.dtype to check; you see, it's a float. So we'll cast it to make sure we have an int: tf.cast with dtype tf.int32, and we paste that in for the others too. Now we get the expected output, and there we go: we have rx, ry, rw and rh. If you run the cell several times, you'll notice some negative values popping up from time to time. This happens when, for example, the center lands near the edge and the width is so large that the box doesn't fit inside the image, so the coordinate goes negative; in that kind of situation the box goes out of the image.
So we have to ensure that each time we create this box, we limit it to the image: we redraw the box so it doesn't go out of the image, taking off the part that overflows. We're going to clip these values programmatically using the clip_by_value method. So we have tf.clip_by_value, we pass the value in, and we specify the range, making sure the values always fall between zero and the image size. We do this for rx and ry, and after running, you should only have values in that range. What clipping means geometrically is that if we initially had a box that stuck out of the image, we now have a smaller box: in one case the width changes, in another the height changes, and if the box stuck out on two sides, both the height and the width change. Based on these new modifications, we have to make sure we actually pass the right width and the right height. So here's what we do. From the center we subtract half of the width and half of the height, clipping as we go, to obtain the top-left corner: x minus the width divided by two gives its x coordinate, and y minus the height divided by two gives its y coordinate. Similarly, x plus the width divided by two and y plus the height divided by two give the bottom-right corner, which we also clip so it stays inside the image. What's interesting about clipping both corners is that we now know exactly where the top-left and the bottom-right corners are, so we can recalculate the width and the height: the width is the bottom-right x minus the top-left x, and the height is the bottom-right y minus the top-left y, since the other coordinate stays constant along each axis. So based on that, we re-copy the line, change the minus to a plus for the bottom-right corner, giving x_b_r and y_b_r, still clipped by value so they're always found inside the image, and then we recompute rw and rh from these corners.
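A minimal sketch of this clipping step follows; the notebook overwrites r_x and r_y in place, whereas separate names are used here only to keep the centre and the corners apart.

```python
# top-left corner from the centre, clipped so it never leaves the image
x_tl = tf.clip_by_value(r_x - r_w // 2, 0, IM_SIZE)
y_tl = tf.clip_by_value(r_y - r_h // 2, 0, IM_SIZE)

# bottom-right corner, clipped the same way
x_b_r = tf.clip_by_value(r_x + r_w // 2, 0, IM_SIZE)
y_b_r = tf.clip_by_value(r_y + r_h // 2, 0, IM_SIZE)
```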
Concretely, the new rw equals the bottom-right x minus the top-left x (x_b_r minus rx in the notebook's naming), and the new rh equals y_b_r minus ry. So we've modified the way we calculate the width and the height based on the fact that the box may fall partly out of the image. The next step is to handle degenerate boxes: if rw equals zero, we set rw to one, and likewise if rh equals zero, rh becomes one. So now we have rx, ry, rw and rh, and we can create our box method. Note that the TensorFlow method we'll feed this into takes the offset height, offset width, target height and target width, so what our box method has to return is ry, rx, rh, rw, in that order. We run it, get an error because we forgot to pass in lambda, pass lambda in, and everything is fine. Next we get back to our mixup-style pipeline: as before, we compute lambda, then call box of lambda to get ry, rx, rh and rw, and instead of the fixed values we had initially, we pass ry, rx, rh and rw into the crop, repeating the same for the padding step. The cutmix method now looks fine, so we rerun it and, as usual, zip the two datasets; but this time, instead of mapping mixup, we comment that part out and map cutmix instead. We get an error because we were still calling it mixup inside; we rename it to cutmix, rerun, and the train dataset looks fine. Now let's plot some values so you can see clearly what this looks like. In a new code cell we show an image from the training data, and we can notice the patch; running it again we even get a bigger patch. So we've seen how to come up with this data augmentation strategy for the images, but we're yet to deal with the label: if we print it out, we just get all zeros. So let's go ahead and implement this for the label.
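Before turning to the label, here is the box helper assembled above, as a sketch rather than the exact notebook code. It assumes IM_SIZE and tfp from the earlier cells, uses separate names for the corners, and replaces the `if == 0` checks with tf.maximum, which has the same effect.

```python
def box(lamda):
    """Return (offset_height, offset_width, target_height, target_width) for the patch."""
    r_x = tf.cast(tfp.distributions.Uniform(0.0, float(IM_SIZE)).sample(1)[0], dtype=tf.int32)
    r_y = tf.cast(tfp.distributions.Uniform(0.0, float(IM_SIZE)).sample(1)[0], dtype=tf.int32)

    r_w = tf.cast(IM_SIZE * tf.math.sqrt(1.0 - lamda), dtype=tf.int32)
    r_h = tf.cast(IM_SIZE * tf.math.sqrt(1.0 - lamda), dtype=tf.int32)

    # corners, clipped to the image
    x_tl = tf.clip_by_value(r_x - r_w // 2, 0, IM_SIZE)
    y_tl = tf.clip_by_value(r_y - r_h // 2, 0, IM_SIZE)
    x_br = tf.clip_by_value(r_x + r_w // 2, 0, IM_SIZE)
    y_br = tf.clip_by_value(r_y + r_h // 2, 0, IM_SIZE)

    # recompute width and height after clipping; never let them collapse to zero
    r_w = tf.maximum(x_br - x_tl, 1)
    r_h = tf.maximum(y_br - y_tl, 1)

    return y_tl, x_tl, r_h, r_w
```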
To do that, recall the formula: one minus lambda equals rw times rh divided by W times H, which means lambda equals one minus rw times rh divided by W times H. The reason we need to recompute this is simply that after all those clippings, rw and rh may have changed, so when creating the label we have to make sure this relation still holds. So we have lambda equal one minus rw times rh divided by im_size times im_size, and then we apply the same mixing formula we used for mixup: the new label is lambda times label one plus one minus lambda times label two. We rerun the cutmix method and recreate our training dataset, and we get an error telling us that lambda is of type float64, so we add a cast to make it float32. We run cutmix again, rebuild the training dataset, and this time it's fine. While visualizing, there was still a leftover issue, so it's preferable to rerun from that point; after doing so everything is intact and we can visualize our data. You can see the patch, and unlike before, where the label was either zero or one, we now have values between zero and one as our labels. Let's now run the model compiling and training. After training completes, we see that the accuracy doesn't really change much: it hovers around 46 to 48 percent, with a high of about 50 percent, so it stays in this 45 to 50 percent range, and the loss doesn't really change much either. This is because the model is getting confused. If you look at an uninfected cell and a parasitized cell, it's actually a small region of the parasitized cell that lets the model know it's parasitized. So when we crop a patch and paste it onto another cell, the very part of the parasitized image that makes it parasitized may not be taken into consideration, and the model no longer knows how to differentiate between an uninfected and a parasitized cell. So this cutmix data augmentation strategy isn't well adapted to our problem, though it can be applied to many other problems.
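For reference, here is a compact sketch of the full cutmix pipeline assembled across these steps. The Beta parameters, the dataset names train_dataset_1 and train_dataset_2 (two shuffled copies of the training data), and the exact image-combination arithmetic are assumptions for illustration, not the notebook verbatim.

```python
def cutmix(train_dataset_1, train_dataset_2):
    (image_1, label_1), (image_2, label_2) = train_dataset_1, train_dataset_2

    # lambda from a Beta distribution, as in the earlier mixup cell (alpha = 0.2 is a placeholder)
    lamda = tfp.distributions.Beta(0.2, 0.2).sample(1)[0]

    r_y, r_x, r_h, r_w = box(lamda)

    # cut the patch out of image_2 and place it at the same location on image_1
    crop_2 = tf.image.crop_to_bounding_box(image_2, r_y, r_x, r_h, r_w)
    pad_2 = tf.image.pad_to_bounding_box(crop_2, r_y, r_x, IM_SIZE, IM_SIZE)
    crop_1 = tf.image.crop_to_bounding_box(image_1, r_y, r_x, r_h, r_w)
    pad_1 = tf.image.pad_to_bounding_box(crop_1, r_y, r_x, IM_SIZE, IM_SIZE)
    image = image_1 - pad_1 + pad_2

    # recompute lambda from the (possibly clipped) box, then mix the labels
    lamda = tf.cast(1 - (r_w * r_h) / (IM_SIZE * IM_SIZE), dtype=tf.float32)
    label = lamda * tf.cast(label_1, tf.float32) + (1 - lamda) * tf.cast(label_2, tf.float32)

    return image, label

# two shuffled copies of the training data zipped together, then mixed
mixed_dataset = tf.data.Dataset.zip((train_dataset_1, train_dataset_2)).map(cutmix)
```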
We thank you for getting to this point and see you next time. Hello everyone, and welcome to this new section, in which we'll see how to implement data augmentation using a specialized library called albumentations. We'll see how to use albumentations with TensorFlow and also with PyTorch, which shows how easily it integrates with just about any framework. Note that this session was inspired by a question posed by one of you; feel free to keep asking questions, as it helps make the course better. We'll be looking at the albumentations tool, which is a specialized data augmentation library. Albumentations is a Python library for fast and flexible image augmentations. It efficiently implements a rich variety of image transform operations optimized for performance, while providing a concise yet powerful image augmentation interface for different computer vision tasks, including classification, segmentation and detection. Why do we need a dedicated library like albumentations? For the usual kind of data augmentation there are plenty of libraries, like TensorFlow, which we've already used. But consider a task like object detection, where the documentation gives a more illustrative example: you have an input image and its bounding boxes, which you'll use to build a detection model. Now suppose that, as part of your augmentation pipeline, you apply cropping, so you keep only a portion of the original image. With the usual libraries, after cropping you would have to manually modify each and every bounding box that remains inside the crop, dropping the ones that fall outside it, because bounding box positions are defined with respect to the image origin, and those positions obviously change when you crop. Instead of doing this manually, albumentations restructures the bounding boxes for you automatically. Apart from object detection, another very common use case is image segmentation: the original image comes with a mask, and after a rotation is applied to the image, albumentations applies the same rotation to the mask. Another advantage is the declarative definition of the augmentation pipeline with a unified interface; a few lines are all it takes to build one. Notice how we can also include probabilities: a probability of 0.3 here and one of 0.5 there. Giving the random brightness-contrast augmentation a probability of 0.3 simply means it will be applied roughly three times for every ten images you process.
To make that concrete, suppose we have a dataset of 10,000 images. With a probability of 0.3 for an augmentation, each image has a 0.7 chance of passing through unchanged, so on average about 7,000 of the images the model sees come from the original dataset and about 3,000 are augmented. Then we also have the horizontal flip with probability 0.5. So an image may pass through with no brightness-contrast change and no flip, with one of them, or with both; the random crop, on the other hand, is always applied, because no probability is specified for it. Another reason to use albumentations is that it has been rigorously tested: as the documentation puts it, it is battle tested, used widely in industry, deep learning research, machine learning competitions and open source projects, with high performance, a diverse set of supported augmentations, extensibility and rigorous testing. The documentation also has a list of transforms and their supported targets, broken into two parts: pixel-level transforms and spatial-level transforms. To get more information on any of them, it suffices to click on it. For example, pick Sharpen from the pixel-level transforms and you get its arguments and description; from the spatial-level transforms, pick VerticalFlip, and you see it takes a float probability of applying the transform, defaulting to 0.5. As a rule of thumb, if you're dealing with a very large dataset you could use probabilities between 0.1 and 0.3, since a large dataset doesn't need augmentation as much, whereas for a small or medium-sized dataset you could use probabilities between 0.4 and 0.5. Nonetheless, you can always pick whatever value works best for your model's performance. Getting back to the code, we're going to adapt the documentation's example of a TensorFlow data augmentation pipeline built with albumentations: it defines the transforms, then an augmentation function, then the data preprocessing, and finally the integration with TensorFlow datasets. Based on the kind of dataset we're dealing with, we have to be careful about which augmentation strategies we use. Rotations are fine here, because rotating doesn't wipe out the regions that let us differentiate between a parasitized and an uninfected cell. Looking at RandomRotate90, its argument is a float probability of applying the transform, defaulting to 0.5, so we can make use of it. Just below, there are other rotations: RandomRotate90, which rotates the image by 90 degrees a random number of times, and the geometric Rotate, where we can select the angle of rotation.
Then there is SafeRotate. On the other hand, we'll avoid strategies like cropping, because a crop could remove exactly the information that permits us to differentiate between parasitized and uninfected cells. There's also Resize, which we were already using to preprocess our images, and the vertical and horizontal flips, which we are going to use. We can also use RandomGridShuffle, which randomly shuffles grid cells in an image: with a three-by-three grid, the image is broken into nine cells whose positions are then shuffled at random, so any cell can end up anywhere. We'll also take RandomBrightnessContrast and Sharpen. You could use other augmentation strategies if you want; just always visualize the outputs to understand exactly what you're applying. Now we define our transforms: transforms is an albumentations Compose, which takes the list of augmentations. Before anything else we do a Resize, specifying the image size; then the horizontal and vertical flips; for RandomBrightnessContrast and Sharpen we use the default parameters. We run the cell and get an error because each transform has to be prefixed with the albumentations module, so we fix that and run again. We then get more errors: the installed version has no attribute Sharpen, and the same for RandomGridShuffle, so we comment those two out and run again, and now it looks fine. We can also use OneOf, which applies one transform chosen from a list: here we put the horizontal flip and the vertical flip inside it, so either one or the other gets applied, and we can give OneOf its own probability, say p equals 0.3, which defines whether the OneOf is applied at all. We run this again and it's fine. Drawing inspiration from the documentation's example, we're going to create our own method, aug_albument, which takes in an image, creates a dictionary, feeds it into the transforms we've created here, and then normalizes the image. From here, we can build our train dataset in a way similar to what we've been doing already.
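A sketch of the Compose being described, assuming IM_SIZE = 224; Sharpen and RandomGridShuffle may require a recent albumentations version, as noted above.

```python
import albumentations as A

IM_SIZE = 224  # placeholder image size

transforms = A.Compose([
    A.Resize(IM_SIZE, IM_SIZE),                               # height, width
    A.OneOf([A.HorizontalFlip(), A.VerticalFlip()], p=0.3),   # one of the two flips, 30% of the time
    A.RandomBrightnessContrast(),                              # default parameters (p=0.5)
    A.Sharpen(),                          # needs a recent albumentations version
    # A.RandomGridShuffle(grid=(3, 3)),   # same remark as Sharpen
])
```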
The difference is that instead of our earlier preprocessing function we now map a process_data method, adapted from the documentation. In process_data we take the image and the label; the image gets modified by the augmentation, while the label is passed straight through to the output since it remains unchanged. The documentation's version also takes an image size, which we don't need, so we drop it. As input we pass the image, the output tensor type Tout is float32, and the function is aug_albument. We then wrap all this with TensorFlow's numpy_function. Looking at the documentation, tf.numpy_function wraps a Python function and uses it as a TensorFlow operation, which means we can still work in graph mode even though we have plain Python code here, because TensorFlow treats everything that goes on inside it as a TensorFlow operation. We run process_data, then the train dataset; we get a batch size not defined error because we skipped a cell, run that cell, and then the train dataset looks fine. Let's quickly visualize the dataset: we take an element, unpack the image and the label, and call imshow on the image. We get an error because we're dealing with batches of 32, so we just pick one element of the batch, and that's fine. To see many more plots, we create a figure, set the figure size, and for i in range one to 32 we call plt.subplot(8, 4, i) and plt.imshow on image i. Running that, here's what we get. Next, let's try the Cutout transform from the documentation, which takes a number of holes, a maximum hole height and width, a fill value, an always_apply flag and a probability; applying it will make these arguments much clearer, and cutout is far more visible than a transformation like rotation. Notice how easy it is now to integrate any augmentation strategy: we just add Cutout with its default parameters to the transforms and run again. Because the train dataset has already been consumed, we need to rebuild it from the original dataset, so we go back, rerun the transform and dataset cells, and visualize. At first no cutout seems to have been applied, which makes sense since there's a probability of 0.5. Let's also increase the figure size to 15 so we can see better.
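Before experimenting further with Cutout, here is a compact sketch of the pipeline assembled above, following the albumentations documentation pattern; the normalization assumes uint8 pixel inputs, and BATCH_SIZE is a placeholder.

```python
import numpy as np

def aug_albument(image):
    # albumentations expects a NumPy array and returns a dict with the augmented image
    data = {"image": image}
    aug_img = transforms(**data)["image"]
    # normalize (assuming uint8 pixels) and match the Tout dtype declared below
    return (aug_img / 255.0).astype(np.float32)

def process_data(image, label):
    # wrap the Python augmentation so it can run inside the tf.data pipeline
    aug_img = tf.numpy_function(func=aug_albument, inp=[image], Tout=tf.float32)
    aug_img.set_shape([IM_SIZE, IM_SIZE, 3])  # numpy_function loses static shape info
    return aug_img, label

BATCH_SIZE = 32  # placeholder
train_dataset = (train_dataset
                 .map(process_data, num_parallel_calls=tf.data.AUTOTUNE)
                 .batch(BATCH_SIZE)
                 .prefetch(tf.data.AUTOTUNE))
```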
Now, as we were saying, if you look at the images you'll notice that some of them have portions cut out: you can see, for example, one image with a cut-out region. Getting back to the documentation, the default number of holes is eight, and we can copy the arguments out and modify them to see how they change the output. Where cutout was applied, you can count the black spots to identify the cut-out regions. That's the idea: you specify the number of regions to cut out and the value they're filled with; specifying zero simply means the regions become black spots. To increase the size of the cut-out regions, you increase the maximum height and width values; then there's the fill value, the always_apply Boolean, and the probability. If you want cutout applied every time, you can simply set the probability to one. So we've seen how this works, and we could go ahead and train our model on this augmented data; but before training, let's take the Cutout out, as it was only meant to show you how it works. We rerun the cells, visualize, and as expected there's no cut-out region any more. That said, we move forward and run the training again. After training for several epochs, you'll notice that the training accuracy no longer climbs to 99 percent as it used to: the highest training accuracy we get is about 94.5 percent, while the validation accuracy is about 94.48 percent, and we also get the accuracy-versus-epoch plot. Also, after checking the albumentations GitHub page, we found a solution to the problems we had earlier where we weren't able to use Sharpen and RandomGridShuffle: after doing the pip install shown there and uncommenting those lines, they should now work. Hello, dear friends, and welcome to this new session, in which we'll be building custom losses and metrics. The first method we'll look at is building a custom loss function without parameters, the next with parameters, and finally we'll build a custom loss class. From there, we'll go on to build a custom metric function with parameters, a custom metric function without parameters, and a custom metric class. Most times, when building and training models, all you need to do is pick a loss function from the ones available in the documentation. But there are cases where you have to build a loss function from scratch, a custom loss function, where defining the loss isn't as easy as just specifying a loss name.
So now let's look at how to build such custom loss functions. We'll create a custom binary cross-entropy; let's call it custom_bce. We reference it as the loss and then define it just above that cell. Inside custom_bce we define a BinaryCrossentropy object, and then we return the result of calling it on y_true and y_pred, where y_true is the actual output labels and y_pred is what the model predicts. You can understand this better from the documentation, where they define this object and then compute the binary cross-entropy loss by passing in y_true and y_pred. The BinaryCrossentropy constructor can take several arguments, such as from_logits, label_smoothing, the axis, the reduction and the name; here all we keep is the name, so we remove the rest. We run the cell, run the compile, get a metrics not defined error, run the missing cells, and finally run the training. We then get the error that this function takes zero positional arguments but two were given, so we add y_true and y_pred as the function's arguments, run again, and the training process looks similar to what we had with the built-in binary cross-entropy loss. Now suppose we want to parameterize the way we calculate this binary cross-entropy, that is, multiply it by a given factor, say 0.5. We could add a factor argument and multiply by it, but when we stop the training and rerun we get factor not defined, even after trying to pass the factor into the compile, and this brings us to the way loss functions with parameters are defined. We define an outer function that takes only the factor; inside it we define a loss function taking y_true and y_pred, which computes the binary cross-entropy, multiplies it by the factor, and is then returned by the outer function. So custom_bce now takes the factor and returns the inner loss function, which does the same computation as before but scaled by the factor. We then define the factor before compiling; let's take the factor equal to one for now, though for a different problem you could set this parameter however you wish. We pass the factor in, rerun, and this time there's no error.
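A sketch of the two function-based variants just described; `model` stands for whatever Keras model was compiled earlier, and the optimizer and metrics are placeholders.

```python
# version without parameters: just wraps the built-in loss
def custom_bce(y_true, y_pred):
    bce = tf.keras.losses.BinaryCrossentropy()
    return bce(y_true, y_pred)

# parameterized version: the outer function takes the factor and returns the
# inner loss function that Keras will call with (y_true, y_pred)
def custom_bce_factor(factor):
    def loss(y_true, y_pred):
        bce = tf.keras.losses.BinaryCrossentropy()
        return bce(y_true, y_pred) * factor
    return loss

model.compile(optimizer="adam",
              loss=custom_bce_factor(factor=1.0),
              metrics=["accuracy"])
```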
Then we run our training, and the training process continues normally. We can now check out another way of building custom loss functions, which is to build a loss class that inherits from the Keras Loss class. So we add a cell and create the class CustomBCE, inheriting from keras.losses.Loss. Inside, we write the __init__ method, and just like before we pass in a factor; we call the parent class's __init__, then create the factor attribute and assign it the value passed in. From there we define a call method, where the operations are carried out: it takes y_true and y_pred, and we do exactly what we did before, computing the binary cross-entropy and multiplying by self.factor. We run and get an invalid syntax error because a dot is missing; we fix it and it's fine. Next, in the compile we replace the function with this CustomBCE class, and note that we actually have to pass in the factor, because we didn't specify a default value. We run that, rerun the training, and get the error that call takes two positional arguments but three were given: we forgot to put self in the call method. We fix that, rerun, and the training runs successfully. So now we know how to build custom loss functions with TensorFlow. We then move on to building custom metrics, in a way very similar to what we did for the loss. We'll start with a custom metric function: in the compile, we replace the built-in metric with custom_accuracy, and define it above. Getting back to the documentation, under tf.keras.metrics we find binary_accuracy, which takes y_true, y_pred and a threshold. So our custom_accuracy function simply takes y_true and y_pred and returns binary_accuracy of y_true and y_pred. We run the cell, put custom_accuracy in the compile, compile the model and train it, and everything looks good; you can see this custom accuracy being reported. Now suppose we had a factor: we're going to use the same approach we used for the loss, wrapping the metric function inside an outer function that takes the factor.
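A sketch of the class-based loss just built; again, `model` is whatever model was compiled earlier.

```python
class CustomBCE(tf.keras.losses.Loss):
    def __init__(self, factor):
        super().__init__()
        self.factor = factor

    def call(self, y_true, y_pred):
        bce = tf.keras.losses.BinaryCrossentropy()
        return bce(y_true, y_pred) * self.factor

# no default factor was given, so it has to be passed explicitly
model.compile(optimizer="adam", loss=CustomBCE(factor=1.0), metrics=["accuracy"])
```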
So again we have a custom_accuracy outer function taking a factor, and inside it a metric function taking y_true and y_pred, which computes the binary accuracy, multiplies it by the factor, and gets returned by the outer function. We run that; oh, we hadn't stopped the previous training yet, so we stop it, run again, and it looks good. We compile and train the model and are told the function takes one positional argument but two were given, so we get back and pass the factor in the compile. We run that, the training looks fine, and you can see our metric right there. Our next approach is the class-based one, similar to what we did with the loss. Beside the custom loss class we build a custom metric class, a subclass of the Keras Metric class. We call it CustomAccuracy, and in the __init__ we specify a name, say custom_accuracy, and also pass the factor, set to one by default. We call the parent __init__, store the factor, and then create the accuracy variable by calling the add_weight method that comes with the Metric class, specifying the name we passed and initializer equal to zeros. This will look slightly different from the custom loss, because unlike the Loss class, which has a call method, here we instead use three different methods: update_state, result and reset_states. The update_state method permits us to update the metric's state, result outputs the metric value, and reset_states resets the metric's state. We start with update_state: in it we assign a value to the accuracy variable, and the value we assign is exactly what we had before with binary_accuracy, multiplied by self.factor; this method takes y_true and y_pred. Then in result we simply return self.accuracy, and in reset_states, which resets the state at the end of each epoch, we assign the value zero to self.accuracy: once an epoch finishes, the state is reset, then updated again, and the result is output. We run the cell and get an error because the module is actually metrics, not metric; we fix that and it runs fine. So we have our CustomAccuracy built; we get back to the compile, take out the function version, and specify our CustomAccuracy, where we could also specify the name.
But since we gave the name a default value anyway, we can just pass the class as it is. We run the cell and start the training again, and get the error that update_state got an unexpected keyword argument sample_weight. This is because we didn't include sample_weight as one of our arguments; we're not going to use it, so we just add it and set it to None. We run again, recompile, and train, and get another error: the input y of the Equal op has type float32 that doesn't match type int64 of argument x. This comes up because, when computing the binary accuracy, y_true and y_pred have different data types. Let's print them out to see: recall that when we want to debug we can run eagerly, so we set run_eagerly to true in the compile, run again, and indeed y_true is an int64 while y_pred is a float32. So we cast y_true with tf.cast, specifying the dtype as float32, run again, compile and train, and get yet another error: cannot assign value to variable custom_accuracy, because the variable shape and the assigned value shape aren't compatible. To understand this, we store what we're assigning in an output variable and print it before assigning. Running the compile and the training, we now see exactly what we're assigning: instead of a single value, it's a whole list, and that's why we get the error. This output actually corresponds to whether the model's prediction was correct or not for each element of the batch: a one means the prediction was correct. Because we've already trained the model for a while, this particular batch has close to 100 percent accuracy, so instead of a single number the model gives out this list of ones. Looking at another batch, we see some zeros: three zeros out of 32, so the accuracy for that batch is about 90 percent. So we have to count the number of ones and divide by the length of the output, which here is the batch size of 32, and optionally multiply by 100, or just leave it as a fraction. For the counting we can use the count_nonzero method, which counts the non-zero values for us, so instead of assigning the raw output we count its non-zeros.
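Putting the pieces together, including the division and dtype fix worked out just after this, the finished class looks roughly like the sketch below. It mirrors the simplified version built here (the accuracy is reassigned per batch rather than accumulated), uses tf.size instead of len so it also works with a dynamic batch dimension, and notes that newer Keras versions name the reset method reset_state.

```python
class CustomAccuracy(tf.keras.metrics.Metric):
    def __init__(self, name="custom_accuracy", factor=1.0):
        super().__init__(name=name)
        self.factor = factor
        self.accuracy = self.add_weight(name=name, initializer="zeros")

    def update_state(self, y_true, y_pred, sample_weight=None):
        # per-example 0/1 correctness, cast so the dtypes match
        output = tf.keras.metrics.binary_accuracy(
            tf.cast(y_true, tf.float32), y_pred) * self.factor
        # fraction of correct predictions in this batch
        self.accuracy.assign(
            tf.math.count_nonzero(output, dtype=tf.float32)
            / tf.cast(tf.size(output), tf.float32))

    def result(self):
        return self.accuracy

    def reset_states(self):  # reset_state() in newer Keras versions
        self.accuracy.assign(0.0)
```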
Then, as sketched above, we divide by the length of the output, but since that length is an int we cast it to float32. We run it, compile and train, and still get an error, so we check the documentation and see that count_nonzero also returns an int by default; rather than casting afterwards, we can just specify dtype equal to tf.float32 directly in the call. Hopefully this should be fine now: we check, compile the model, start the training, and everything looks okay. We can now stop this, get back to graph mode, and run again so that training is faster. And that's it for this section, in which we've seen how to build custom losses and metrics. Thank you for getting to this point and see you next time. Hello everyone, and welcome to this new session, in which we'll see how to switch between eager mode and graph mode in TensorFlow. We'll first start by understanding what these two modes mean and when to use each of them, and then we'll see how to switch between them by simply adding the tf.function decorator. So far in this course we've been building methods like this one in eager mode, that is, following the usual Pythonic approach. Apart from this eager way of manipulating data, we also have graph mode. In graph mode, data is passed through a set of nodes, where each node is an operation: we pass in our data, which could be a tensor, it gets modified by a node and passed on to the next node, right up to the output. This means that if we consider a line of code like x equals y times z, TensorFlow is capable of converting it into a graph with a single node representing the multiplication operation, with the two inputs y and z and the output x. That x can then be fed into two other operations, say an addition and a subtraction, whose results are finally combined by another addition to give the output. Let's write the code that represents the rest of this graph: we've had x, and then we define r equal to x plus a fixed constant, s equal to x minus that same constant, and finally an output t equal to r plus s. Note that the addition node for r is different from the final one, as it takes in just a single tensor input,
since that input gets added to a fixed constant; likewise the subtraction node takes a single input, x, from which the constant is subtracted. So once we have the tensor s and the tensor r, r plus s gives us the output t. We see that under the hood, TensorFlow is capable of taking the Python code you write and converting it into a graph like this one. One advantage of working in graph mode is that the code becomes portable: even in an environment without a Python interpreter, you now have a data structure that can be used anywhere. Another advantage is that since we're dealing with a data structure, it can be broken up into separate blocks, making parallelization easy and leading to faster computation on devices like GPUs. The good news is that in order to convert a method into such a graph, all you need to do is add the tf.function decorator. Once that's done, when TensorFlow first encounters the decorated block it does what we call tracing; during tracing the graph is generated, and from then on every call to the method simply passes your inputs through that data structure to produce the outputs, without going through each and every Python step in the method. Getting back to the code, all we need to do is add the tf.function decorator, and the method automatically runs in graph mode. We run that; then for the augment function we add tf.function as well, and in our rotate-90 layer class we put the decorator on the call method, that is, where the rotation is computed; we run the cell, and we do the same for the augment_layer method. Note that previously we had the resize-rescale layers: if we were calling resize_rescale from a method that already carries the decorator, it would be needless to decorate resize_rescale as well. In other words, if you have a big function which calls a small function, decorating the big function with tf.function is enough, because every method called inside it is automatically converted to graph mode too; there's no need to decorate the small function separately. Let's take that off.
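A small illustration of the graph being described: each arithmetic operation becomes a node, and the tf.function decorator traces the Python code into that graph. The function name and values are placeholders.

```python
@tf.function
def compute(y, z, c):
    x = y * z      # multiplication node
    r = x + c      # add-constant node
    s = x - c      # subtract-constant node
    t = r + s      # final addition node
    return t

print(compute(tf.constant(2.0), tf.constant(3.0), tf.constant(1.0)))  # -> 12.0
```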
Now let's explore another interesting effect of working in graph mode. Inside the method we add a print, say print("I was here"), and then we call it: resize_rescale with the original image and the label we got earlier. We run this and see "I was here" printed. Now let's repeat the call several times; running it three times, we'd expect the message to be printed three times, but running the cell, we see it's printed only once. If we take off the tf.function decorator and run again, the message is printed on every call. The reason is that in graph mode what we have is a graph data structure, nodes linked to one another from input to output. The very first time we call the method we're actually carrying out tracing, which converts the Python code into this graph format; from then on, since graph mode only stores the operations and the data passed between them, only that portion of the method is executed, so the Python print is not taken into consideration the second time or any time after that. We could run it as many times as we wish and it wouldn't print again. Putting the decorator back, that is, back in graph mode, you see it's printed only once. You can also force all the methods to run eagerly with the run_functions_eagerly function: tf.config.run_functions_eagerly set to true says that all functions are now run eagerly, and sure enough, even with the tf.function decorator in place, the message is printed on every call. Setting it back to false and running again, nothing is printed at all, simply because the tracing has already been done. The next point is that when working in graph mode there's a specialized print you can use: instead of the usual Python print, which is meant for eager mode, you can use tf.print, which is made for graph mode. The behaviour of tf.print is different: it makes the method behave like a normal method, going through the print each and every time it's called. If we comment out the Python print, you see the difference: the Python print runs only once, whereas tf.print gets printed on every call.
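A sketch of the tracing behaviour just described; resize_rescale_demo is a hypothetical stand-in for the notebook's method.

```python
@tf.function
def resize_rescale_demo(image):
    print("I was here")       # Python print: runs only while tracing
    tf.print("traced call")   # tf.print: runs on every call, even in graph mode
    return image / 255.0

img = tf.ones([224, 224, 3])
resize_rescale_demo(img)  # prints both messages (tracing happens on the first call)
resize_rescale_demo(img)  # prints only "traced call"

tf.config.run_functions_eagerly(True)   # force eager execution for debugging
resize_rescale_demo(img)                # the Python print runs again
tf.config.run_functions_eagerly(False)  # back to graph mode
```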
From here, always make sure that when you're working in graph mode, all your operations are TensorFlow operations: resizing with OpenCV, for instance, may work, but it's preferable to use the tf.image.resize method instead of cv2.resize. And in order to keep everything inside the function based on TensorFlow, avoid passing plain Python variables into the method; make it depend only on TensorFlow variables. In conclusion, we've looked at graph mode and eager mode in TensorFlow and how to move from one to the other. Thank you for getting right up to this point and see you next time. Hello everyone, and welcome to this new session, in which we're going to go under the hood and understand exactly what goes on when we call the fit method. In fact, we're going to train our model without using the fit method at all: we'll build a custom training loop, with a training block, a validation block, and a method we'll name neuralearn, which replaces the fit method we're used to. TensorFlow isn't only a great machine learning library because it lets us build and deploy models; its greatness also comes from the fact that it lets us build and train these models without having to know everything that goes on under the hood. All you need to do to train a TensorFlow model is specify the model and call fit, passing in your dataset of inputs and outputs. That said, in this section we'll create our own fit method, going a step further in understanding how TensorFlow works underneath. Recall that when you call fit, what essentially happens is that TensorFlow applies gradient descent to update the model's weights, the thetas, until the model converges. For given inputs x1, x2 and outputs y1, y2, y3, our aim is to update these weights such that when we pass the inputs into the model, the output we get looks practically the same as the actual outputs. To know whether the model's outputs are similar to the actual outputs, we use a loss function which computes the difference between the two. Initially, when we start updating the parameters, there's a large difference, but as we keep updating them the loss starts dropping, and once it converges we stop the training. Recall also that the way a weight is updated is that we take the current weight minus the learning rate, a constant we define, times the partial derivative of the loss with respect to that weight; so if we want to update, say, the weight theta-double-prime 2,1, we use this same rule. Now, the way TensorFlow computes these partial derivatives is by automatic differentiation: as the inputs are passed through the model in the forward pass, right up to the output, TensorFlow records the information needed to compute these gradients.
And this is done for each and every parameter, such that once we get the output and compute the loss, we can obtain the partial derivative of the loss with respect to each and every weight, and every weight then gets updated via the formula. In general, for a weight theta i,j we have theta i,j equals theta i,j minus the learning rate times the partial derivative of the loss with respect to theta i,j. So as we go through the forward pass, what TensorFlow does is keep track of the thetas and the gradients, let's call them d-theta, which are stored, or recorded, as the data flows through the model. This process of recording the gradients as the inputs pass through the model is similar to a tape recorder: you speak into the mic, the information is recorded, and it can be replayed later. Here, as information passes through the model, the gradients are stored and can be used in the gradient descent algorithm later on. Getting back to the code, let's now build a fit method from scratch. We start with for epoch in epochs: notice that, unlike fit, where you just pass in the number of epochs and TensorFlow does the job, here you really have to do the work yourself. Inside each epoch, we go through each and every batch of our dataset: for x_batch, y_batch in train_dataset, unpacking the tuple. What we do next is pass x_batch into our model, and after getting the model's output, compare it with the actual output to obtain the loss. So we have y_pred equals model of x_batch, using the sequential model we built previously, and we specify training equal to true since we're in training mode. Then we compute the loss: loss equals the loss function of y_batch and y_pred; we can use our custom_bce here, just making sure y_true comes before y_pred, so it's custom_bce of y_batch and y_pred. We could now go ahead and update the model's weights, but for that we need the gradients, and the way we get them is by recording, or taping, them as the input passes through the model. To do this, this part of the code has to be placed in a particular scope: with tf.GradientTape() as tape, and under it we put these two lines of code. The effect is that the gradients are recorded in this tape, or recorder.
We could call it recorder instead. And now all the intermediate values of theta and the intermediate gradients have been recorded and can be used in computing this partial derivative right here. That said, we can now obtain the partial derivatives. So we have partial derivatives equal the tape, or rather the recorder, since we changed that name: recorder dot gradient of the loss with respect to all the model's trainable weights. And so this line of code represents this operation of computing the partial derivative of the loss with respect to each and every weight. Then we move on to optimize our model by going through stochastic gradient descent, or some other gradient-descent-based optimizer like the Adam optimizer, which we've used so far. That said, we get back up to this compile call; we have this optimizer, so let's take it from here and copy it. Then we add a cell and put the optimizer there; that's it, our Adam optimizer defined, and we run that cell. And then what we do now is optimizer dot apply gradients. So we use this apply_gradients method in order to update the weights based on the gradient descent algorithm. In here, we pass zip of the gradients, or rather the partial derivatives, and the model's trainable weights. Recall that we actually have some non-trainable weights too, so here it's specifically the model's trainable weights and their derivatives. So this part here is linked to this, and then this is the whole gradient-descent-based algorithm, which takes in the partial derivatives and the model's trainable weights, that is, the thetas. Obviously, while defining the optimizer, we have already set the learning rate. Then from here, we could print out our loss values; here we print the loss. Okay, so now we can run this and see how it works. Let's not forget to specify the number of epochs, so let's have number of epochs equal, say, three for a start, and then we run the cell. We are told here that the model is not defined, so let's modify this and have our LeNet model, and run that again. We then get an error where we're told it expects one shape while what we pass is another. The fact that we have this means that we haven't yet resized our training data, so let's get back up and run the cells for resizing. There we go, let's run those cells; okay, that looks fine. So with those cells run, let's get back to our custom training loop and run it. As you can see, the training goes on smoothly: we have those loss values, which should be dropping, and we run this for three epochs. So there we go, we have our loss, which is dropping. We could also decide to print this out only after a given number of steps. So here we add the step with enumerate; you could check out our free Python course on neurallearn.ai to understand this, in case you're new to these keywords. So with enumerate we have the step, and what we're going to do is print our loss values only after a certain number of steps. Given that we have 689 steps, that is, if our batch size is 32, we'll say that after every 300 steps we print the loss out. Before adding that, here is a recap of the core loop we've built so far.
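This is a hedged, minimal sketch of that core loop, not the exact notebook code; it assumes `model`, `custom_bce`, `optimizer` and `train_dataset` have already been defined in the earlier cells.

```python
import tensorflow as tf

EPOCHS = 3  # assumed value for this sketch

for epoch in range(EPOCHS):
    for x_batch, y_batch in train_dataset:
        # record the forward pass so gradients can be computed afterwards
        with tf.GradientTape() as recorder:
            y_pred = model(x_batch, training=True)
            loss = custom_bce(y_batch, y_pred)  # y_true first, then y_pred
        # partial derivatives of the loss w.r.t. every trainable weight
        partial_derivatives = recorder.gradient(loss, model.trainable_weights)
        # one gradient-descent (Adam) update step
        optimizer.apply_gradients(zip(partial_derivatives, model.trainable_weights))
        print(loss)
```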
So here we're going to have: if the step modulo 300 equals zero, then we do the printing. So there we go, we have this print. And then, at the start of every epoch, we print out "training starts for epoch number" and format that with the epoch. Okay, let's rerun this again, and here's what we get now: we have "training starts", "training starts", "training starts", and then we have these intermediate loss values which get printed out. Now, what if we include the validation? Notice that we first train the model, and then, after training for one epoch, we get into validation. So for X batch val, Y batch val in val dataset, we simply copy this out and paste it here. So here we have Y pred val and our loss val. It's still the same model, but this time training is set to false, so training equal false. We have Y pred val, we have Y batch val, and X batch val here too. And then we print out the loss val, so here we have the validation loss. Okay, let's run this cell now. While training is going on, we could include the metric. Here we have this metric, which is binary accuracy. Now, while training is going on, we deal with the metric as follows: for each batch we're working with, we update the state; then we print out the result and reset the state. So let's get back to this part, and just after this, that is, for this particular batch, we have our metric dot update state, and we pass in our Y batch, that's the true Y, and then Y pred. Once we update this metric, we can print out the metric value outside of this for loop, so after each epoch we print the metric value. So here we have print of metric dot result; there we go, and we should have here "accuracy is". Once we print out the result, we go ahead and reset the state, so we have metric dot reset states, and that should be good. We can repeat this exact same process for the validation: in the validation loop, we repeat the same steps. So here we have the metric, let's paste it out here, and then here we have metric val. With metric val we update the state, using Y batch val and Y pred val. Once we update the state, we do the same thing here: we print and then we reset the state. So we have here metric val and then metric val, and that's it. Let's define metric and metric val; there we go, we have metric and we have metric val. So that's it: this is how we recreate the fit method that we have here. Now that we've seen this, we see the training is going well; we see the training loss, the validation loss, and that's it. But the training loss, we're getting it here for every 300 batches, so let's take that off and say we print it out after each epoch instead. This means we don't really need to have this again, so let's take that off. So we print out the training loss and the accuracy, and this should be good. As a recap, here's roughly how this metric bookkeeping fits into the loop.
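A hedged sketch of the pattern just described, under the same assumptions as the previous one; `BinaryAccuracy` stands in for whichever Keras metric was picked in the notebook.

```python
# one metric object for training, one for validation
metric = tf.keras.metrics.BinaryAccuracy()
metric_val = tf.keras.metrics.BinaryAccuracy()

for epoch in range(EPOCHS):
    print(f"Training starts for epoch number {epoch}")
    for x_batch, y_batch in train_dataset:
        with tf.GradientTape() as recorder:
            y_pred = model(x_batch, training=True)
            loss = custom_bce(y_batch, y_pred)
        grads = recorder.gradient(loss, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))
        metric.update_state(y_batch, y_pred)          # accumulate over the epoch

    for x_batch_val, y_batch_val in val_dataset:
        y_pred_val = model(x_batch_val, training=False)
        loss_val = custom_bce(y_batch_val, y_pred_val)
        metric_val.update_state(y_batch_val, y_pred_val)

    print("training loss:", float(loss), "accuracy is:", float(metric.result()))
    print("validation loss:", float(loss_val), "val accuracy is:", float(metric_val.result()))
    metric.reset_states()      # reset_state() in newer TF versions
    metric_val.reset_states()  # clear before the next epoch
```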
We now run this, and here's what we get: we have the train loss, the train accuracy, the validation accuracy — we've renamed this to val accuracy — and then we have the validation loss. For epoch two we have the same, and for epoch three we have this. For now, we're in eager mode, so obviously this training process is going to be slower compared to working in graph mode. Now, what we're gonna do is pick out the computationally expensive units of this cell right here. We know that this part is computationally expensive because here we have to pass our input into the model, compute the output, compute the loss, get the gradients, optimize the model and update the metrics. So we look at this as one block, and then we have this other block; let's scroll up, we have this, let's take this off. So here we have this block, and then we also have this other block. You'll notice that this one is what we call the training step, and this one is the validation step. So what we can simply do here is create these two methods. Let's call the first one the training block. What goes into this training block is an X batch and a Y batch, and basically we just copy all of this code, cut it, and paste it right here. Okay, so here we have this training block, and then here, instead of having all of that inline, we just say that for each step we pass through the training block, passing in X batch and Y batch. Then we repeat the same process for the validation. Here we have this copied out; let's copy this. Okay, so here we have the validation block, or let's say val block, which takes X batch val and Y batch val. There we go, we have the training block and the validation block. Now, just here, we can paste this out; let's create our validation block, which takes X batch val and Y batch val, and we paste this out. Now, since we compute this loss in the training block, we want to return it, so we return the loss. Okay, we have the loss returned, and then just in here we say loss equals this call; we call this loss, and now we can make use of it right here. And we do the same for the validation block: we take this here and return our loss val. Now, that sounds great; we should now be able to change this into graph mode. So we add tf.function — that's all it takes to convert this to graph mode. That's it, we've converted this to graph mode, and we can now run this. We're getting an error; let's check what's going on. Okay, let's run this now. We're told that the indentation doesn't match any outer indentation level. Okay, yeah, we should align this to match up with this with statement, and that should be good. Let's make sure we don't have the same error here; so we have this for loop, and that's it. Okay, let's run again, and this time we should have no problem. Now we can retrain our model, and we should get faster training this time around. But still, we're getting this error: loss val must be defined before the loop. So we get back to the code, and what we can do is comment out the section for the training, because that part works already, and focus on the validation block. The first remark we have here is that we've made an error of putting this for loop inside the val block.
It's not actually a syntax error, but more of a design error: it's better for us to call this val block inside the for loop instead. So, that said, we cut this for loop out of the val block, and then what we have is: for X and Y in the val dataset, we pass these into the val block. So here we have this val block call now; everything looks fine. Let's run the cell again. You can now see how just correcting that error automatically solved the issue. Then we can go ahead, uncomment the training section, and run our training. While the training is going on, we're going to create this method, which we'll basically copy from this code. So here we add a cell and define this method, which we'll call train, or rather, let's call it neurallearn. Our neurallearn method is going to take in a training dataset, a validation dataset, the model, and the number of epochs, so that it looks like the model fit method which comes with TensorFlow. After the model we also have the loss function, the metric and the val metric, and that seems okay. We didn't have the call yet, so here we just call neurallearn and pass the LeNet model, the loss function, the metric, the val metric, the train dataset, the val dataset, and then the number of epochs. So all we need to do now is run this, and our training process starts. Getting back to the training here, you see we have the training loss, the accuracy, the validation loss and the validation accuracy. Let's stop this, and then we could even take off that old cell; let's delete that cell and focus on this. So now we have our neurallearn, which takes in the model; and as you can see, we also have this optimizer, the metric, the metric val and the epochs. Let's get back to the optimizer and add it here: after the loss function and the metric, let's have the optimizer. There we go; now after the metric we have our optimizer. That looks fine, so let's run this. And we get "loss function is not defined", so let's scroll up, we have this custom BCE, and let's just pass the custom BCE right there. We run that, the training is now complete, and here are the results we get. You now know how to create your own custom training loops and make them run in graph mode. As a recap, here is roughly what the final version looks like.
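This is a hedged sketch of the finished loop, not the exact notebook code; the two blocks close over `model`, `custom_bce`, `optimizer`, `metric` and `metric_val` defined in earlier cells, and the signature of `neurallearn` simply mirrors the one described above.

```python
@tf.function  # trace each block into a graph for faster training
def training_block(x_batch, y_batch):
    with tf.GradientTape() as recorder:
        y_pred = model(x_batch, training=True)
        loss = custom_bce(y_batch, y_pred)
    grads = recorder.gradient(loss, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
    metric.update_state(y_batch, y_pred)
    return loss

@tf.function
def val_block(x_batch_val, y_batch_val):
    y_pred_val = model(x_batch_val, training=False)
    loss_val = custom_bce(y_batch_val, y_pred_val)
    metric_val.update_state(y_batch_val, y_pred_val)
    return loss_val

def neurallearn(model, loss_function, metric, metric_val, optimizer,
                train_dataset, val_dataset, epochs):
    # the arguments mirror model.fit(); the tf.function blocks above already
    # capture these same objects from the enclosing scope
    for epoch in range(epochs):
        for x_batch, y_batch in train_dataset:
            loss = training_block(x_batch, y_batch)
        for x_batch_val, y_batch_val in val_dataset:
            loss_val = val_block(x_batch_val, y_batch_val)
        print(f"epoch {epoch + 1}: loss={float(loss):.4f} "
              f"accuracy={float(metric.result()):.4f} "
              f"val_loss={float(loss_val):.4f} "
              f"val_accuracy={float(metric_val.result()):.4f}")
        metric.reset_states()
        metric_val.reset_states()
```

Note that keeping the for loop outside the val block, rather than inside it, is exactly the fix discussed above.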
Thank you for getting right up to this point and see you next time. Hello everyone, and welcome to this other amazing session, in which we are going to see how to integrate TensorBoard with TensorFlow. In this session, we are going to look at how to log information from our training process, or from our different experiments, into TensorBoard, how to view model graphs, how to do hyperparameter tuning with TensorBoard, how to view distributions, histograms and time series, how to log image data like confusion matrices and ROC plots, and finally how to do profiling with TensorBoard. In one of our previous sessions, we saw the importance of working with callbacks, as they permitted us to monitor key information during the training process, and also to store information and make certain modifications during the training. That said, we have this TensorBoard callback right here, which we spoke of last time but didn't really get into. So in this session, we're going to go in depth and see how to make use of TensorBoard to visualize vital training information on a web interface. You'll notice that this TensorBoard callback is used in a similar way to the previous callbacks: we define the callback, and then, in the callbacks argument, we pass in the TensorBoard callback. So let's go ahead and copy this out; we've copied it to the clipboard, and now we'll see how to pass it into our training process so we see exactly how it works. So we open up this callbacks section, where you see the different callbacks we created previously, and now we're going to include our TensorBoard callback. Let's add a text cell and a code cell, paste this out, take the extra parts off and run the cell; that's fine. Now let's move on to our training process. Here we have callbacks equal the TensorBoard callback, and then we're ready to train our model. Let's check on this: we're getting an error, unexpected keyword argument "callback"; okay, it should be "callbacks". We run that again. Now, while our model is training, you'll notice that there is this logs folder which has been created, and the reason it has been created is that we actually specified it in the TensorBoard callback call: here we have this log directory argument. Training information gets stored in there, and this is the information TensorBoard will use to display important training information on a web interface. Now, if you click it open, you'll see that you have these two folders: in this train folder you have the information from the training data, which has been stored here, and then there's the information for the validation, which has been stored right here. The next thing we'll do is copy out this command and run it just below. So let's get to where we have the visualizations, and we'll see how to replace these visualizations with our TensorBoard visualizations. We paste this out: we have tensorboard, logdir, path to logs. We had created this log dir variable here, which takes in our path, so let's get back, and instead of putting our path directly, we just use this log dir. Now, before running this, we add a code cell and load the TensorBoard notebook extension. So here we have load_ext tensorboard; we run this, and you see "already loaded". Anyway, we have that, and then the tensorboard command. So we run this, and we should expect to have an interface which contains all our log data. "No dashboards are active for the current data set." Checking on the scalars, what do we have? Anyway, let's check on this; let's point it at logs and run this again. Okay, and there we go. Here's a recap sketch of this whole setup.
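A minimal hedged sketch of what was just wired up, assuming a Keras `model` compiled earlier; the `./logs` path is just an example.

```python
import tensorflow as tf

LOG_DIR = "./logs"  # the directory the callback writes its event files into
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=LOG_DIR)

history = model.fit(
    train_dataset,
    validation_data=val_dataset,
    epochs=5,
    callbacks=[tensorboard_callback],  # note: the argument is `callbacks`, plural
)

# In a notebook, TensorBoard is then loaded and launched with the magics:
#   %load_ext tensorboard
#   %tensorboard --logdir ./logs
```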
We see now that we have this interface which pops up. What do we see here? We have these logs, with both training and validation; you could toggle these and pick only the training data, in which case you'd see only the training data. And then here you have the scalars. The scalars basically store all the information we pass in here, like the loss and the metrics; recall that we had defined these metrics here, so we get all this metrics information. Unlike previously, where we had to do this logging step manually for the loss and the accuracy, now it's done automatically. So let's run this and compare what we get here with what we get from TensorBoard. You can see here we have this loss; then we get back to TensorBoard, to this scalars level. Let's view the loss first; okay, we check out this epoch loss, and this is what we get for the loss. Let's include the validation: you see, you have this plot right here; click on this and you can also expand it. So this is the plot we get. Now, what do you notice? It's exactly the same as what we had here, but this time around we didn't have to write any code: all this information was automatically logged into this file, and TensorBoard took care of the rest. So that's how this works, and it's really very interesting. It's the kind of tool you want to master, because when working on different machine learning experiments you wouldn't want to always log all these values by hand, or manually; you want this done automatically. Now, another interesting point is that you have all the metrics here, and you just have to select any one. So let's look at the accuracy, which we've seen already; we can check out this. Let's click here, okay. So we see this accuracy, and then, scrolling down, you can compare it with what we have here. See, we have this dip around what should be the ninth epoch, or let's say the eighth. Anyway, let's come back and check here; let's scroll up and compare with this other interface. So we have this, and at this ninth epoch, what do you see? If you hover on this point, you get the run name, the smoothed value, the value, the step, which is the ninth step, and the time, and that's it. Okay, so there we go: we see how we can plot all this automatically, and you could go through each and every point and get the exact values. So that's how we look at this; let's reduce this. We can take, say, the precision next, so you could monitor the precision. You could also check out the number of false negatives: you see how, as you keep training, the number of false negatives keeps reducing. And then the false positives; what do we have here? That's loading. While that's loading, let's scroll down, and we have the loss, which we've seen already. Let's look at the true negatives; that's loading; and the true positives: here's the plot we get for the true positives and the true negatives. And in the section where we have the evaluation, it's practically the validation metric and loss plotted against the number of iterations. So that's what we have here: it's just like the validation accuracy versus iterations. If you toggle off the train run, you see nothing really changes here, but when you toggle off the validation, you see it all goes.
So that's our validation, and we can monitor this and observe that the highest accuracy we have is 94.09%; no, it's 94.21%. Scrolling down, we have the precision; what's our highest precision here? It's 93.6; no, the value is 94.39%. As of now, we've been able to log this information just from compiling our model: because we passed our loss and the different metrics, TensorBoard was able to log the information and visualize it. But there are other possibilities; that is, it's also possible for us to log information manually instead of only logging all of this automatically. So what we could do is log, for example, image data, or we could even log these different learning rates here. So we're going to log the learning rate value for each and every epoch, and this gives us the freedom to log just about any scalar or quantity we want. So here we're going to create a metrics directory: it's going to be logs, and then, in there, metrics. And what we're going to do now is create this train writer, because now we're doing this manually. So we create our train writer: we have tf.summary.create_file_writer, and in here we pass our metrics directory. Okay, so that's good: we have our train writer created based on this metrics directory, which is going to live inside the logs folder. We can run the cell. Then, just in here, we have a with statement: with train writer as default, we want to have this logged. So we have tf.summary.scalar, and we specify that we're dealing with the learning rate; then we pass in the learning rate itself as the data, and we pass in the epoch as the step. As you can see in this popup, you have the name of the scalar, you have the data, you have the step, and you also have a description. So here we have the name, which is learning rate, the data, which is this learning rate, and the step, which is the epoch. Now, we'd have to put this in each and every branch of the scheduler, since we either go into this branch or into that one; but to avoid writing it twice, we just set a learning rate variable. So in here we define this learning rate, and here too we set our learning rate, and then, outside of this, we return the learning rate. Okay, so that's it: we return the learning rate, and we also log this data, so here we should change this to use the learning rate variable. That's it. Now we have that set, and we could go on and train. Before training, recall that each time you want to log this kind of custom data, the first thing you do is create your file writer based on a given directory that you set; after creating the file writer, you then put the logging inside that writer's scope, as we've done right here. With this, we can run it now. And then, since this is our scheduler callback, we'll have to add it in our fit method, so let's go ahead and add it there. Let's take just five epochs this time. So here we have the TensorBoard callback, and then we have the scheduler callback. Okay, so we run this now and see what we get. A hedged sketch of this scheduler-plus-writer setup is shown below.
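A minimal sketch under stated assumptions: the exact decay rule is hypothetical, and `./logs/metrics` is just an example path; the writer, the `tf.summary.scalar` call and the callback wiring follow the steps described above.

```python
METRIC_DIR = "./logs/metrics"
train_writer = tf.summary.create_file_writer(METRIC_DIR)

def scheduler(epoch, lr):
    # hypothetical schedule: keep the learning rate for the first epoch, then decay it
    if epoch < 1:
        learning_rate = lr
    else:
        learning_rate = lr * 0.9
    with train_writer.as_default():
        # name of the scalar, its data, and the step (epoch) it belongs to
        tf.summary.scalar("learning rate", data=learning_rate, step=epoch)
    return learning_rate

scheduler_callback = tf.keras.callbacks.LearningRateScheduler(scheduler)

model.fit(
    train_dataset,
    validation_data=val_dataset,
    epochs=5,
    callbacks=[tensorboard_callback, scheduler_callback],
)
```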
That's training; our training is now complete. Let's go ahead and check here: we have our logs, and you see we have this metrics folder with the data logged. Next step, we visualize it: so here we have TensorBoard; let's run this first so you see what we get. As you can see right here, we now have this learning rate, which has been logged and which we can visualize. You see, as we go from this epoch to this epoch to this one — if we set the smoothing to zero, we have this. Now, the reason why we don't get any value after this point is because of what we have here: we passed the learning rate in as a tensor, and what we should be doing is getting its NumPy value. So we should have learning rate equal learning rate dot numpy, since otherwise this gets converted into a tensor, and that should be fine. Okay, we run this again and that should work. And then, to see the change quickly, let's set this threshold to just one: if the epoch number is greater than one, the learning rate gets modified. So let's get back to training; this time around, let's do just three epochs. This time around, we have the actual value logged. And so we get back, reduce this custom training loop, get back to our visualizations, and run this again. Getting back here, we see we start with this learning rate, and then it drops after this second epoch right here. As you may have noticed, each and every time we run a new training process, the previously logged data gets deleted. So what we can do now is modify this folder name, the logs folder we're using here, so that the folder name depends on the current time. So here we're going to have datetime dot datetime dot now, and then we get a string from that time and format the output: we'll have the day, the month, the year, and then the exact time, down to the minute and the second. Now let's go ahead and import datetime up here; okay, we run that, and that should be fine. We've imported it; let's get back, and let's print out this log dir right here and see what we get. See, we have this logs path, and it's actually a new folder. Now, if you print this out again, you see we get a different folder, and this is important because this way we no longer erase our previous runs each time. From here, let's take this off and set this as our current time: current time equals this; take that off; here's our current time. And then we have this path plus the current time, and we do the same here: this plus the current time. If we get back to the training, we can run this again. Now that it's complete, as you can see here in the logs, we have this new folder created, which depends on the time at which we decided to do the training. And in this metrics folder, we also have this. But what we want is actually just one folder which contains the train, the validation and the metrics — the sketch below shows the idea.
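A hedged sketch of the time-stamped layout being set up here; the exact format string is an assumption, the point is just that every run gets its own folder, with train, validation and metrics underneath it.

```python
import datetime

CURRENT_TIME = datetime.datetime.now().strftime("%d%m%Y-%H%M%S")

LOG_DIR = "./logs/" + CURRENT_TIME   # .../logs/<run time>/train and /validation
METRIC_DIR = LOG_DIR + "/metrics"    # .../logs/<run time>/metrics

tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=LOG_DIR)
train_writer = tf.summary.create_file_writer(METRIC_DIR)
```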
So we should modify this right up here: instead of having this metrics path defined before, we take it off and add it after the current time. We have the plus, and then we add this; okay, so we have this slash and then metrics. So that's it: we've recreated this, and we'll rerun it again to avoid any error. So let's go ahead and retrain our model; let's say two epochs, and we run that again. Okay, the training is complete. Now you'll see that if you open this up, you have train and validation, which is what we had previously, but now we also have metrics alongside train and validation, and this is exactly what we want: we want to be able to log all of this into this one directory. And you'll notice now that we don't have to erase previous logs. So let's go down to running this again; we run this, take this off, and okay, here we go, we have this information logged. You see, we can untick all the previous logs and focus on just this one log here. So let's pick the most recent one and focus on it; okay, here we go. So we have its metrics, its train and its validation. For this run, we're not interested in logging the earlier ones, so let's take those off and get back. So you see here we have the learning rate, the epoch accuracy, the true negatives, the recall, and that's it. So here we've seen how to create these directories, which depend on the current date and time. Now, the next step is how to actually do this logging ourselves when we're doing a custom training loop. So let's get back to where we did this custom training loop. We had this custom training loop, and then we have this fit method, which comes directly with TensorFlow. So what if we now use the custom training loop, and we don't have the possibility of just simply saying callbacks equal the TensorBoard callback and the job is done? What if we just have this custom training loop? In that case, we're going to use exactly the same process we've just followed here: we're going to create a file writer, and then, just as we did here, we write in the scalar values, creating the scalar, putting in the data and specifying the step. So that's basically how it's going to work. Now let's get back to this custom training loop and add a code cell. In here, we have the current time as usual; then let's call this the custom directory: we have logs, the current time, and then custom. Let's call this the custom train writer. And we can also define a custom validation writer; we could have that too. So let's have here the custom directory, and in here we'll also specify train. Notice that when we were using fit, we automatically got this train and validation: what happens in the background is that these two file writers are created, the train one and the validation one. And we're just going to do exactly that here: we have custom, and then, let's say, a custom train directory and a custom validation directory. Okay, so we have that, and here we have custom validation. Now we specify our writers: we have custom train and custom validation, and then here custom train, custom validation. Okay, so I think this is okay; we can now run this.
And then let's copy out the code we had put out in the scheduler section; you see how easy it becomes when you've already done this once. So now, instead of only printing this out, we have: with our custom train writer as default, we log the train loss; the data is now this loss here, and the step is the epoch we have here. Okay, we've logged this; let's now go ahead and log the accuracy. So we paste this out, and we have training accuracy, and for the data we use metric dot result; and then we have the step specified. So this is for the training process; we can separate this block, and that's it. Okay, so we've done this; we now simply copy this out and do the same for the validation. So in here, instead of the train writer, we put the custom val writer; here we have the validation loss, where the loss is now loss val; and we have metric val, so we have the validation accuracy with metric val dot result; and we use the val writer for this too. Okay, so that sounds fine; everything looks okay. We run this here, then we run neurallearn, and we start the training. Training now complete, let's go ahead and see what we have. You can check out the logs: you see, we have our values now logged in here, under custom, train and validation. We now go ahead and rerun TensorBoard. As you can see, you have all these values here; as usual, let's toggle our runs and pick this very last one, both of them, since this one is the train and this one is the validation. Then we come right here and check out the training accuracy and train loss: there we go, we have the train accuracy, the train loss, the validation accuracy and the validation loss. Now, if we want to remove all the information stored in the logs, we have this command: we remove all of it by specifying the logs folder. So we run this, and if you open it up, you see you don't have the logs folder anymore. Before moving on, here's a hedged recap of this custom logging setup.
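A sketch of the custom-loop logging just described, under the assumption that `loss`, `loss_val`, `metric` and `metric_val` are the ones computed inside `neurallearn`; the two `with` blocks are meant to sit at the end of each epoch in that loop, so this is a fragment rather than a standalone cell.

```python
CUSTOM_DIR = "./logs/" + CURRENT_TIME + "/custom"
custom_train_writer = tf.summary.create_file_writer(CUSTOM_DIR + "/train")
custom_val_writer = tf.summary.create_file_writer(CUSTOM_DIR + "/validation")

# at the end of each epoch inside the custom training loop:
with custom_train_writer.as_default():
    tf.summary.scalar("train loss", data=loss, step=epoch)
    tf.summary.scalar("train accuracy", data=metric.result(), step=epoch)

with custom_val_writer.as_default():
    tf.summary.scalar("validation loss", data=loss_val, step=epoch)
    tf.summary.scalar("validation accuracy", data=metric_val.result(), step=epoch)
```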
At this point, we'll go ahead and see how to display image data with TensorBoard. So unlike previously, where we've been displaying information like the loss and the different metrics, now we're going to display image data, like the confusion matrix we had seen previously. Let's get back here: we have this confusion matrix right here, and what we'll do now is display this confusion matrix with TensorBoard after each epoch. That said, we're going to copy out all the code we used in displaying this confusion matrix. So we have this code, and then we have this log images callback right here, with its on epoch end method. In this method, we paste the code we used in visualizing the confusion matrix previously: so here we have everything from the labels right up to this point, we have the confusion matrix based on the threshold, and then we visualize this confusion matrix. But now, since we're working with a callback, what we'll do is display this with TensorBoard at the end of each and every epoch. Now, so far we've only been able to visualize this; how do we make this work with TensorBoard? What we do is create a buffer: we have io dot BytesIO, so that's our buffer, and we save the confusion matrix image into this buffer. So we have plt dot savefig, we save it into that buffer, and we specify that the format should be PNG. Okay, so we've saved that into our buffer. The next step is to create an image from this buffer: we use the TensorFlow decode PNG method, tf dot image dot decode png, which takes in buffer dot getvalue, with the number of channels equal to three. That's it, we have this image. And once we have this image, we write it to TensorBoard: we have this image writer, which we create right here with create file writer, similar to what we've done already; let's modify this and have an image directory, and from that we create this image writer. Then, with this image writer as default, instead of having tf summary scalar as we used before, we now use tf summary image. So you can see that TensorBoard lets us write not only scalars but also images, and that's basically all we needed to do here. So let's run the cell; we make sure this is run, and then we have this log images callback here; we copy that, and right here we run this metrics cell, compile, and run this. We get an error, a ValueError saying no step was set. So we get back to the callback and specify the step: right here, we have step equal the epoch. We run that again, and this should be fine. Training is going on, and the confusion matrix image is being logged to TensorBoard at the end of each epoch. A hedged sketch of the full callback is shown below.
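This is a minimal sketch, not the notebook's exact code: how the labels and predictions are obtained, the heatmap styling, and the `val_dataset` and `threshold` names are all assumptions; the buffer, `decode_png` and `tf.summary.image` steps are the ones described above.

```python
import io
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

IMAGE_DIR = "./logs/" + CURRENT_TIME + "/images"
image_writer = tf.summary.create_file_writer(IMAGE_DIR)

class LogImagesCallback(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        threshold = 0.5
        labels = np.concatenate([y for _, y in val_dataset], axis=0)
        predicted = self.model.predict(val_dataset).flatten()
        cm = confusion_matrix(labels, predicted > threshold)

        # draw the confusion matrix and save the figure into an in-memory buffer
        plt.figure(figsize=(8, 8))
        sns.heatmap(cm, annot=True, fmt="d")
        buffer = io.BytesIO()
        plt.savefig(buffer, format="png")
        plt.close()

        # decode the PNG bytes into a tensor and log it as an image summary
        image = tf.image.decode_png(buffer.getvalue(), channels=3)
        image = tf.expand_dims(image, axis=0)  # tf.summary.image expects a batch
        with image_writer.as_default():
            tf.summary.image("confusion matrix", image, step=epoch)
```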
Training now done, we can go ahead and visualize these confusion matrices on TensorBoard. So we run these two cells, and here's what we get: you see here step zero, step one, and then step two, because we actually ran this for three epochs. Coming back to the top, we notice that here we have 53, and as we keep training this drops to 35, but then finally, going down a little, we have 240. So this tells us that the last epoch wasn't helpful in improving the number of false negatives. You can see the same with the validation: here we had 16 false negatives, then eight false negatives, and then this value rose up to 111 false negatives. So it's similar to what we have with the test data, which is exactly what we're logging into TensorBoard. Now that you know how to log image data with TensorBoard from this confusion matrix example, what you could do is log the ROC plots directly. You could also log data like this right here, where on the test data you put out the actual value and what the model predicts. And that's it for the part on logging image data. Now let's move on to visualizing model graphs with TensorBoard. To visualize a graph, we rerun this command to delete all the logs we've stored so far, and then we run this TensorBoard callback once more. So we have that; let's get back to metrics, run this, and that's fine. Now we have the training done; let's go ahead and rerun these two cells again. As expected, here's what we get: we have the scalars, and then, if you click on this graphs tab here — let's reduce this slightly so the graph comes up — okay, so we have this. Notice that the tag here says op graph, which means we're viewing the graph at the operations level. If you come right here and zoom in, you see you have the conv layer, the conv layer again, a dense layer, another dense layer, and then, zooming in the other way, here you see you have the Adam optimizer. Let's double-click here: where you have this plus, you double-click, and you get to see exactly what goes on in this Adam optimizer. Let's reduce that, zoom in, and then double-click again to collapse it. Okay, so here you have that. And for this dense layer, you can double-click; you see you have this kernel, and you double-click to better understand what goes on. As you can see here, we have this regularizer, which is the regularizer we had defined previously. Now let's collapse this by double-clicking; let's scroll here, double-click that, and double-click this. So basically, step by step, we're able to visualize exactly what goes on under the hood when TensorFlow creates these graphs, which in turn permit us to do computations even faster. Now, another way we could look at this is by coming right here to the tag dropdown and selecting Keras. Once we select Keras, instead of the operation graph we've just seen, we now have the conceptual graph, and you'll see that this is quite easy to understand compared to what we had previously. So this focuses on the Keras model: we had built a Keras sequential model, and here what we have is, you see, the input. Unlike previously, where we had things like the Adam optimizer, the different metrics and the loss computations in the op graph, here we have just the Keras model. So here you have the input, conv, batch normalization, max pooling, dropout, conv, batch normalization, max pooling, flatten, dense, batch norm, dropout, dense, batch norm and finally dense. So that's our conceptual graph, and that's it for the part on graphs. You get to understand exactly what goes on under the hood thanks to this graph visualization made available with TensorBoard.
Now, as we go ahead with building these models and training them, you may sometimes wonder why this value six, for example, was picked, why the kernel size of three was picked, maybe why the 16 was picked here, why 100 was picked, why not say 32, why not some other value, why these other values were picked. And then, looking at the dropout rate, what makes us pick a given dropout rate, a given regularization rate, and so on and so forth. Now, although in this particular case we're building this model based on the LeNet model, which is a model that has already been built and tested, we'll see another technique, known as hyperparameter tuning, where we'll be able to select the best values for these different hyperparameters automatically. These hyperparameters are not only the ones we've picked out here; we could also have hyperparameters like the learning rate, or even the choice of the optimizer, and so on and so forth. This way of deciding the best parameters for our model and model training is known as hyperparameter tuning, and we're going to see how to carry it out with TensorBoard. So here, first things first, we carry out the imports: from tensorboard dot plugins dot hparams, we're going to import the api as hp. Then, right here, we're going to restructure this model. So instead of having the regularization rate and the dropout rate given to us directly like this, what we're going to have here, for example, instead of this dropout rate, is hparams of HP dropout. The aim of this process is to ensure that, supposing we have this model here and after training we have an output accuracy, we're able to modify the parameters which make up this model and see how they affect the accuracy, such that we pick out the parameters which give us the best possible accuracy. So, that said, instead of the fixed dropout rate, we just have this variable, and for the regularization we do the same: instead of the regularization rate, we have hparams of HP regularization rate. Then, apart from the regularization rate and dropout hyperparameters, we're going to include one just right here. Let's start with this one, so let's copy this out: we have hparams and then HP number of units one; so here we have the number of units, and we'll call this one because just after this we're going to have number of units two. So here we'll be able to pick out the best possible value for this hyperparameter, and let's have the same here too. Okay, so that's it. The other hyperparameters will be included when doing the model compilation, so let's bring in this model compilation, and then we're going to carry out the training in here. We have this; let's take out the summary and put this back. Okay, so now we have this model compilation right here, and we have the optimizer. What we're going to do here is fix the choice of optimizer: we'll have the Adam optimizer, and then we'll specify the learning rate to be hparams of HP learning rate. Okay, so there we go, this looks fine: we have the loss, binary cross-entropy, the metrics, accuracy, and then we do the model training.
So once we do this model training, we're going to get the accuracy: we train the model, we evaluate it, we obtain the accuracy, and then TensorBoard will permit us to modify all these different parameters such that we find the specific values which maximize this accuracy. So here we're going to create a function which returns the accuracy; let's shift this to the right, and that's it. We have this method, which we'll call model tune, and it's going to take in hparams. So it takes this hparams, and it will be called with different values for hparams; then, based on the accuracy, we'll know after these different training runs which values of the hyperparameters give us the best results. So that's it: we have that, and we return the accuracy right here. Next, we define the range of values these different hyperparameters can take. Like this one: let's copy this out; say we want to define HP number of units one. We paste this out here, and we have hp dot HParam; there we go, and we have the name specified, so number of units one. We also specify the range: we have hp dot Discrete, and let's say we want to pick values in a range of, say, 10 to 100; actually, let's take powers of two, so we have 16, 32, 64, 128. Okay, that's it for the first one, and for the next one we have number of units two; let's have that, units two, there we go, that's fine. Let's copy this again and repeat the same for the dropout: here we have dropout, take this off, and here we have the dropout. For the dropout, we could take values like 0.1, 0.2, 0.3 and 0.4; anyway, let's say we take between 0.1 and 0.3. Then, after the dropout, we have the regularization rate. Notice that we've taken typical values: the typical values for the number of units are these, for the dropout these, and the regularization rate will have its own range of typical values; so here we'll have, say, 0.001, 0.01 and then 0.1. So that's it. What's next? We have the learning rate; let's copy this out, HP learning rate, and here we have 1e-4 and 1e-3. Okay, that's it for the learning rate, and that should be all; we have nothing left, so that's fine. There we go, we've defined all those different ranges, and we're going to search in this range, this range, this range, this range and this range. Now, notice that these are discrete values: it's not like we're going to pick anything between this value and this value; we're just going to pick this value, or this value, or this other value. And now, to perform this grid search, we're going to loop: for number of units one in HP num units one dot domain dot values; there we go. And then we do the same for the next one: for num units two in HP num units two dot domain dot values; and then for the dropout rate, the regularization rate in the regularization domain, and the learning rate in the learning rate domain. Below is a hedged sketch of the ranges and of the model tune function we've just described.
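A minimal sketch under stated assumptions: `build_lenet` is a hypothetical builder standing in for the restructured sequential-model code described above, and only the way the hyperparameters are threaded through follows the steps in this section.

```python
from tensorboard.plugins.hparams import api as hp

# discrete ranges for each hyperparameter, following the values picked above
HP_NUM_UNITS_1 = hp.HParam("num_units_1", hp.Discrete([16, 32, 64, 128]))
HP_NUM_UNITS_2 = hp.HParam("num_units_2", hp.Discrete([16, 32, 64, 128]))
HP_DROPOUT = hp.HParam("dropout_rate", hp.Discrete([0.1, 0.2, 0.3]))
HP_REGULARIZATION = hp.HParam("regularization_rate", hp.Discrete([0.001, 0.01, 0.1]))
HP_LEARNING_RATE = hp.HParam("learning_rate", hp.Discrete([1e-4, 1e-3]))

def model_tune(hparams):
    """Build, train and evaluate the model for one combination of hyperparameters."""
    model = build_lenet(                      # hypothetical builder for the sketch
        units_1=hparams[HP_NUM_UNITS_1],
        units_2=hparams[HP_NUM_UNITS_2],
        dropout_rate=hparams[HP_DROPOUT],
        regularization_rate=hparams[HP_REGULARIZATION],
    )
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=hparams[HP_LEARNING_RATE]),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    model.fit(train_dataset, validation_data=val_dataset, epochs=1, verbose=0)
    _, accuracy = model.evaluate(val_dataset, verbose=0)
    return accuracy
```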
We are now going to create this hparams dictionary. So we have hparams right here, which is going to be our dictionary, and it's going to contain the different values here. We start with HP num units one, and what we put in here is the value we picked from that domain; so we have num units one, there we go. We now repeat the same process, so we have number two and then the rest, and there we go, we have the remaining parameters. Then, from here, we're going to create a file writer, because here we're going to have different runs for the different values, and for each run we define the file writer. So here we have the file writer, tf summary create file writer, just as we've seen already, and we specify the directory. Now, we want the directory to have a different name for each run, so here we have a run number, which we initialize to zero, and then we have logs slash and the run number in the string. Okay, so that's it, and after each and every run we increment this run number: run number plus equals one. Okay, so we've defined this file writer, and that looks fine. The next thing we do is, with this file writer as default, log the current hyperparameters: we have hp dot hparams, and we pass in the current hyperparameters. Recall that we're going to sweep through all these different values, and for a particular run we want to know exactly what we passed in; that's what we log here. So we have hp dot hparams, that's fine. And once we've notified TensorBoard that these are the hyperparameters we're working with, the next thing we do is pass these hyperparameters into this model tune method right here. So here we pass in this hparams dictionary, which we've just defined and which is a function of the different values we're sweeping through; model tune takes in the hparams and outputs the accuracy. So we have the accuracy, and once we get this accuracy, we log its value: we have tf summary scalar, we log the scalar named accuracy, and the accuracy value gets logged. So that looks fine. We have this run number; why is it highlighted in red here? That's actually okay now. So let's get back to this; we can run this cell, and everything looks fine. Then we go straight into this hyperparameter tuning, where we're going to pick out the values for our hyperparameters which actually give us the best accuracy. For each step, let's also print this out: we print out "our hparams are", and we also want to print out the current run number, so "the hparams for run number so-and-so are these"; and then we format that with the run number, there we go. Putting all of that together, here's a hedged sketch of the sweep loop.
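This sketch continues the previous one and is equally hedged: it assumes the `HP_*` objects and `model_tune` defined above, and the `logs/hparams/` path is just an example.

```python
run_number = 0
for num_units_1 in HP_NUM_UNITS_1.domain.values:
    for num_units_2 in HP_NUM_UNITS_2.domain.values:
        for dropout_rate in HP_DROPOUT.domain.values:
            for regularization_rate in HP_REGULARIZATION.domain.values:
                for learning_rate in HP_LEARNING_RATE.domain.values:
                    hparams = {
                        HP_NUM_UNITS_1: num_units_1,
                        HP_NUM_UNITS_2: num_units_2,
                        HP_DROPOUT: dropout_rate,
                        HP_REGULARIZATION: regularization_rate,
                        HP_LEARNING_RATE: learning_rate,
                    }
                    file_writer = tf.summary.create_file_writer(
                        "logs/hparams/" + str(run_number))
                    with file_writer.as_default():
                        hp.hparams(hparams)             # record this run's configuration
                        accuracy = model_tune(hparams)  # train and evaluate this combination
                        tf.summary.scalar("accuracy", accuracy, step=run_number)
                    print("run", run_number, {h.name: hparams[h] for h in hparams})
                    run_number += 1
```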
So let's run this cell, and here's what we get. We also modified the print here to show each and every value in the hparams, so we get this kind of output, and this runs for, let's count, 287 different runs. And then from here we run this cell, and run this one too. From here, under the scalars, you have the different accuracies we get; anyway, we're not very interested in that for now, because we're interested in looking at this HParams tab here. Let's start with the table view. You can see here that with this table view we see the different values of the different hyperparameters we tuned throughout this process. So, scrolling up a little, you see we have the regularization rate, the number of units, the dropout rate; and then, scrolling down and across, we have the number of units two, the learning rate, and then the accuracy. You see here that the accuracy changes based on the different values we have for these hyperparameters. Now, another, better way of looking at this is the parallel coordinates view. As you can see, in this parallel coordinates view you get to pick out the highest accuracy value, and if you follow its path, you see that it takes a learning rate of 0.0001; and here, let's just look at this part: notice this red path here, which you can actually turn green by clicking on it. So it turns green, you have 0.0001, and then the number of units two is around 32, the dropout rate is 0.2, the number of units one is 32, and the regularization rate is 0.01. So these are the best hyperparameter values we have, and we can now make use of this information to better create our model. So now, instead of having the model as it was, we modify it: let's get back to what we had up here; where we had 10, we now take this to 32, and right here 32 too; then we pick the regularization rate of 0.01. What about the dropout rate? Let's get back to that: the dropout rate, 0.1; no, 0.2 actually. So dropout rate 0.2, that looks fine, and learning rate 0.0001; so that's it. We now see how to better pick these values thanks to hyperparameter tuning, which we can do easily with TensorBoard. So let's get back to this, and we have 0.2 right here. Now, it should be noted that the method we've used so far is grid search; that is, we've specified those different ranges, searched through them, and obtained the best possible parameters, the ones which maximize the accuracy value. Another very popular technique is random search. With random search, we just define a range and then pick random values in that range. So, unlike grid search, where we have some predefined values which we have to search through, with random search we pick values at random. They both have their advantages and disadvantages, and one thing you can take away is that with grid search, since you have a fixed range of values which you have to pick from, if this fixed set of values is very large, it becomes problematic, as the searching process is going to take a very long time.
So you may take a very long time before finding your best hyperparameter values. Whereas with random search, since we're not going in a particular order, it can happen that we get the best hyperparameter values after just one step; so after just one run, or a few runs, we may already get the best hyperparameter values, just as we may only get them after 200 different runs. So if you have enough computation power, you can make use of grid search, since you're more sure of the different values you'll be picking from; if not, random search will be the better option. Another very important aspect of TensorBoard is the TensorBoard profiler. With the profiler, we're able to evaluate our TensorFlow code and, based on this evaluation, modify it to ensure that it runs as efficiently as possible. That said, to make use of the profiler, we start by installing the TensorBoard profiler plugin. So we have this pip install here; we run the cell. With the profiler plugin now installed, we go ahead and run the cells. Right here, we're going to have this profile batch argument; notice that we have it right here, and the value specified is the range of batches to be profiled. So here we could take, say, 100 to 132. Okay, so that's it; we now run the cell too, and that looks fine. Then we can start the training: we have this TensorBoard callback, which has been added here, so we run this to start the training. Training now complete, we go ahead and run these two cells. Just as expected, we have the scalars, the graphs, the distributions, the histograms, the time series, and then, in here, we have this profile tab, and we land on this overview page. So you have different tools; in this overview page, we have this performance summary, where we're given the average step time and its breakdown, the compilation time, the output time, the input time, and so on. Notice how this input time is relatively larger than the others; the input time occupies the largest part of the step time, and that's why the first recommendation is to focus on reducing this input time. Then we have: 7.5% of the total step time sampled is spent on kernel launch; it could be due to CPU contention with tf.data; in this case, you may try to set the environment variable TF_GPU_THREAD_MODE to gpu_private. Then: 6.6% of the total step time sampled is spent on all other time; this could be due to Python execution overhead. And: only 0% of device computation is 16-bit, so you might want to replace more 32-bit operations with 16-bit operations to improve performance. This actually means we could make use of mixed precision training, and we're going to look at mixed precision training in subsequent sections. We also have these other tools we can use for reducing the input time, like this input pipeline analyzer; you can click here and you have the input pipeline analyzer. Notice that you have these different tools here: we have the overview page, so let's go back to the overview page; then here we have the input pipeline analyzer; let's scroll down, and we have it. We have this TF data bottleneck analysis; you can click on this, and you see we have this bottleneck analysis. And then, getting back to the overview again and scrolling up, you have this trace viewer right here, which you'd also find in here.
You can scroll here and you have the trace viewer. Now, getting into the summary of the input pipeline analysis, we see exactly the breakdown of the input processing time on the host, because we've already seen that the input processing time is taking up close to 68.4% of the total step time. Here you can see that data preprocessing is the main reason why this input processing time is that large, and there are different steps we could take. So, what can be done to reduce the above components of the host input time? For enqueuing data, you may want to combine small input data chunks into fewer but larger chunks. For data preprocessing, you may increase the number of parallel calls in the dataset map or preprocess the data offline. Then there's reading data from files in advance, reading data from files on demand, and other data reading or processing. And here we have more detailed input operation statistics: let's click on this, scroll up, and you have the statistics given right here. As you can see, we know exactly why our input processing is taking up so much time, and based on these different suggestions, we can reduce it. From here we have the kernel stats, then the memory profile, the pod viewer, the TensorFlow stats, and then this tf.data bottleneck analysis, which you can also look at from here. You see we have the root prefetch: look at the self duration here — 10 microseconds, 36 microseconds — and this one now is very large. So here's our bottleneck, at the level of the mapping and batching; shuffling is just 118, 373, then prefetching, and so on and so forth. So we now know that the problem comes from the mapping, as we've seen previously. Now, selecting the trace viewer, we have this view here. You can move around by clicking on the selection arrow, you have the pan to pull things from place to place, you have the zoom, and you have the timing tool. Let's click on the zoom: you click and drag towards the top to zoom in and out. Now notice the steps: we have values from 100 — remember we defined the profiling range previously as 100 to 132 — and that's why we see the steps from 100 to 132 here. Getting back, you can zoom, then click on the pan and pull this to one side. Stopping right here, you can zoom again, and you get to see all the different operations carried out during a single training step. With the timing tool, you click on it, then click and drag to measure the timing for a given operation — you see the time here, 166.5 microseconds. And that's it. We now go on to the distributions. Here we have batch normalization, batch normalization; let's check out this conv2d. You see the different biases and weights: the kernel, that is the weights, has values falling between about -0.3 and 0.3, while the biases fall between about -1.2 and 0.4.
And then for this conv2d_2 layer, which you can see here, see the range of values; and then we have dense_1 and dense_2, with their different value ranges for both the kernels and the biases. So that's it for the distributions. We have the histograms too, which you can check out, and the time series, which is quite similar to information we've seen already. Here we could select just the scalars: we have the epoch loss, the different evaluation accuracies, and so on and so forth. Now click on images: if we select only images, nothing is shown — that's why there's no information here. Click on histograms and you have the histogram data we've just seen. That's it for this section on TensorBoard. TensorBoard has other functionalities which we shall explore subsequently, and thank you for getting right up to this point. Hello everyone, and welcome to another amazing session, in which we are going to see how to work with Weights & Biases and integrate it with our already existing TensorFlow code base. Weights & Biases helps practitioners with experiment tracking, collaborative reports, dataset and model versioning, interactive data visualization, and hyperparameter optimization. It's trusted by more than 100,000 machine learning practitioners around the world. In this session, we are going to focus on experiment tracking. It's one thing to build a model, train it, and evaluate it on a given dataset — and as we've seen throughout this course, this is pretty easy with TensorFlow. But when we work in large teams and have to collaborate, we have to produce reproducible results, debug those ML models as a team, and enforce transparency. Then a machine learning operations platform like Weights & Biases becomes indispensable. Weights & Biases permits us to build better models faster with experiment tracking, data versioning, and model management. As of now, the different products Weights & Biases offers are experiment tracking; reports, that is, collaborative dashboards; artifacts, for dataset and model versioning, just like you would do code versioning with Git; interactive data visualization; and hyperparameter optimization. Also, you can see here that this has been used by more than 100,000 machine learning practitioners around the world. Some key aspects of the Weights & Biases tool are that you can integrate it very quickly — you see that with Keras: supposing we are building a TensorFlow Keras model, all you need to do is import this WandbCallback right here and start a new run as shown here. If you have some configurations, you set those configurations, and then in the place of the callbacks, you simply pass this Weights & Biases callback which we imported. So it's quite easy to integrate with already existing frameworks. With any framework, you just need to call wandb.log and you can log any information you want. We can also visualize useful information very seamlessly, and we can collaborate in real time — if you're working on a project, everyone on the team can discuss the project's progression and see how to eliminate any bugs or problems. Now, Weights & Biases is designed for all use cases. Here we have a practitioner, supposing just a single person.
You have this central dashboard, hyperparameter sweeps, artifacts — so you can do dataset and model versioning just like you would with code on GitHub — and reports to share updates very transparently. Throughout this course we'll look at these different products, and in this section we'll focus on experiment tracking. We're now going straight into signing up. So we click right here; we want to sign up with GitHub. Click here, and then we authorize wandb to get access to our NeuralLearn account. So, authorize wandb. We have the full name, organization NeuralLearn, and that's fine. I agree to the terms and conditions — you should always read the terms and conditions very carefully. And then from here we also have to put in a username, so let's say neurallearn. We continue. How often do you train models? Let's say every week. So we have that, and then get started. Here is now the homepage. Here you can create a new project, modify your profile, invite your team. You have the documentation — click right here, docs.wandb.ai — with the different guides and references you can always make use of in case you're having any difficulties or, as a starter, you want to master how all of this works. Then you also have this Fully Connected section right here, which brings ML practitioners together: curated tutorials, conversations with industry leaders, deep dives into the newest ML research, and a whole lot more, so you always have this curated information at your disposal. Then we also have the community, and then this quick start for the different frameworks. We can view all frameworks here; getting back, you see we have PyTorch and Keras. We're working with Keras, so click on Keras and you have this quick start for Keras users, which is what we actually need. Here you see how easy it is to get started with wandb in Keras. The very first step is to install and log into wandb, so we simply copy this and get back to our Colab notebook. We paste it right here and run the cell. We have a syntax error; let's fix this and run again. There we go — as you can see, wandb has been installed. From here, we are going to log in. You see, you can find your API key in your browser, so we could click on this link, get the API key, and paste it here. Let's get back to this and copy out the API key. There we go, we have that, and then we paste it here — or you could click on this link and still get the key, which we can paste in here — and then simply press enter. So we hit enter and we should be able to log in, since we have this key put in right here. Moving on to the next step: at the top of your training script, start a new run, and to start this new run, we are going to make use of this init method. Getting back to the documentation: a run is a unit of computation logged by wandb; typically, this is an ML experiment. So we create a run with wandb.init. Before moving on, you should note that you create a project, and in this project you can have several runs. One run could be for training — so we could have a training run, some sort of ML experiment as defined here — then we could have an evaluation run, and other training processes which act as other, different runs. Here, as you can see, when you just import wandb, the run is None, and then you have wandb.init.
Once you make a call to this method, a run is automatically created, and everything you log into wandb will be sent to that particular run. So if you have, say, this project — let's call it the malaria prediction project — and then we have a training run, once we create this run, everything we log will be sent into this particular run right here. And if we create another run, say an evaluation run, everything we log will be stored in that run instead. Getting back here, you see you can create a run and you can finish, or stop, that run. That's why, once you have this init and you've created a run, when you stop the run, wandb.run should normally be None again, since there is no current run. After doing this, you can also create the run in a with block — so we have with wandb.init() as run — and then you have all the data to be logged in here, such that outside of this with block, there is no run. From here, you can check out the different attributes and the information related to those attributes. Now let's get into this wandb.init. You have wandb.init's definition with all the different arguments it takes, and the information concerning those arguments. So here we could define a run by simply calling wandb.init and specifying the project, the entity — let's say neurallearn — the project, malaria detection, the configuration, and all the information that will be needed, let's say for training. We can specify save_code to permit wandb to save our code; by default, this is actually False, so by default your code is not saved to wandb. You can check this out here — let's search for save_code, go up, and there we go: by default we don't allow this, and you can flip this behaviour on the settings page. You can also check all these other arguments; you have this job_type argument, which is very important and very useful when you're grouping runs together into larger experiments using this group argument right here. That said, let's copy this part of the code and get back to our notebook. We notice that we have this import, so we have to finish by putting out this import here: import wandb, and from wandb.keras import WandbCallback. Let's take this off, put these two lines here, and run the cell. That should be fine. Now let's run; we scroll down, we have the model, let's run this again, and then let's actually get back here and create this run. So let's have this here: wandb.init, and we're going to specify the project. So this is wandb install, login, and initialization. As we've said already, this is going to permit us to create a run. So here let's have malaria detection — and I prefer to put a hyphen instead of the space — so we have malaria-detection, entity neurallearn, and let's have that for now. So let's run the cell and then add this code cell below here. We're currently logged in as neurallearn, and that's fine. From here, let's check wandb.run and see what we get. We see that we have no metrics logged yet, so that's fine.
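To recap those setup steps, here is a minimal sketch of the install, login, and initialization. The project and entity names follow the ones used in the video; everything else is illustrative:

```python
# Run once in the notebook: !pip install wandb
import wandb

wandb.login()  # paste the API key from your browser when prompted

run = wandb.init(
    project="malaria-detection",  # project name used in the video
    entity="neurallearn",         # your wandb username or team
)
print(wandb.run)  # the current run; becomes None again after wandb.finish()
```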
We can now take this off and go straight to adding our WandbCallback to the TensorFlow code. Before that, we save the model inputs and hyperparameters with wandb.config. Let's copy this and have it before defining the model — just here we add a code cell again for initialization and configuration. There we go. So we have wandb.config with the learning rate specified, the number of epochs, the batch size; we'll include the dropout rate, the image size, the regularization rate, the number of filters, the kernel size, the number of strides, the pool size, the number of outputs for the first dense layer, and the number of outputs for the second dense layer. That's all, so we finally run the cell, and we've now set this configuration. The next and last step is to simply add the callback to the fit method. So let's copy this out and get back. We have that, we've run our model — let's make sure we've run this model — and just here, let's use the configuration: configuration = wandb.config, and then right here we have the configuration and the image size, and the same for the other hyperparameters like the dropout rate, regularization rate, number of filters, kernel size, pool size, and number of strides. We now go ahead to the training process. So we copy this out and paste it right here. Here we have the callbacks argument: let's take this off, and instead of the TensorBoard callback, we now have the WandbCallback — though you could always include the TensorBoard callback we had previously as well. There we go, we have this WandbCallback. We run the cell for the metrics and compile the model. With the model compiled, let's include the learning rate: we take the configuration and specify the learning rate. We have that learning rate there, compile the model, and start the training. Now the training is complete. We can go to our Weights & Biases dashboard right here and see exactly what went on during the training process. Here we have the projects — you can see, under neurallearn, we have the projects — we click on this malaria prediction project, and then we select this run. The run we selected here is the sandy-water run. Now you see we have 19 different charts: AUC, validation loss, false positives, false negatives — well, this one is epoch versus epoch, which is why you have this straight line — and then precision, recall, loss, true negatives, true positives, and accuracy. In fact, all of this is basically obtained from the metrics we had here. So this means that with this simple callback we've put right here, Weights & Biases is able to capture all this information during the training process and present it to us once we're done with training. Here we can see the different charts, which we're already used to seeing, so you should be familiar with them. Now, apart from these charts, you have the system information. Here we have the CPU utilization, the system memory utilization, and the process memory in use — this one in megabytes, and here the same information as a percentage of the total memory available.
Then here we have the process memory available, the process CPU threads in use, disk utilization, network traffic, GPU utilization, GPU temperature, GPU time spent accessing memory, GPU memory allocated, and GPU power usage. So you see how easy it is to get all this system information without writing any extra line of code. Now from here we can go on to the model. You see, we have this table right here which shows our model; it's similar to the model summary we printed out previously, but now we have this nice table with the output shape, number of parameters, type, and name of each layer. And then from here we have the logs: everything that was printed out, we have it in this run. Now, recall that when we call wandb.init just here, we actually create a run, and once that run is created, everything we do after that is stored in that run. That's why, if you notice here, you don't only have the printed output, but also the data which was logged, and even the model, because we ran this several times. So you see that all of this logged information is actually stored by wandb, and experiment tracking here is done very easily and seamlessly. Now, if you look at the files, you have this model-best file — you see, the Keras model file is saved automatically — and you have this metadata file; you can open it up and you see that we're using a Tesla P100 GPU, the Python version, the operating system version, and other information like the GPU count, CPU count, and so on and so forth. Let's get back to the overview; you can see the run is still running, and that's because our Colab notebook is still running right here. Now, one thing you can always count on is the documentation right here: you come straight to the integrations, pick out Keras, and you have this wandb.keras WandbCallback, with its different arguments and their explicit definitions. So let's check out these arguments. You see we have monitor, which defaults to the validation loss and plays a similar role to the model checkpoint callback right here. We can also specify the mode. Let's get back to the documentation: you can pick out the mode, which by default is auto; you can also select min or max. In the case of the validation loss, we would obviously select min, and if we are dealing with a validation accuracy, precision, or recall, then we'd set the mode to max, such that we save our model when we have the maximum precision or maximum recall, for example. Then we have save_model, save_graph, save_weights_only, log_weights, log_gradients, training_data, and validation_data. Note that with these we're able to pass our dataset to wandb, and the fact that we pass in this data means wandb can produce visualizations of what the model is predicting, since it now has the data. Then we also have the generator, validation_steps, labels, predictions, input_type, and so on and so forth. So you can always check these out.
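Putting the configuration from the previous step together with these callback arguments, here is a sketch of what the full call can look like — we'll walk through the validation-data part next. The hyperparameter names and values are illustrative, `model`, `train_dataset`, and `validation_dataset` are assumed to be the malaria-diagnosis model and datasets built earlier, and the callback arguments are used as described in the documentation page above:

```python
import tensorflow as tf
import wandb
from wandb.keras import WandbCallback

# Attach the hyperparameters to the current run; names and values are illustrative.
wandb.config.update({
    "learning_rate": 1e-3, "n_epochs": 3, "batch_size": 32,
    "dropout_rate": 0.2, "image_size": 224, "regularization_rate": 0.01,
    "n_filters": 6, "kernel_size": 3, "n_strides": 1, "pool_size": 2,
    "n_dense_1": 100, "n_dense_2": 10,
})
configuration = wandb.config

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=configuration["learning_rate"]),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
model.fit(
    train_dataset,
    validation_data=validation_dataset,
    epochs=configuration["n_epochs"],
    callbacks=[WandbCallback(
        validation_data=validation_dataset,   # lets wandb visualize per-image predictions
        labels=["parasitized", "uninfected"],  # class names shown next to each image
        input_type="image",                    # tells wandb the inputs are images
    )],
)
```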
Now, the next thing we'll do is take the validation data, pass it into this WandbCallback right here, and then be able to visualize the different predictions as we go through the training process. So let's break this up a little. We have the training and validation data and the number of epochs — this should actually come from the configuration, configuration number of epochs. Okay, so we have that, and then we have the verbosity and the callbacks. Now, in this WandbCallback right here we have validation_data, which is our whole validation dataset. From here, let's check the documentation: we also have the labels and the input type. So let's specify the labels; the labels we have here are parasitized and uninfected. We could create this as a list of class names, so let's just take this off from here, cut that, and paste it out here. Okay, so we have the labels. We can also specify the input type, so we set this to image. We have that, the number of epochs, and all of that. Let's go ahead and modify the number of epochs here — it was 100; let's just have three epochs for now. So let's get back to the wandb config we have up here; once we change this, we get the new configuration. We run that, it looks fine, and we get back to the training. From here we can run the training again. Training complete — let's go ahead and check out what has been logged in our dashboard. So let's go here and pick up this run. There we go: we still have our 19 different charts, and then we have this Media section which has been added. We click on this and you see what we get: we have the different images and their predictions. We can view this in full-screen mode, and just here you can select the step — if you pull this to the end, you see we have 36 steps, meaning predictions have been logged over 36 different steps. Now, selecting this malaria detection project and having all three different runs right here, you see you can view all the runs simultaneously. So I could click on this one — this is one run, and this is another run — and view them all simultaneously right here. You can see, for example, the difference in GPU power usage between these two runs, that is the sandy-water run and the exalted-night run. And then, from here, if you want to stop a run — if you want to stop this current exalted-night run — what we can do is come right here, to where we have this code, and simply put wandb.finish(). We call this and we should be able to stop the current run. You can always check the wandb.run documentation right here to get more information: you see the example given where they create the run and then stop it, and this shows that there is then no current run; after recreating the run, you see that there is a run again. So this is simply how we stop the run, and that's what we've just done here. We simply run the cell, and after running it, you have this run summary: the accuracy and the other different metrics and loss values. Notice how the different metrics and losses have been printed out in this command-line formatting, so you can see how this loss, for example, drops.
And then, getting back to our dashboard, we see that there is no current run. We've then recreated a new run with wandb.init right here — we can check it out here, and you can see this little green circle right next to it. Coming back to the notebook, we have this WandbCallback, which we've used so far to log information into Weights & Biases. But this is limited, because we're not yet able to define our own custom callbacks. That said, let's get back to the callbacks we have defined here. We have this log-images callback, and we're going to try to log images into Weights & Biases using this callback which we created. Recall that to create this kind of custom callback, you inherit from the Callback class in TensorFlow and then put in some code which defines the custom callback. We've just copied this and pasted it right here; the difference is that this one is for wandb — this is what we had for TensorBoard, and this one is for wandb — and you're going to see how easy it is, even though we're building custom callbacks. So right here, let's suppose we want to log the confusion matrix to Weights & Biases. What we'll do is take off all of this here. Let's get back: this is what we did with TensorBoard — we saw that we had to make use of matplotlib and then log that figure into TensorBoard. But here, to log the confusion matrix, all we need is just this small piece of code right here. So we take all this code off, and we're just left with this. You see that instead of having to write all that code, all we need to write now is just this. Now, if you're wondering which other plots we can get automatically like this with Weights & Biases, we go into the Weights & Biases client GitHub repository, under the wandb/plot folder. You see all the different plots available, and since it's constantly under development, in time we'll surely have many more plots we can produce this easily. Here we have bar.py, confusion_matrix — you see, we have the confusion matrix — histogram, line, line_series, precision-recall curve, ROC curve, and scatter.py. This means we can already plot the confusion matrix and the ROC curve very easily with Weights & Biases. So let's get back to this and check it out. The first thing we notice is the log, wandb.log. So whenever you use this WandbCallback, what actually happens under the hood is that information is logged like this. So we make use of wandb.log, and then here let's change this and call the key confusion_matrix. So we have the confusion matrix key, and then wandb.plot.confusion_matrix — you could modify this to pr_curve, that is the precision-recall curve, or roc_curve. Then we have probs, then y_true, the predictions, and the class names. Here we change the class names to parasitized — and what do we have next? — uninfected. And then you can pass either the probabilities or the predicted scores.
So here we're going to have probs equal to predicted, since what our model outputs are probabilities, and y_true equal to the labels. From here, we take this part out, because we've done that already. So let's have this here. There we go. This doesn't take the threshold into consideration, so we take off the threshold. That said, we now have this wandb callback and we're ready to train our model. After the training process, we have those results, which look great, and we can now go ahead and look at the confusion matrix logged in the dashboard. So let's get to our dashboard, and this is what we should have. Let's get back to the runs, click on the current run, and this is what we get. Now we see that we have this confusion matrix, which doesn't actually show us what we expect to get. And this is simply because the way this confusion-matrix logging was conceived assumes a multi-class problem: even in the case where we have a binary classification problem like this one, it expects an output of two values per sample. Let's get down here, click on this, and add this code. When the model outputs a value like 0.9, it gets treated as parasitized, although we already defined parasitized to be zero and uninfected to be one — and since 0.9 is greater than 0.5, the midpoint of the two, we should consider this an uninfected cell. But, as we said, the way this was conceived is such that even in a binary classification problem, we shouldn't have a single output but two outputs. That is, if we have a value of 0.9, what wandb expects to see is something like 0.1 and 0.9: two values, one per class, where the index with the higher value indicates the predicted class. If instead we had, say, 0.8 and 0.2, that would mean a parasitized cell, since the zeroth index — the parasitized index — has the higher value. Now, to solve this issue, what we're going to do is take each and every output we have and convert it to this format. So in the case where we have, for example, 0.1 as output, we're going to convert it into 1, 0, because this means it's a parasitized cell: the zeroth index, the parasitized index, has the higher value. So what we're going to do is: for all values less than 0.5, we convert them into 1, 0, and for all values greater than 0.5, we convert them into 0, 1. This is the transformation we're going to make in order for this wandb.log method to correctly log our values. But note that if we were dealing with a multi-class problem, this transformation would be unnecessary. So let's go straight away and see how we're going to transform our outputs into this required format. We start by copying out this first part into this other cell here, and once we run this, we have this output right here. Then we can go ahead and modify this predicted: what we now have is, we'll define pred, this other list.
And then, for i in range of the length of predicted, we check the value at index i — taking its zeroth element — and if it's less than 0.5, we append 1, 0 to this pred list right here, because this value is less than 0.5. If it were a multi-class problem, we would put the higher value at the position of the predicted class; here, since we have just two classes, the higher value goes at the zeroth position because the predicted output is less than 0.5. And then in the else branch, we append 0, 1. So we've built this pred list, and we can print it out; with this, we should have the expected output. There we go, we have this output. Let's add a code cell and print out the pred shape. We run this, and what do we get? Exactly what we expect. So that said, let's copy out this part from here and add it here. We have this pred now, and then we have pred = np.array(pred), so we convert it into a NumPy array before passing it into this log method. That should be fine. Okay, so that's it: we have this set, everything looks fine. Let's take this off and use pred. Everything looks okay, and we can now restart our training. So let's take this off; from here we train again for just about two epochs, and these are the results we get. We can click on this run here, and you have the tables and the custom charts. Let's start with the custom charts — you can expand them like so. Here you see that we have the predicted and the actual values. You see now that we have a number of, let's say, true positives which increases, and a number of true negatives which also increases, while the number of false negatives is 112 and here 62; this is 1278, and 1305. And when you compare this with the previous run, you see that now we have reasonable outputs, whereas with the previous run we had that error where the confusion-matrix method of wandb considered all the outputs to be parasitized. Let's put this back and take this off. From here, we can go ahead and look at how to plot the ROC curves. Let's get back here and scroll up. What we do is simply comment this here and then do the same for the ROC plot. Let's get back to the GitHub repo and click on roc_curve here. Here we have the arguments described: y_true, y_probas — obviously these are the labels and the predicted probabilities — then the class names, and classes_to_plot, that is, the set of classes we want to plot; all classes not specified in this classes_to_plot argument are excluded. This is actually optional, so you don't necessarily need it. We also have the title. So let's copy this out and paste it right here. We then specify y_true as the labels, here we have pred, and right here we have the class names — let's copy those and paste them here. Okay, so that's what we have. Let's run the cell; everything looks fine, and then we start the training. Training now complete — here are the results we obtained.
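Putting those pieces together, here is a rough sketch of the kind of custom callback we just built. The validation dataset name and the class order follow the convention used above (index 0 = parasitized, index 1 = uninfected); treat it as an illustration rather than the exact notebook code:

```python
import numpy as np
import tensorflow as tf
import wandb

class_names = ["parasitized", "uninfected"]  # index 0 = parasitized, index 1 = uninfected

class LogWandbPlots(tf.keras.callbacks.Callback):
    """Sketch of a custom callback that logs a confusion matrix to wandb each epoch."""
    def on_epoch_end(self, epoch, logs=None):
        labels, predicted = [], []
        for images, y in validation_dataset:          # assumed validation tf.data.Dataset
            p = self.model.predict(images, verbose=0)
            predicted.extend(p[:, 0])                  # single sigmoid output per image
            labels.extend(y.numpy())

        # wandb.plot.confusion_matrix expects one probability column per class,
        # even for a binary model, so expand each scalar output into two values.
        pred = []
        for i in range(len(predicted)):
            if predicted[i] < 0.5:
                pred.append([1, 0])   # parasitized
            else:
                pred.append([0, 1])   # uninfected
        pred = np.array(pred)

        wandb.log({"confusion_matrix": wandb.plot.confusion_matrix(
            probs=pred, y_true=labels, class_names=class_names)})
        # The ROC curve can be logged the same way:
        # wandb.log({"roc_curve": wandb.plot.roc_curve(labels, pred, labels=class_names)})
```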
We get back to our dashboard, and this is what we have. Go back to the runs — let's click on this run first — and then we have these two custom charts: the previous confusion matrix which we logged, and now this ROC curve which we've just plotted. We can also check out the tables for both the ROC curve and the confusion matrix. So here we go: we have this ROC plot with two curves, one for the uninfected class and the other for the parasitized class. Having seen how to plot the confusion matrix and the ROC curve, we can check out the documentation and look at the other plots. Here we have the basic charts — line charts, bar charts, histograms, multi-line charts — and then the model evaluation charts like what we've just done: the confusion matrix, ROC curves, and PR curves, that is, precision-recall curves. You also have interactive custom charts, plus matplotlib and Plotly plots, which means you can actually come up with your own plot with matplotlib and then log it to wandb. And apart from the method in which we used the WandbCallback — not this one, the WandbCallback we defined previously — apart from using that callback to log the loss, metrics, and other information, we can directly just call wandb.log. Say, for example, we want to log the loss: we put out a "loss" key, and since we get the loss from this logs dictionary — we have this logs here — we scroll down, put logs, and specify the loss. That's all it takes to log the loss values; from here you can also log the accuracy and all the other metrics, since logs is actually a dictionary. So let's have that, and there we go — we've seen how to log these values. Now let's also see how to log images: previously, we created this image and logged it to TensorBoard; now we'll see how to take that image and log it to wandb. To log those images, we check the documentation, and you'll see that you can log rich media — really any kind of media: 3D point clouds, molecules, HTML, and histograms. Scrolling down, we have the code which lets us log image data: logging arrays as images, logging PIL images, logging images from files, image overlays, segmentation masks, bounding boxes, and so on and so forth. You also have histograms and 3D visualizations, as you can see — point clouds and molecules. So wandb really lets a practitioner log any kind of data they want, hence making them more efficient. That said, let's get back to this. We're logging arrays, so let's copy this out — click here, simply copy it — and get back to the code. What we're saying is, we want to be able to log these images at this level. So let's paste this out somewhere here and then copy out this part again. Recall that with the wandb callback we defined here, we ended at the level where we got the labels and the predictions and automatically got the ROC curve and the confusion matrix. Now what we want to do is get right up to the point where we actually have the image — that is, our own image which we've created — and then log that image to wandb. And so right here we have this piece of code, and we're going to integrate it.
One thing we could do again — let's copy this again; no, let's copy this other one, because here we have the image. We'll label this one TensorBoard, since that's what it's for, and this one wandb. Let's add this here. There we go, so here we have the wandb version. What we're saying is, we're going to take off all of this here, copy this out, cut it, and paste it here — this is how we're going to log this to wandb. As usual, it's simpler than what we have with TensorBoard: all we need to do here is take our image and log it, as simple as that. Now, our image array here — let's see this image array — is this output which we have, and we pass this image array in, together with a caption. Let's put a caption, say, confusion matrix for epoch, and then we get the epoch from here — we have this epoch argument. So we'll be logging this confusion matrix for each and every epoch, and we put this epoch number into the caption. That's our caption; we have wandb.Image, and we log it under, let's say, the confusion matrix key. Now we run this. We had an error here — this bracket should be closed — okay, so that's fine. We have this wandb.log right here and everything looks okay, so let's run that again and check it out. Looks fine. Now we can go ahead and start the training. Let's run this — it should be okay now — and compile the model. After compiling the model, we go ahead and train it; let's say we want three epochs, and let's keep that aside for now. Here we have this; let's paste it out — it should be the same. Now let's run the training process and see what we get. Training now complete; we can get back to our dashboard and see what we have as plots. We click on this current run, and we have this image here that we logged previously. Now let's check the hidden panels and see what we get: it happens that we have the confusion matrix logged in these hidden panels. What do you notice? For the three different epochs, you can go to the second and then the third epoch, or start with the first: we have the confusion matrices, which are logged per epoch. Let's do this so it appears clearer, and we see we have the number of true positives, true negatives, false positives, false negatives, and so on and so forth. From here, you can notice how these values change from one epoch to the next. Another handy functionality is that you can download the image, and you have this full-screen view where you can see it clearly. So let's press escape, close, and get back. That's it — that's how we log this image very easily with wandb.
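As a recap of that image-logging step, here is a minimal sketch of the idea. The `render_confusion_matrix` helper is hypothetical — it stands in for the matplotlib rendering we already wrote for the TensorBoard version of this callback, which returns the plot as a NumPy image array:

```python
import tensorflow as tf
import wandb

class LogConfusionMatrixImage(tf.keras.callbacks.Callback):
    """Sketch: log an already-rendered confusion-matrix image to wandb each epoch."""
    def on_epoch_end(self, epoch, logs=None):
        # Hypothetical helper: renders the confusion matrix with matplotlib and
        # returns it as an H x W x C NumPy array, like the TensorBoard version.
        image_array = render_confusion_matrix(self.model)
        wandb.log({"confusion_matrix": wandb.Image(
            image_array, caption=f"Confusion matrix for epoch {epoch}")})
```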
Now, getting back to the documentation, you might have noticed that you have Keras and then TensorFlow as two separate integrations. Just note that if you're building a Keras model — that is, if you're making use of the fit method to train your model — then making use of this Keras documentation right here is appropriate. But sometimes you want full control over what you're doing and you want to do custom training, like we've seen previously. Let's scroll down — okay, custom training loop. If you have a custom training loop like this one here, you'll find that you can't use the WandbCallback as easily as you did with the Keras code. In that case, you'll see that, for example, here in the training loop where we have this loss, all you need to do is put in wandb.log and log the loss — you just use a "loss" key and log, say, loss.numpy(), and that's all. That's all it takes to log the loss values, the different metrics, and so on and so forth. The integration with TensorBoard has been made quite easy too, so if you're already using TensorBoard, it's easy to integrate with wandb. Here you can see how wandb differs from TensorBoard: the ability to reproduce models, automatic organization, fast and flexible integration, a persistent centralized dashboard, powerful tables, and tools for collaboration. That said, let's copy this out, and now, when we create our run with the init method, we're going to use this instead. Let's get back to our code, reduce this here, and stop the current run: we call wandb.finish, the run is stopped, and then we're going to create this other new run, which takes our TensorBoard logs into consideration. So let's get back to the top, and just right here we'll paste this out. We already had this, so we can take it off. Now let's comment this, and then set the project and entity — just copy that and paste it here, that's fine — the configuration, we have that, and sync_tensorboard, that's fine. Let's run this. Now we get this error, so let's take this off here — this isn't compatible with version 2 of TensorFlow, which is what we're using — so let's run this again, and this should now be fine. We've now created this new run. Let's scroll down to our custom training loop: we're going to run this, with the optimizer, the metric, the validation metric, and the number of epochs specified, and then let's say we have this as configuration, with the number of epochs coming from it. Okay, so we have all this set, and then we run it — we've seen this already in the TensorBoard section.
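For reference, here is a rough sketch of what logging from a custom training loop can look like. The loss function, optimizer, config key, model, and dataset are placeholders, not the notebook's exact ones:

```python
import tensorflow as tf
import wandb

loss_fn = tf.keras.losses.BinaryCrossentropy()
optimizer = tf.keras.optimizers.Adam(1e-3)

for epoch in range(wandb.config["n_epochs"]):   # hypothetical config key
    for images, labels in train_dataset:        # assumed training dataset
        with tf.GradientTape() as tape:
            predictions = model(images, training=True)  # `model` built earlier
            loss = loss_fn(labels, predictions)
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))

        # Log directly to wandb; no callback is needed in a custom loop.
        wandb.log({"loss": loss.numpy()})
```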
We get this warning: when using several event log directories, like we're doing right here, please call wandb.tensorboard.patch and specify the root log directory before wandb.init. So let's take this from here and copy it out — and if we check the documentation, you'll see the same thing: wandb.tensorboard.patch with the root log directory right here. This means we're going to stop the current run again. Where are we exactly? It should be at the level of training. We stop the current run here, run this again, and then what we do is call the patch and specify "logs", so we have the root directory specified — that's "logs", as in the documentation — and it has to run before the init. Looks fine; let's run this cell. There we go: TensorBoard already patched — remove sync_tensorboard=True from wandb.init, since we've already done the patch. So we take that off and run again. We're still getting the error, so what we'll do is just go back, keep that, and proceed as we started initially. So let's run this — this should work now — and from here we'll continue with the custom training process. Let's scroll down. Okay, we're at this point; we've run this already, so that's fine. Now let's uncomment this one, run these cells, run the training method, and start the training process. As you can see, training is now complete. Let's go ahead and launch TensorBoard: we have the loss values and accuracy values which have been logged — you can see them here — and, reducing this, the validation accuracy and the validation loss. Now, from this, we are going to look at our wandb logs. We click on this run, and you see we have the logged data in here: for the training, we have the plot for the loss, the plot for the accuracy, and the plot for the global step; and for the validation, we have the global step, the validation loss, and the validation accuracy — exactly what we have with TensorBoard, without adding any extra line of code for wandb. The only thing we actually did was, at this point here, say that we wanted to sync TensorBoard. That's all you need to sync TensorBoard with wandb: if you have been using TensorBoard on a particular project, that's all it takes to also log your information to wandb. Apart from this, you'll notice this TensorBoard tab here — you can click on it: "we're spinning up your TensorBoard instance, hang tight, it takes about 30 seconds and we'll keep it online" — and before we even finish reading the sentence, it's already there. So you see we have this TensorBoard hosted here, with the PyTorch profiler tab and the time series and scalars, just like what we have with TensorBoard locally. So that's cool: we have the logs, the different files which have been saved, the system information, the charts, and the overview where we can see all this information, and then we get back to our runs. So that's it — we have seen how to sync TensorBoard with wandb, and you're now ready to track your experiments with wandb. Thank you for getting right up to this point, and see you next time.
Hello everyone, and welcome to this session in which we are going to treat hyperparameter tuning with Weights & Biases. There are three methods available for implementing hyperparameter tuning in Weights & Biases: random search, grid search, and the Bayesian search method. At the end of this session, you will be able to search for the most suitable hyperparameter values which optimize the accuracy. Previously, we saw how to implement hyperparameter tuning with TensorBoard. We created this model_tune method, which takes this hparams argument — the hyperparameters which are actually going to be tuned, as you can see here — and outputs the accuracy. So what we are trying to do is obtain the optimal values for these different hyperparameters which maximize the accuracy of the model. The exact method we used was grid search: with grid search, we go through each and every possibility, and for each possibility we log its accuracy, as we did right here. Now, with Weights & Biases, we are going to see how to redo this in an easier and more reliable manner. So let's check out the documentation right here: we have hyperparameter tuning; click on the sweeps quickstart, which shows globally how hyperparameter tuning with Weights & Biases works. We set up Weights & Biases, we configure the sweep, we initialize the sweep, we launch agents, we visualize the results, and finally we stop the agents. From here, we'll see how to run these sweeps in Jupyter. Weights & Biases sweeps allow you to easily try out a large number of hyperparameters while tracking model performance and logging all the information you need to reproduce experiments. We're going to focus on this pure-Python method, where the sweep configuration is in the form of a dictionary like this. The way these sweeps work is that we have a central sweep server — that's this one here — and then we have these different agents right here; we could have many more agents. Once we set the sweep configuration on this central sweep server, the agents take over and do the actual hyperparameter tuning, that is, the search for the best hyperparameters which help optimize the model. This parallel way of implementing sweeps in Weights & Biases helps make the hyperparameter tuning process even more efficient. As you can see, this can be done in just two steps: initialize the sweep and give all the necessary information to the central sweep server, and then run the different agents. That said, we have the sweep configuration, which looks similar to what we had when working with TensorBoard: we had this configuration right here where, for each hyperparameter, we specified the values it can take — given that we were using a grid search algorithm, we simply specified all those different values, and that was it. Coming back to Weights & Biases, you see that here you just need to specify, for example, the hyperparameter — say, the number of epochs — and give its values; or the learning rate, for which you give the minimum and maximum value, and so on and so forth. But note that in this example from the documentation, the method used is not grid search; that is, you're not going through each and every possibility in a list of values.
What you're doing here is actually a random search, which happens to be more efficient than the grid search algorithm, since in a shorter period of time it's possible to get very good hyperparameter values, as compared to grid search, where you need to go through each and every value. Now let's get back here: we have the name, the method, and the parameters. To understand where all this comes from, check out this sweep configuration page right here. Clicking on the sweep configuration, we have the structure of a sweep configuration, and then the different keys and their descriptions. You'll notice this method key which we saw here — let's open this in another tab and have it this way. There we go: we have the name, method, and parameters, which we can see here, and for each of these configuration keys, like this method for example, there is a more explicit description. So here we have the method you're going to use: grid search iterates over all possible combinations of parameter values; random search chooses a random set of values on each iteration; and Weights & Biases also comes with this other method, which is the bayes method. The Bayesian hyperparameter search method uses a Gaussian process to model the relationship between the parameters and the model metric, and chooses parameters to optimize the probability of improvement; this strategy requires the metric key to be specified. So we have these three different methods, and that's why, when we want to use random search, it suffices to just put that here: you either put random, or grid, or bayes. Moving on, we have the name — you can pick any name for the sweep, that's up to you; here it's my-sweep — and then the parameters. Here it gets a little trickier, because for this key the value is a dictionary which itself has its own keys and values. Let's get back to the parameters: we have the different possible values and their descriptions. Getting back here, you'll notice that this epochs hyperparameter takes values in this list — that is, either 10, 20, or 50 — while the learning rate can be chosen between a minimum of 0.0001 and a maximum of 0.1. Each hyperparameter has its own distinct way of describing the values it can take. Now, getting back here, we have the different parameter keys: sometimes we may decide we want this epochs parameter to take only one value, in which case we just use value; and in the case where we want several values, we use values, which specifies all values for this hyperparameter and is compatible with grid search. In some cases you have a distribution, which selects a distribution from the distribution table below: this specifies how values will be distributed if they are selected randomly, for example with the random or bayes methods. So when working with the random or bayes methods, you may want to select your values based on a given distribution, and here is the list of distributions you can use: constant, categorical, int_uniform, uniform, q_uniform, log_uniform, q_log_uniform, normal, q_normal, log_normal, and q_log_normal.
Then, from the distribution, we move to min and max. Min and max are what you actually saw here: you're simply saying you want your learning rate to have a minimum value of this and a maximum value of that, so values are picked at random within that range, and that's it. You have mu, the mean parameter for normally or log-normally distributed hyperparameters — so for a normally distributed hyperparameter you specify the mean here, and the standard deviation with sigma. Then, for q, you have the quantization step size for quantized hyperparameters. Our next key is the metric, and with the metric we have to define the name of the metric, the goal, and the target. Here you could have, for example, the validation loss — you have this metric, validation loss — and you're saying you want to choose the hyperparameters which minimize the validation loss. You could also change this to an accuracy, say the validation accuracy or just the training accuracy; in that case you would want to choose the hyperparameters which maximize the accuracy, and hence, at the level of the goal, you either minimize or maximize. The default is minimize, so if you have a validation loss it's needless to specify the goal, because by default it's minimized; and for now, as per this documentation, there's no automatic way of deciding whether it should be minimize or maximize — you always do that manually. The next thing we have is the target. With the target — as you can see here, for example 0.95 — if you happen to get a set of hyperparameters which gets the validation accuracy to this target value, then at that point the search for the optimal hyperparameters stops; what happens exactly is that all agents with active runs finish their jobs, but no new runs are launched in the sweep, since we've already attained the objective. From here we have early_terminate, which is an optional feature that speeds up hyperparameter search by stopping poorly performing runs.
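Before following along with the notebook below, here is a compact sketch of what such a sweep configuration and its launch can look like. The hyperparameter names and ranges mirror the ones we're about to set up, but the unit values, the trial count, the `model_tune` call, and the datasets are illustrative assumptions:

```python
import tensorflow as tf
import wandb

sweep_configuration = {
    "name": "malaria-prediction-sweep",
    "method": "random",                       # "grid" and "bayes" are the other options
    "metric": {"name": "accuracy", "goal": "maximize"},
    "parameters": {
        "n_units_1": {"values": [16, 32, 64]},
        "n_units_2": {"values": [16, 32, 64]},
        "dropout_rate": {"min": 0.1, "max": 0.4},
        "regularization_rate": {"distribution": "uniform", "min": 0.001, "max": 0.1},
        "learning_rate": {"distribution": "uniform", "min": 1e-4, "max": 1e-2},
    },
}

sweep_id = wandb.sweep(sweep_configuration,
                       project="malaria-detection", entity="neurallearn")

def train():
    """One sweep trial: build, train and evaluate the model for a sampled config."""
    with wandb.init() as run:
        config = wandb.config
        model = model_tune(config)  # assumed model-building function returning an uncompiled model
        model.compile(optimizer=tf.keras.optimizers.Adam(config["learning_rate"]),
                      loss="binary_crossentropy", metrics=["accuracy"])
        model.fit(train_dataset, validation_data=validation_dataset, epochs=2)  # assumed datasets
        _, accuracy = model.evaluate(validation_dataset)
        wandb.log({"accuracy": accuracy})

wandb.agent(sweep_id, function=train, count=20)  # launch an agent that runs 20 trials
```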
We get back here, copy out this example code, return to our hyperparameter tuning notebook, paste it, and simply try to replicate what we already did with TensorBoard. So for numUnits1 we copy the values we had already and paste them in as the values for numUnits1; the next is numUnits2, which is similar, so we copy and paste the numUnits2 values. We now have numUnits1 and numUnits2, and next come the dropout rate and the learning rate. For the dropout rate, instead of a list of values we're going to use min and max, so let's copy that pattern and paste it here: the dropout rate will go from 0.1 to, let's say, 0.4. This works because our method is random, not grid. For the name, let's say "malaria prediction sweep"; the method is random, and the parameters look fine, so we're now specifying each and every parameter. With the dropout rate set, we move to the regularization rate, which goes from 0.001 to 0.1, and here we make use of a distribution, specifying that we want a uniform distribution. We'll use the same for the learning rate: distribution uniform, with a min of 1e-4 and a max of 1e-2. That looks good, so let's take off what we had set before. From this we can now create our sweep id: we get it by running wandb.sweep and passing in the sweep configuration. To run an agent, the first step is to define a function which runs the training based on those hyperparameters, and then we pass that function along with the sweep id to the wandb.agent method. Let's copy this code and paste it out here; you'll notice it's quite similar to what we had done before, because there we had defined a method which takes in the hyperparameters we're trying to tune, and we go through that method for each and every sweep. For the model, we can simply reuse the model-building code: we copy the relevant part and paste it, so we have model_tune, the LeNet model built from the hyperparameters. Then, instead of what we had before, where the compile and fit methods ran inside, this function will simply return the model, so we return the LeNet model right here and take out that part of the code. We have model_tune here instead of make_model, and we pass the configuration into it. We add the project and the entity in our init method, and we've modified the keys to match those of our wandb configuration, which we have seen already; let's copy that and put it right here so you can see it.
As you can see, instead of numUnits1 we now have the number of units in dense layer 1, and likewise the number of units in dense layer 2; so those are the different values they can take, alongside the dropout rate, the regularization rate, and the learning rate. Now, in this model-building function, model_tune, we make use of the wandb configuration which we've passed in, so here we read from config, and we do the same for all the other hyperparameters; for example, for the number of filters we use the config's number of filters. Note that in this example we are not going to tune all the hyperparameters: the hyperparameters we want to tune are the ones that appear in the parameters dictionary of our sweep configuration, the method is random, and the metric is the accuracy, with the goal of maximizing it. If you want to tune another hyperparameter, you add it there; for the rest we just use the configuration we've set already, so they keep fixed values. That said, we modify all of this and run it, and then back in the train function we replace the rest with the compilation and the fit method. Again, here we have the learning rate, so we pass the config's learning rate, and we can take the number of epochs from the config as well. Note that we're carrying out this search only on the validation set; you could try it on the full training dataset, but that would take much more time, and you could also feel free to carry out this hyperparameter tuning on more parameters, for example the image size, to see how it affects the model's performance. With that said, we have our agent, which takes the sweep id we've defined already, the function, which is the train function right here, and the count, which is the number of runs to execute.
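As a rough sketch of how those pieces fit together, the pattern looks like the following. The project/entity names, the config keys, and the helpers model_tune and val_dataset are assumptions standing in for what is defined elsewhere in the notebook:

```python
import wandb
import tensorflow as tf

def train():
    # Each call corresponds to one sweep run with a fresh set of hyperparameters.
    with wandb.init(project="Malaria-Detection", entity="my-entity"):
        config = wandb.config  # values chosen by the sweep for this run

        model = model_tune(config)  # assumed helper that builds the LeNet-style model
        model.compile(
            optimizer=tf.keras.optimizers.Adam(learning_rate=config.learning_rate),
            loss="binary_crossentropy",
            metrics=["accuracy"],
        )
        # The lesson runs the search on the validation set only, to save time.
        model.fit(val_dataset, epochs=config.epochs,
                  callbacks=[wandb.keras.WandbCallback()])

sweep_id = wandb.sweep(sweep_configuration, project="Malaria-Detection")
wandb.agent(sweep_id, function=train, count=20)  # count = number of runs to execute
```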
Before we proceed, note that we specified this configuration, which is essentially the configuration we have here, so we can clean this up and save it; our configuration is set, and now we are ready to let our agents do their job, so let's run this. After the training runs, here's what we get: each run starts by picking the dropout rate, the learning rate, the number of units in dense layers 1 and 2, and the regularization rate, while obviously keeping the other parameters constant; then we have the loss and its corresponding accuracy, and you can check out the other sweeps further down. We have 20 different runs, and we can click here to view the sweep. On this page you can see the different runs, from 1 up to 20, with each run's accuracy and corresponding loss. We'll skip this tool for now and instead check out the hyperparameters which produce the highest accuracy, which is this run here; let's highlight it and pick out its values, and there we have the hyperparameter values which give us the highest accuracy score. Next, we can check out the parameter importance with respect to the accuracy: the most important is the learning rate, followed by the runtime, the number of units in dense layer 2, the dropout rate, the regularization rate, and the number of units in dense layer 1, and after these come the other hyperparameters which we did not modify. Now let's click on the option which automatically shows the most useful parameters: after clicking it, the fixed hyperparameters and even the runtime are taken off, so we're left with only the hyperparameters we actually put in the sweep configuration. One other point to note is the correlation: apart from importance, we can check out the correlation, and we're told that the learning rate has a negative correlation, because red indicates a negative correlation while green indicates a positive one. You can compare the values: one is slightly positive (0.051) while the learning rate's correlation is negative, which means the lower the learning rate, the higher the accuracy. Then, in the plot to the left, we have the accuracies for each and every run. So that's it for this section; thank you for getting up to this point, and see you next time.
So we're getting into this data here, and this preprocessed version is again preprocessed to give us this one, and then finally we have this last preprocessing step, which gives us a dataset version that happens to be an augmented dataset version. This shows us that if at any point we need to revisit the preprocessing which was done right here to produce a given dataset version, we can simply preprocess again starting from the stored dataset version right before it, and this greatly simplifies our dataset management across different machine learning projects. Just as Git permits us to do code versioning, Weights & Biases gives us the possibility of doing dataset versioning and model versioning, and this can be done using Weights & Biases artifacts, which help us save and organize machine learning datasets throughout a project's life cycle. We're going to start with dataset versioning, and the most common ways in which Weights & Biases artifacts are used for data versioning are: to version data seamlessly, to pre-package data splits like the training, validation, and test sets, to iteratively refine datasets, to juggle multiple datasets, and finally to visualize and share a data workflow. Before getting into how artifacts can be used for dataset versioning, let's look at this simple example. In order to obtain the malaria dataset, we start by loading the data: we have this data loader, represented by the square, and once we've loaded the data we have our dataset, so let's say we have this original dataset right here. We then split this data into three parts: the train data, the validation data, and the test data. At this point, all three datasets are passed into this preprocessing unit right here, so we preprocess the train, the validation, and the test, and this gives us an output for each: for the train we have the preprocessed training data (PTRD), and likewise the preprocessed validation data and the preprocessed test data. Next, we carry out augmentation on the preprocessed training data: we pass it through this augmentation process, whose role is to carry out data augmentation on the dataset and produce another dataset, which is an augmented version of it. So where we had preprocessed training data we now also have augmented training data, and this is, in a sense, the life cycle of our dataset for the particular problem we're working on. Now imagine that the original dataset contained mislabeled examples; in that case, what you want to do is correct the labeling, so that we obtain a dataset which has been cleaned. Overall this leads to a high level of accountability when working in a team: let's say you have done some modification, for example you have cleaned the data; all the people on your team can now view this cleaned dataset and decide whether to modify it, delete it, or keep using it.
So if we suppose the team accepts this cleaned dataset and everyone is happy with it, we now see that instead of passing the original data directly into the split, we pass the cleaned version into the split; the flow goes through the cleaned dataset and then into the split. Then, talking about dataset versioning: for each and every dataset we've created here, we have different versions. We could have, for example, this PVD here, that is the preprocessed validation data, with a version 0; later on, you may modify the preprocessing, and that leads to another version of the validation data, so you could have a version 1, a version 2, and aliases such as best or latest, and so on. A good thing is that when Weights & Biases artifacts are used to store this data, the data is stored in such a way that if some data in the original dataset is exactly the same as what we have in, say, the preprocessed validation data, that data wouldn't be duplicated. Now it's true that if we preprocess the data, obviously all the elements of the dataset change, so let's take the example of the cleaned dataset instead. Suppose we have 10 different samples in the original dataset, and after cleaning only two of these samples have been modified: we have 8 unmodified samples and 2 whose labels we have changed. What Weights & Biases will do is ensure that those 8 aren't duplicated; that is, we don't use extra space for the 8 samples which haven't changed from the previous, original dataset. In that case we only store the two new samples, while Weights & Biases keeps track of the fact that the other 8 samples haven't been modified and so don't need to occupy extra space in the storage which Weights & Biases makes available for us for free. That said, we can look at the different processes right here, the ones in the square boxes, as wandb runs, while the different forms, and even the versions, which our data takes are the artifacts; that's what the key here represents, the artifacts and the runs. Also, notice that the artifacts are connected together by these runs: these two artifacts, the train data and the original dataset, are connected by the split run, and the PTRD and train data artifacts are connected by the preprocessing run. Getting back to the documentation, we'll see how to create an artifact, here called new_dataset, of type raw_data, which is created within a run; we'll see how we create the run and specify the project, my_project, and once we create this artifact we're going to add data to it. One thing you can do with wandb is simply add a whole directory.
We're supposing that all your data is in a given directory, so all you need to do is specify the path and make that data part of the wandb artifact, which in the documentation example is called my_data, and finally you log this artifact to wandb. Let's copy out this sample code and paste it here; we have the sample code, and we can get started with our dataset versioning. We're going to put this in a with statement, so we have with wandb.init, and we specify the project we're working on, which is the malaria detection project, and the entity, neuralearn, and that's it. Next we create our original data artifact: original_data is a wandb Artifact named new_dataset of type raw_data. To check out the different arguments we can pass in here, get back to the documentation: under the references you have the Python library, and then wandb.Artifact; clicking on it, you get the documentation with the different arguments: name, type, description, metadata, incremental, and use_as. Note that the last four are optional, which is why the example just had the name and the type. We'll now add the description and the metadata. The description is a string, so we could simply say "TensorFlow malaria dataset". The metadata is a dictionary containing information related to our dataset, so we could store, for example, the source, tensorflow_datasets, and we could also add other information like the dataset's homepage, so let's copy the homepage from the dataset's description page and paste it in; now we have all the necessary metadata information. So we've created this artifact. We then paste in the code we had seen previously, which permits us to load the malaria dataset from TensorFlow Datasets, and we'll save this dataset in the NumPy compressed format. With our original data artifact we call new_file: unlike the earlier example, where we added a directory, here we create a new file which will contain the dataset. So with original_data.new_file we give the file name, original_data.npz, and the mode, which is going to be the write mode, and we have this as file. We add this new file with a file name and then save the file while putting in the appropriate content: right here we have np.savez, our compressed-format call, to which we pass the file and our dataset. At this point we've written the dataset into this artifact, and we're now ready to log it, so here we call run.log_artifact.
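Here is a minimal sketch of that artifact-creation pattern. The project, entity, names, and metadata fields are illustrative, and a small NumPy array stands in for the payload, because trying to np.savez the raw tf.data.Dataset at this point is exactly what triggers the error discussed next:

```python
import numpy as np
import wandb

with wandb.init(project="Malaria-Detection", entity="my-entity") as run:
    # Create the artifact; description and metadata are optional arguments.
    original_data = wandb.Artifact(
        name="new_dataset",
        type="raw_data",
        description="tensorflow malaria dataset",
        metadata={"source": "tensorflow_datasets"},
    )

    # new_file creates a file that lives inside the artifact; whatever is
    # written to it gets uploaded when the artifact is logged.
    with original_data.new_file("original_data.npz", mode="wb") as file:
        np.savez(file, np.zeros((2, 2)))  # placeholder payload for illustration

    run.log_artifact(original_data)
```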
We call run.log_artifact and pass in original_data, and that's it: we've seen how to create the run, create an artifact within it, and put information into the artifact. One thing we can do is put all of this in a method, so we define the method load_original_data, indent the code one step, and that's fine; the load_original_data method is defined. Let's add a code cell and call load_original_data right here. Getting back to the diagram we had previously: if we don't consider the cleaning path, the information flows this way, gets split, and so on. What we have here is the load_original_data method, which takes care of these two pieces: the run, which is the data loader, recuperates the original dataset from TensorFlow Datasets and produces an artifact, the original data artifact we had drawn previously. We then run this cell to load the original data, and we get this output: unfortunately, the process has failed. Let's check why: "Cannot convert a tensor of dtype variant to a NumPy array." Since the dataset variable we have here is of dtype variant, what we're going to do instead is take out each and every element of this dataset and save it into a directory. So we copy out this code and paste it here: we go through the dataset, so we have for data in dataset; since what we loaded is a list, we pick out its zeroth element. We then create a new folder called dataset, and in that folder we put the different tensors, stored in the NumPy compressed format. We modify the file name, so we have "malaria_dataset_" plus a number plus the extension: we initialize k and keep incrementing it, so each file is malaria_dataset_k.npz, and the data we save in each file is the current element. Let's test this with a break statement so we see exactly what we're getting. We run the cell and get an error, so we instead need to open the file first: we open the file and then save into it. Running again and checking the directory, we have our dataset folder with the saved information. Now let's run this for the whole dataset: we take out the break, and we print k every thousand steps, that is, if k modulo 1000 equals zero we print k. We run again, and now all the data has been saved.
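A rough sketch of that per-element export loop is shown below. It assumes the dataset yields (image, label) pairs, for example by loading it with as_supervised=True, which may differ slightly from how the dataset was loaded in the lesson:

```python
import os
import numpy as np
import tensorflow_datasets as tfds

# Assumption: (image, label) pairs via as_supervised=True.
dataset = tfds.load("malaria", split="train", as_supervised=True)

os.makedirs("dataset", exist_ok=True)

k = 0
for data in dataset:
    # Each element gets its own compressed file: image as arr_0, label as arr_1.
    with open(f"dataset/malaria_dataset_{k}.npz", "wb") as file:
        np.savez(file, data[0].numpy(), data[1].numpy())
    if k % 1000 == 0:
        print(k)  # progress indicator every thousand files
    k += 1
```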
Let's take this part off now and get back to the documentation, where you have my_data.add_dir: you pass the directory in right there. So instead we use original_data.add_dir and pass in the directory, which in this case is dataset; we don't need the new_file code anymore, so we can take that off. Everything looks fine, so let's run this cell, and this time the artifact is logged successfully. Now we check our dashboard, and here you see we have the raw_data artifact, new_dataset, with two different versions: version 0 and version 1. Click on version 1 and check out the overview: you see the description, the malaria dataset, and in fact the metadata we put in already. We also have the API snippet, which we'll come back to, because to use this artifact we make use of this API. Under metadata we have the metadata we logged from the notebook; under files you see the different files, the root directory and basically everything inside it. Then you have the graph view, which for now is very simple: you have a run, which is the run we just executed, and the artifact which has been created, containing the raw data. We can click on explode, and that's basically it: here you have the auto-generated name of the run, and the data is the new_dataset artifact, version 1. The next thing we want to do is move to the next step, that is, split this dataset into the three different parts. Now, what we could do instead is preprocess the data before doing the splitting, so we don't have to repeat the same preprocessing three times; let's modify the plan accordingly. Coming back to the diagram, we have the run and the artifact containing our raw dataset, and the next step will be to preprocess it: we do the preprocessing on our raw dataset and produce the preprocessed data, and from there we do the splitting. We'll have a run whose role is to split our data into training, validation, and testing, and then, for the training data, another run whose role is to do augmentation. In summary, when we're done with all this, we want a graph view which looks like this. We now go straight into the preprocessing: we copy the code, get back to the notebook, paste it, put it in the with statement, and run it, and here's what we get: we've now downloaded those 27,000 files. Checking here, we have the artifact new_dataset, version 1, with all the files we uploaded previously. This means the next time you want to work on this dataset, you don't need to rerun the loading code; all you need to do is use this artifact, which Weights & Biases stores for us.
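The two halves of that workflow, logging a whole directory and later pulling the artifact back down in another run, look roughly like this (project, entity, and version alias are illustrative):

```python
import wandb

# Log the whole "dataset" directory as one artifact version.
with wandb.init(project="Malaria-Detection", entity="my-entity") as run:
    original_data = wandb.Artifact("new_dataset", type="raw_data")
    original_data.add_dir("dataset")
    run.log_artifact(original_data)

# Any later run can pull that version back down instead of re-downloading
# and re-exporting the raw data from TensorFlow Datasets.
with wandb.init(project="Malaria-Detection", entity="my-entity") as run:
    artifact = run.use_artifact("new_dataset:v1", type="raw_data")
    artifact_dir = artifact.download()  # local folder containing the saved files
```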
And so, now that we have this, we get back here and do the resize and rescale; our preprocessing here is a resize and a rescale of all our images. We copy this and get back to where we loaded our artifact. If we print the path, we have the path to our different files under the artifacts folder, and we can check out all those files. Now we create this other run: we copy the code and paste it here. We have the preprocess_data method which we're going to define, with wandb.init, and the project and entity as well. We create this new run, and then we have the preprocessed_data artifact, whose name is preprocessed_dataset and whose type is preprocessed_data. For the description we say "a preprocessed version of the malaria dataset", and for the metadata we'll just leave it out for now, although you could always put information into the metadata. We now go through each and every file in this directory, so we have for f in the listing of this directory: we build a list with os.listdir of the artifact directory and go through it, and for each file in the list we open that file, using the artifact directory plus f to specify the current file, and read it as file. Once we've read it, we have the x, y outputs; recall that each file has an x and a y, which we obtain by loading the file. Then we build our lists: let's call them dataset_x and dataset_y. We create these two empty lists and then take each element and append it to dataset_x and dataset_y respectively. Now recall that we also have to preprocess: resize_rescale is our preprocessing method, which takes in an image, that is, an x, so instead of appending x directly we append resize_rescale(x). We also have to make sure the image size is defined, so let's define IM_SIZE here (224) and run that again. We also need to pass allow_pickle=True when loading; you can always check the documentation for that. And we don't take the loaded object directly: what np.load gives us is an npz archive, so to obtain x and y we take the NumPy arrays out of it and get their values; again, you can check the documentation to understand how np.load works. Once we have x and y, this is what we pass in here, so from here we can run this. This looks fine.
We have the full dataset lists, and that's okay, but before running we have to convert this into a TensorFlow dataset. We do that with tf.data.Dataset.from_tensor_slices, passing in dataset_x and dataset_y. Next we save this as a file, so we take the relevant portion of code from the documentation: with the preprocessed_data artifact we call new_file, specify the file name, let's call it preprocessed_dataset, open it as file, and save it with NumPy: np.savez in the compressed format, specifying the file and the data to be saved, which in this case is the TensorFlow dataset we just created. This looks okay; we can now call log_artifact with preprocessed_data. Before running, we're going to take out just a part of the whole dataset, only a thousand elements, because we do not have enough memory to hold all of the roughly 27,000 data points in a single variable; so for now we use 1,000 elements, just so you can see how this is done. If you have a real-world problem with, say, 100,000 different data points, you could break them up into smaller parts. So we take just a thousand, and one other thing we do is copy the part of the code which downloads the previous artifact and include it in this run, just before this; the fact that the download happens inside this run is what links the preprocessed dataset to the raw new_dataset artifact in the graph. With that set, we run this and then run the next cell, and we get the error saying "cannot convert a tensor of dtype variant to a NumPy array": we have a tensor of dtype variant and we're trying to store it as a NumPy array. To solve this, we simply skip the conversion to a tf.data dataset at this stage and save dataset_x and dataset_y as lists instead. Let's run this now and preprocess our data: we've successfully logged this to our artifact. Let's get to the wandb dashboard and refresh the page: under artifacts, in the malaria detection project, you can select from the list. This is what we had previously, the new_dataset under raw_data; and now, under preprocessed_data, we have the preprocessed_dataset and its most recent version.
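Putting that whole preprocessing run together, a sketch could look like the following. IM_SIZE, the artifact names, and the file layout (image as arr_0, label as arr_1 per file) are assumptions carried over from the earlier sketches, not necessarily the exact names used in the lesson:

```python
import os
import numpy as np
import tensorflow as tf
import wandb

IM_SIZE = 224  # assumed image size, matching earlier lessons

def resize_rescale(image):
    # Resize to a fixed size and scale pixel values to [0, 1].
    return tf.image.resize(image, (IM_SIZE, IM_SIZE)) / 255.0

def preprocess_data():
    with wandb.init(project="Malaria-Detection", entity="my-entity") as run:
        # Using the raw-data artifact inside this run links the two artifacts
        # together in the graph view.
        artifact = run.use_artifact("new_dataset:v1", type="raw_data")
        artifact_dir = artifact.download()

        preprocessed_data = wandb.Artifact(
            "preprocessed_dataset", type="preprocessed_data",
            description="A preprocessed version of the malaria dataset")

        dataset_x, dataset_y = [], []
        # Only a subset is processed here to stay within memory limits.
        for f in os.listdir(artifact_dir)[:1000]:
            npz_array = np.load(os.path.join(artifact_dir, f), allow_pickle=True)
            x, y = npz_array["arr_0"], npz_array["arr_1"]
            dataset_x.append(resize_rescale(x).numpy())
            dataset_y.append(y)

        # Saving plain lists/arrays avoids the "dtype variant" error raised
        # when trying to turn a tf.data.Dataset itself into a NumPy array.
        with preprocessed_data.new_file("preprocessed_dataset.npz", mode="wb") as file:
            np.savez(file, np.array([dataset_x, dataset_y], dtype=object))

        run.log_artifact(preprocessed_data)
```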
You can check out the files, the metadata, and the API snippet which you can use to carry out the various operations, as well as the overview and the graph view. Let's explode the graph view and zoom: you'll notice the first path, which has to do with the loading of our initial, original dataset, and once that original dataset was loaded, the next thing we did was preprocess it. All the earlier runs are still there too, which is why you see them, but you don't need to take them into consideration; from here we'll just continue from this point. We copy out the API snippet and get back to our code, where we start the data splitting: we've already created two artifacts, the original data and the preprocessed data, and the next will be the train data, validation data, and test data, which is why this section is called data splitting. We again copy the preprocessing code and paste it here, then strip it down and adapt it. The artifact we're using now is the preprocessed_dataset, and from it we're going to create three artifacts. The first is the train data: let's call it train_dataset, of type preprocessed_data, with the description "train dataset"; the artifact directory we'll get from the download, but for now let's just create the other artifacts too, so we copy this out to have the train data, the validation data, and the test data. To obtain the artifact directory we run the download; we get this output, click on the artifact, and see the preprocessed_dataset which has been loaded, with our preprocessed_dataset.npz file, so we copy this path and put it where the artifact directory was used; let's call it artifact_file, since it points at a file. So we have artifact_file, which we read as file, and then load with allow_pickle to get the array. At this point we define the train split, the validation split, and the test split, and from here we paste in the splitting code. We have our train array, because we're trying to create these three different arrays: the train array takes values from zero up to the train split. We define a data_length, which is the length of array[0]; recall that the array we loaded is made of two parts, the x and the y, where the x has a length of a thousand and the y a length of a thousand, which is why we take array[0] and get its length, giving a data_length of 1,000. Once we have data_length, we pick out the x and take values from zero up to eighty percent of the total length.
That is, the train split index is 0.8 times the data length, and we repeat the same for the y. Then, for the validation array, we start from the train split, again multiplied by the data length so that it's an index with respect to the total dataset, and go right up to the train split plus the validation split, so we go from 0.8 to 0.9 of the length; we repeat the same for the y right here, copying and pasting, and that looks fine. Let's repeat the process for the test array: we go from the train split plus the validation split right up to the end, that is, from 0.8 plus 0.1, which is 0.9 times the total data length, so from the nine-hundredth value to the end, and we copy this again for the y value. That looks fine: we now have our train array, validation array, and test array, and we're set to write this information into our artifacts. Here, instead of preprocessed_data, we use the artifacts we just created, train_data, valid_data, and test_data: with train_data we call new_file, name it train_dataset.npz, open in write mode, and save the file, and what we're saving is just the train array; then we repeat the same process for the validation and the test. Next we log our artifacts: we don't just log one artifact but all three, so we log train_data, valid_data, and test_data. With that set, we can run this; let's rename the function to split_data, run the cell defining it, call split_data, and wait for the response. Here's the output we get, and the reason we're getting this error is that the indices must be integers, not floats, so we convert the indices to integers with int. From here we run the cell again, call split_data, and the data has now been split successfully. Let's get back to our wandb dashboard, under artifacts, and refresh the page: as you can see, we have our raw data, the preprocessed data, and now the test data, validation data, and training data. Let's open the training data: you can look at the files, the metadata, and the API snippet, which we can always make use of when creating other artifacts, and let's get to the graph view. In this graph view we can now see the link between the original dataset, the preprocessed data, and the training dataset; let's click explode so you can see it clearly. You see the original dataset artifact, the run which produces the preprocessed data artifact, and then the run which produces the training dataset, validation dataset, and test dataset artifacts.
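A compact sketch of that splitting run, under the same assumptions as the previous sketches (artifact names, file names, and the arr_0 layout are illustrative), could look like this:

```python
import os
import numpy as np
import wandb

TRAIN_SPLIT, VAL_SPLIT = 0.8, 0.1  # the remaining 0.1 is the test split

def split_data():
    with wandb.init(project="Malaria-Detection", entity="my-entity") as run:
        artifact = run.use_artifact("preprocessed_dataset:latest",
                                    type="preprocessed_data")
        artifact_dir = artifact.download()
        array = np.load(os.path.join(artifact_dir, "preprocessed_dataset.npz"),
                        allow_pickle=True)["arr_0"]

        data_length = len(array[0])  # number of examples (the x's are in array[0])
        i1 = int(TRAIN_SPLIT * data_length)                # indices must be integers
        i2 = int((TRAIN_SPLIT + VAL_SPLIT) * data_length)

        train_array = [array[0][:i1], array[1][:i1]]
        val_array = [array[0][i1:i2], array[1][i1:i2]]
        test_array = [array[0][i2:], array[1][i2:]]

        # One artifact per split, all logged from the same run.
        for name, data in [("train_dataset", train_array),
                           ("val_dataset", val_array),
                           ("test_dataset", test_array)]:
            art = wandb.Artifact(name, type="preprocessed_data")
            with art.new_file(f"{name}.npz", mode="wb") as file:
                np.savez(file, np.array(data, dtype=object))
            run.log_artifact(art)
```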
So let's update the diagram: we take the old path off, and we have the new path going through these nodes. Let's copy out this code and start with the data augmentation. From here we download the training dataset; you can check this in your artifacts, where you should have the training dataset, and we copy its path and paste it here. Now we can finish the previous run to stop it, and that should be fine. Next we copy out the part of the code which was used for preprocessing, and paste it here. We have our augment_data method, with the project and entity, and the artifact we're going to use this time is the training dataset, so we get its path and replace the old one with the path to the training dataset file; since this is actually a file, we keep calling it artifact_file. Instead of preprocessed_data, we now use augmented_data: a wandb Artifact named augmented_dataset, of type preprocessed_data, with the description "an augmented version of the malaria train dataset". Everything looks fine for now; then we get back, copy the loading code, and paste it here, since we're getting this from our training data, obviously. The next thing to do is the actual augmentation, and then to log the dataset to our artifact. We open up the artifact file, obtain our array, and then, for each image in array[0], that is the x part, we append the augmented image: before the loop we create the empty list dataset_x, and inside we do dataset_x.append(augment(image)), indented one step, so that the augmentation runs for all images. After this we have dataset_y, which is simply the unchanged labels we already had. From here we have our dataset_x and dataset_y, so we run the cell and then call augment_data. We get the error "preprocessed_data is not defined", so let's check back: this should be augmented_data instead; we change it, run the cells, and see what we get. The artifacts have now been logged successfully. We get to wandb and check this out: we click on the augmented_dataset artifact, load it, open the graph, and explode it, and this is what we have now. You see again that we have the full path: we have our data, we split it into train, validation, and test, and after the training data we have this run, which converts the training data into an augmented dataset, which is this one right here. So at this point we have different versions of our data, which we can make use of depending on our needs. Thank you for getting to this point, and see you next time.
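For completeness, here is a sketch of what such an augmentation run might look like. The specific tf.image operations inside augment are an assumption standing in for the augmentation pipeline built earlier in the course, and the artifact and file names follow the earlier sketches:

```python
import os
import numpy as np
import tensorflow as tf
import wandb

def augment(image):
    # Assumed augmentation; any of the tf.image ops used earlier in the course fit here.
    image = tf.image.random_flip_left_right(image)
    return tf.image.random_brightness(image, max_delta=0.1).numpy()

def augment_data():
    with wandb.init(project="Malaria-Detection", entity="my-entity") as run:
        artifact = run.use_artifact("train_dataset:latest", type="preprocessed_data")
        artifact_dir = artifact.download()
        array = np.load(os.path.join(artifact_dir, "train_dataset.npz"),
                        allow_pickle=True)["arr_0"]

        dataset_x = [augment(image) for image in array[0]]  # augmented images
        dataset_y = list(array[1])                          # labels stay unchanged

        augmented_data = wandb.Artifact(
            "augmented_dataset", type="preprocessed_data",
            description="An augmented version of the malaria train dataset")
        with augmented_data.new_file("aug_dataset.npz", mode="wb") as file:
            np.savez(file, np.array([dataset_x, dataset_y], dtype=object))
        run.log_artifact(augmented_data)
```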
Hello everyone, and welcome to this new session, in which we'll treat model versioning with wandb. In the previous session we looked at dataset versioning, in which we created an augmented version of our data, which happens to be this version right here, derived from previous versions like a preprocessed version, another preprocessed version, and the original data. From here we're now going to see how to implement model versioning using this augmented data and an untrained model version: we have this version of the model, which is untrained, and the augmented data, which are going to be passed into this run to produce our trained model version. In the previous session we saw how to implement dataset versioning with wandb artifacts: we went from the original raw data, to the preprocessed data, to the train, validation, and test sets, and finally to the augmented data, and now we're going to use this augmented data to train our model. While training the model we're going to implement model versioning, and the two model versions we'll have are the model before training and the model after training. We're making use of the sequential model which we created previously, and the good news is that the way we implement model versioning is quite similar to the way we implement dataset versioning. We start by copying out the earlier code and simply modifying it: here, instead of augment_data, we're going to log our model, so we have log_model, which logs what is in fact the LeNet model. In log_model we have wandb.init with our project and entity specified; the artifact to be used here is None, since we're not going to make use of any artifact to log this initial model artifact. So we take off the augmented_data artifact and replace it with our model artifact, called untrained_model, of type model, with the description "the initial version of our LeNet model". We're not going to make use of the artifact_file or any previous artifacts, so we take that part off. Just before writing our model into the artifact, we save the model: this is our LeNet model, so we call lenet_model.save and give it a file name, say lenet.h5; let's define that file name variable. Since the file is already saved, instead of creating a new file inside the artifact we just add the existing file: we call untrained_model.add_file with the file name, and we also call wandb.save on that file name, and that's it. Then we log our artifact, which is untrained_model. We could also add some metadata, so let's set metadata equal to our configuration, which we have defined already, and modify the artifact name so it reads untrained_model, and that's fine.
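A minimal sketch of that log_model pattern is shown below. It assumes that a configuration dictionary and a built (but untrained) lenet_model already exist in the notebook, and the project/entity names are illustrative:

```python
import wandb

def log_model():
    # `configuration` and `lenet_model` are assumed to be defined earlier in the notebook.
    with wandb.init(project="Malaria-Detection", entity="my-entity",
                    config=configuration) as run:
        untrained_model = wandb.Artifact(
            "untrained_model", type="model",
            description="The initial version of our LeNet model",
            metadata=dict(configuration))

        # Save the freshly built (untrained) Keras model to disk first ...
        file_name = "lenet.h5"
        lenet_model.save(file_name)

        # ... then attach the saved file to the artifact and log it.
        untrained_model.add_file(file_name)
        wandb.save(file_name)
        run.log_artifact(untrained_model)
```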
With this we can now run the cell, and after this successful run we go back to wandb. Under our artifacts we now have the model type: we click on it and select the untrained model, and we have the overview, with the API snippet, the metadata we logged, the files, where you see the LeNet model file, and, as usual, the graph view, which we can explode. Notice that it's quite simple, because for now it isn't linked to our dataset: all we have is just this run, which produces this untrained, or initialized, model. From this point we'll go ahead and train our model and log the trained version. The artifact we're going to use in training the model is our augmented dataset; recall that we logged this augmented data previously, and from its overview we see it's of type preprocessed_data. We download that data, and then we create a new artifact, which is going to be our trained sequential model: we call it trained_sequential_model, of type model, with the description "a trained version of our model", simple as that. We can also pass in the metadata, just as we did with log_model, so metadata equals our configuration, and that's fine. Getting back to the augment_data method we saw previously, we follow the same pattern: we download the data, and the artifact file we used there was the train dataset, because we needed to convert it into an augmented dataset, and then we had the processing to produce dataset_x and dataset_y. We follow this exact procedure to train our model on the augmented data, so we paste that out here, but our artifact file will now be the augmented dataset file, aug_dataset, which we get right from the artifact's files: we see the aug_dataset.npz which we stored previously, and once we download it, the file will be found in the artifacts folder, under augmented_dataset version 0. This means that if we wanted to use version 1, we would simply specify version 1, and so on. So we have our artifact file, and we go through the processing to obtain dataset_x and dataset_y, and from here we go through a series of steps to convert this data into a TensorFlow dataset: we convert dataset_x to a tensor, dx, and dataset_y to a tensor, dy, and then we apply the from_tensor_slices method, which takes in dx and dy to create the new dataset. Then we go through the same processes we've seen already: we shuffle, we batch, we prefetch, and we have our training dataset. Note that you could repeat the same process for the validation data, but here we just do this for the training; you could take it as an exercise to do the same for the validation.
Note that for validation we wouldn't make use of the augmented dataset; we would instead make use of the validation dataset itself. Let's get to the graph view so you can see that more clearly: we have the preprocessed data, then the training dataset, the validation dataset, and the test dataset, all of type preprocessed_data, and from the training data we have the augmented data, which is this one. So when you want to carry out validation, you're obviously not going to use this training data; you would use the validation dataset instead, and if you were to add validation here, that is, convert the validation data into a TensorFlow dataset and use it during training for evaluation, you would have to modify this part accordingly. That said, we continue with the process: the train data we just created is now a TensorFlow dataset, so training can go on smoothly. Here we have the metrics, which we're used to seeing already; we compile, and then we fit the model. We're not going to include the validation here, so we just do the training with lenet_model.fit, and after training the model we do what we've already seen in the log_model method: save the file, save it to wandb, and log the artifact. So we copy that out, paste it here, and change the name to, say, lenet_trained; that's our file name. The artifact here is actually the trained_sequential_model, not the untrained_model, so we have trained_sequential_model.add_file with the file name, wandb.save with the file name, and run.log_artifact with the trained_sequential_model. With this set, we can run this method and see what we get. We obtain an error telling us that the name lenet_model is not defined. As you saw previously, in the log_model method we made use of the lenet_model which we had defined in the notebook using the sequential API, and that led to the initial run which produced our untrained model; let's call it UM. Since in our case we're trying to build our trained model, and we're doing model versioning, it's more reasonable to make use of the untrained model which we've already stored in wandb: instead of using the lenet_model defined in the notebook, we use the version of the untrained model that was stored in wandb. So from here we run another process, which will lead us to the trained model; but to run this process we also need data, and as you can see, that data is our augmented data, derived from the train data.
Other nodes come before this: recall that we have the untrained model and the augmented data, and this augmented data together with the untrained model will produce our trained model; once we have the trained model, we can use it for prediction. Coming back again: if we had used the lenet_model defined in the notebook, we would have our data passing into the run and producing the trained model on one side, and on the other side just the run which produced the untrained model, so the untrained model and the trained model would exist as two separate, unconnected entities in our global graph, whereas what we want is to link the two up. That's why we don't use the lenet_model defined in the notebook, and why we use the model which has already been stored in wandb. Getting back to the code, we copy this out, and in the place of the lenet_model defined in the notebook we make use of our stored artifact, which is in fact called untrained_model: let's copy that and paste it here. Just after downloading the model, the next step is to build the artifact file path, which points to the untrained model file we've downloaded: it sits in the artifacts directory, under untrained_model, and to be more specific we get the file name from the overview, under files, where we see lenet.h5; that's the file we're going to download. Now that we have the artifact file path, we can make use of TensorFlow's load_model method: lenet_model equals tf.keras.models.load_model, which takes in the artifact file. So now we have a lenet_model which is not obtained from the definition in the notebook but from a previous version of our model stored in wandb. Let's run this and see what we get. After training for 3 epochs, here are the results, and back in wandb you see that under the model type we now have both the trained_sequential_model and the untrained_model. Let's click on the trained_sequential_model and load the artifact: we can look at the API snippet, the metadata we logged, and the files, where we have lenet_trained, and we also have our graph view. Exploding this view, here's what we get; let's focus on this part: we have the run which produces the untrained model, and then this untrained model, together with our augmented dataset, passed through the training run, produces our trained sequential model. And that's it for this section, in which we saw how to implement model versioning with wandb.
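Bringing the whole training run together, a sketch under the same assumptions as before (artifact names, file names, a binary-classification LeNet, and an existing configuration dictionary) could look like this:

```python
import os
import numpy as np
import tensorflow as tf
import wandb

def train_model():
    with wandb.init(project="Malaria-Detection", entity="my-entity",
                    config=configuration) as run:
        # 1. Pull down the augmented training data.
        data_art = run.use_artifact("augmented_dataset:latest",
                                    type="preprocessed_data")
        data_dir = data_art.download()
        array = np.load(os.path.join(data_dir, "aug_dataset.npz"),
                        allow_pickle=True)["arr_0"]
        dx = tf.convert_to_tensor(list(array[0]))
        dy = tf.convert_to_tensor(list(array[1]))
        train_dataset = (tf.data.Dataset.from_tensor_slices((dx, dy))
                         .shuffle(1024).batch(32)
                         .prefetch(tf.data.AUTOTUNE))

        # 2. Pull down the stored untrained model instead of rebuilding it, so
        #    the trained model stays linked to it in the artifact graph.
        model_art = run.use_artifact("untrained_model:latest", type="model")
        model_dir = model_art.download()
        lenet_model = tf.keras.models.load_model(os.path.join(model_dir, "lenet.h5"))

        # 3. Train, save, and log the trained version.
        lenet_model.compile(optimizer="adam", loss="binary_crossentropy",
                            metrics=["accuracy"])
        lenet_model.fit(train_dataset, epochs=3)

        trained_model = wandb.Artifact("trained_sequential_model", type="model",
                                       description="A trained version of our model",
                                       metadata=dict(configuration))
        file_name = "lenet_trained.h5"
        lenet_model.save(file_name)
        trained_model.add_file(file_name)
        wandb.save(file_name)
        run.log_artifact(trained_model)
```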
Hello everyone, and welcome to this new and exciting session. In this session we shall be building a system which permits us to automatically detect whether an input, which in this case is an image, is that of an angry, happy, or sad person. So this person may be sad, as you can see here; this person may be angry, as you can see here; or the person may be happy. In fact, we want to take this kind of input, pass it into the system we want to build, and automatically infer that this is a happy person. To train the system we'll be making use of this dataset, which has been made available to us by Muhammad Hanan Asgar, and since this dataset is available on Kaggle, anyone can have access to it. Let's take a closer look at the dataset: as you can see, we have these test and train directories; open up the train, open up the test, and you see three folders, angry, happy, and sad, and for the train you also have angry, happy, and sad. Generally, when creating this kind of dataset for classification problems, what you want to do is make sure you put the different images in separate directories: all images representing angry people in this directory, happy in this directory, and sad in this directory. This way it becomes easy for us to build our model taking this dataset as input. Also, in the previous sessions we explained why it's very important to split our dataset in this manner, such that while you use part of the dataset for training, to actually train or build your model, you use the other part to evaluate that model. Although in this case we've simply been given this data, which is unlike real-world scenarios where you'll be tasked with building your own dataset, one very important thing to note is that if, for example, you're building this kind of dataset because you want to monitor how the users of an app react to a certain feature, then it's important to gather data that reflects, or is very similar to, what the model will be seeing during inference. This means that if your model is going to see a given kind of data during inference, when it's used to predict whether an image shows a happy person or not, then training on a very different kind of data isn't a good idea, even if those are also images of happy people. So you have to ensure that the data you train on is representative of the data, or the kinds of inputs, the model will see when it's deployed and left to make predictions on these kinds of images. Then, if you've noticed, not all the classes here have the same number of files: here you have happy 1000, angry 500, sad 757, and for the train 1525, 3000, and 2255. This shows that when solving real-world problems, it may happen that it's easier to gather data for a particular class than for the others; maybe the author of this dataset found it easier to gather happy images than angry images. Another very important point to note is that the kind of problem we're trying to solve here is multi-class classification.
Unlike previously, where we had a model which, given some input, would say whether it is a parasitized or an uninfected cell, here our model doesn't output just one class or the other: it outputs a given class out of several options. Those options could be 3, 4, 5, 6, 7, or even 1000 different possibilities, and all these different possibilities are termed classes. In this case we have 3 classes: the first class is the angry person, the next class the happy person, and the other class the sad person. Now, to be able to download this data, the very first thing we want to do is create a new API token. You go to your profile, click on Account, then click here and save the kaggle.json file. Getting back to the Colab notebook, we simply copy it in here. Before moving on, it's important to note that you will have to sign up for a Kaggle account. The next step is to install the Kaggle package, so pip install kaggle; run that cell, then we create this directory and copy the JSON file into it, so again we run the cell. From here we give the user full rights to read and write this Kaggle file; we run the cell and then get back to Kaggle, where we copy the API command. After copying this API command, we paste it just here: we have kaggle datasets download, and we download this human emotions dataset. Let's run the cell; our dataset has been downloaded, and we're now left to unzip the downloaded file. Clicking here, you see the emotions dataset zip file; we unzip it and store it in the dataset folder. Running this cell creates the dataset folder, and inside it you have the emotions dataset which we saw on Kaggle.
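The Colab cells for this download step would look roughly as follows; the dataset slug and zip file name are placeholders, since the exact command is the one you copy from the dataset's page on Kaggle.

```python
# Kaggle API setup and dataset download, as described above (run in Colab cells).
!pip install -q kaggle
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
!kaggle datasets download -d <owner>/<human-emotions-dataset-slug>   # paste the copied API command here
!unzip -q <downloaded-file>.zip -d dataset/
```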
From this point we'll go on to generate a TensorFlow dataset based off the images found in these directories, using the image_dataset_from_directory method from the Keras utils package. Here we have its description and its different arguments, starting with the directory. Let's copy this into the code and paste it out here; the directory is simply going to be one of these directories. The first thing we'll do is create our train dataset, which is a TensorFlow dataset; we've seen this already. We specify the directory, so let's call this the train directory and simply copy and paste this path. For the validation directory we do the same thing: copy and paste. We'll use this one as our train dataset and this one as our validation dataset, so here's our train and here's our validation; with this, let's run the cell. We get back here and change this to the train directory. Now, the labels here are inferred, which means that the labels will be generated from the directory structure, as in this example where you have a main directory and then the class subdirectories. That's why at the beginning we mentioned that it's important to maintain this kind of directory structure when dealing with classification problems: because each class has its own folder of images, as we can see in the train directory, we are able to automatically create this kind of dataset, and that's the role of the 'inferred' value right here. The next argument is the label mode, and there are several modes: the default int, categorical, binary, and None. Let's explain what these different modes mean. With the integer mode, the dataset is designed such that for each image we have an integer label. Say we have this image of a happy person, and there are three possibilities: the person is angry, happy, or sad. We give angry a value of zero, happy a value of one, and sad a value of two, so when creating the dataset we specify that we have the image and the integer which reflects the emotion in that image; in our case, for happy, the output would simply be one. Alternatively, instead of creating the dataset this way, we can use a one-hot representation of the labels. This means that instead of a single integer we create a vector of size three, with three positions: if the person is angry we put a one at position 0 and zeros elsewhere, if the person is happy a one at position 1, and if the person is sad a one at position 2. The way this encoding works is that the integer label is exactly the position of the one: label 0 (angry) puts the one at position 0, label 1 (happy) at position 1, and label 2 (sad) at position 2. With this encoding there will be some differences in the design of the loss function, which we'll see in subsequent sessions. Then we have binary: binary is like the previous project we worked on, where we had just two classes, in which case we set the label mode to binary; in our case we'll pick either int or categorical. From here we have the class names. If you decide to infer the labels directly from the class directories, then it's important to make sure that the class names you pass match the subdirectory names. So let's put out the class names here: angry, happy, and sad. If we don't write these exactly as they appear in the directories, we'll get an error; let's run this, we have the class names there, and back here we specify our class_names argument. Next is the color mode, RGB; the batch size, by default 32; and the image size, whose default is 256 by 256, which we can always modify.
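Going back to the two label encodings just discussed, a tiny illustration, assuming the class order ['angry', 'happy', 'sad'] with indices 0, 1, 2.

```python
# Integer label vs one-hot label for a 'happy' image.
import tensorflow as tf

int_label = 1                                   # what label_mode='int' gives for 'happy'
one_hot_label = tf.one_hot(int_label, depth=3)  # what label_mode='categorical' produces
print(one_hot_label.numpy())                    # [0. 1. 0.]
```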
Shuffling is true by default, so by default our dataset will be shuffled and we don't need to do this explicitly; we just leave it set to true and that's done. For reproducibility we can give a seed, so that we always get the same shuffling. Then we have the validation split: in a case where we have just one directory, say only a dataset directory without a separate test folder, we may want to split that directory into training and validation sets. Here we could pass 0.2, and the dataset would automatically be split into training and validation datasets. Once you have the split, since you're creating the train dataset, you specify that the subset is 'training'. In the documentation, validation_split is an optional float between 0 and 1, the fraction of data to reserve for validation, and then you specify the subset as either 'training' or 'validation'; in our case it would be 'training'. The interpolation is bilinear, follow_links is false, and crop_to_aspect_ratio is false; these are the default values, so we're not going to set them, and since the default validation split is None, we're not going to do any validation split either, so let's take that off. Let's run this and see what our training dataset looks like: we have 6799 files belonging to three classes. Now let's go ahead and modify the class names and run this again to see what we get: you see we already get an error, because the class names passed in don't match the names of the subdirectories of the target directory; it expected this but instead received this. So we get back, fix it, run this again, and that's fine. That's our training data, and we can do the same for our validation data, so let's call it val: we have the val directory, and we make sure we name it val_directory here. We get back and that's it: labels inferred, label mode int, the same class names, RGB, batch size 32, image size 256. Now let's change this and introduce a configuration dictionary: we take the batch size and the image size out and read them from the configuration instead, with the batch size taking a value of 32 and the image size a value of 256. We run this, get back here, copy the configuration in, and that's fine: we have 2278 files belonging to three classes for our validation dataset. From here let's look at the dataset itself: for one batch in the dataset, let's print it out. Run that and scroll down: we have the images, but for now we're interested in the labels, so scroll up to them. You see the labels? We have 2, 0, 0, 1: each label lies between 0 and 2, because we have three different classes.
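Putting the arguments above together, a sketch of the two loaders; the directory paths and variable names are assumptions based on the unzipped Kaggle archive, and the label mode shown here is 'int' before it is switched to 'categorical' a little further on.

```python
# Build the training and validation datasets from the class subdirectories.
import tensorflow as tf

CONFIGURATION = {"BATCH_SIZE": 32, "IM_SIZE": 256}
CLASS_NAMES = ["angry", "happy", "sad"]

train_directory = "dataset/Emotions Dataset/Emotions Dataset/train"  # assumed path
val_directory = "dataset/Emotions Dataset/Emotions Dataset/test"     # assumed path

train_dataset = tf.keras.utils.image_dataset_from_directory(
    train_directory,
    labels="inferred",
    label_mode="int",
    class_names=CLASS_NAMES,
    color_mode="rgb",
    batch_size=CONFIGURATION["BATCH_SIZE"],
    image_size=(CONFIGURATION["IM_SIZE"], CONFIGURATION["IM_SIZE"]),
    shuffle=True,
    seed=99,                        # any fixed seed gives reproducible shuffling
)

val_dataset = tf.keras.utils.image_dataset_from_directory(
    val_directory,
    labels="inferred",
    label_mode="int",
    class_names=CLASS_NAMES,
    color_mode="rgb",
    batch_size=CONFIGURATION["BATCH_SIZE"],
    image_size=(CONFIGURATION["IM_SIZE"], CONFIGURATION["IM_SIZE"]),
    shuffle=True,
)
```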
Now let's modify this: instead of having the label mode set to int, which is what we have already, let's change it to categorical, both here and here. Run that, run this, and let's see what we get. Here, instead of a 2, what we have is the one-hot representation, where everything is zero except at the second position, and here, instead of a 0, everything is zero except at the zeroth position. That's it. Now we'll go on to visualize this data. We create a figure and specify the figure size, 12 by 12, and then for images and labels in our train dataset (you could always change this and use, for example, the validation dataset) we take just one batch. We create a subplot with plt.subplot(4, 4, i + 1), then we do an imshow to plot the image: we select a given image and divide all its pixel values by 255. The next step is to plot the title, which comes from the labels: we select the particular label just as we selected the image, and because the labels are a one-hot representation, we take the argmax. If you're new to this, you can check out the previous sessions where we discuss these kinds of methods. We take the argmax along axis 0, and we should also turn the axis off for the plot. Let's run this; here we still need to convert to numpy, so run that again, and there we go: we have each image with its class above it. To convert the class index into words, we make use of the class names; running this again, we now get the different images with their labels. At this point our dataset is ready for training; we just have to include this prefetching for a more efficient usage of our data. We explained what prefetching is all about in some previous sessions. We're not going to include batching here, because we've already specified the batch size in the data loading, so our data is already batched. With that, we run the cell for the training set and simply redo it for the validation set, so this is our validation data, and now we're ready to build our model.
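A sketch of the visualization loop and the prefetching step just described, assuming the one-hot 'categorical' label mode and the CLASS_NAMES list from earlier; the final variable names are assumptions.

```python
# Show a 4x4 grid of training images with their class names, then add prefetching.
import matplotlib.pyplot as plt
import tensorflow as tf

plt.figure(figsize=(12, 12))
for images, labels in train_dataset.take(1):      # a single batch
    for i in range(16):
        plt.subplot(4, 4, i + 1)
        plt.imshow(images[i] / 255.0)             # rescale pixel values for display
        plt.title(CLASS_NAMES[tf.argmax(labels[i], axis=0).numpy()])
        plt.axis("off")
plt.show()

training_dataset = train_dataset.prefetch(tf.data.AUTOTUNE)
validation_dataset = val_dataset.prefetch(tf.data.AUTOTUNE)
```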
But before going on to build our model, we copy out this code from the previous session and paste it here: the resize and rescale layers, which we're going to include in the model. Recall that we could do this resizing and rescaling on the dataset itself: supposing you have your dataset, you could rescale and resize to the required size before passing the data into the model, so that the model is trained on already resized and rescaled data. Another method is to pass the data in directly and carry out the resizing and rescaling inside the model, as a layer of the model. We've seen this already, but to explain it again: doing this is great for deployment, because when you deploy this kind of system you no longer have to resize anything yourself; you just pass in the image, and the deployed model takes care of the resizing and rescaling on its own. With the other setup, when you deploy the model you have to do the resizing and rescaling yourself, which means that if you deploy it on some cloud solution and call it from, say, a JavaScript client or an Android client, you would have to carry out the resizing and rescaling in JavaScript, unlike here, where you just pass in the image and the model takes charge of it. That said, let's take this off, get back to our code, run this, and then start building our model. We get an error, 'resizing not defined': we simply reference it through the Keras layers, just as we did before, run the cell again, and we're now ready for modeling.
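A sketch of the resize-and-rescale block placed inside the model, as just discussed, so that raw images of any size can be fed directly at deployment time; the variable name is an assumption.

```python
# Preprocessing as model layers: resize to the target size, then scale pixels to [0, 1].
import tensorflow as tf

IM_SIZE = 256

resize_rescale_layers = tf.keras.Sequential([
    tf.keras.layers.Resizing(IM_SIZE, IM_SIZE),
    tf.keras.layers.Rescaling(1.0 / 255),
])
```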
Hi guys, welcome to this session on modeling, in which we are going to build our own model which permits us to pass in these kinds of inputs and tells us whether the input is that of a person who is sad, angry, or happy. In this session we're going to start with the LeNet model which we saw in the previous section, while modifying some parameters to suit the problem we're trying to solve, and then move on to even more complex and better computer vision models. We copy the code from the previous session and paste it out here: there it is, the LeNet model we saw in the previous example, and we also have this number of classes which we have to set, so we change the number of classes from 1 to 3. The reason we're doing this is simply that in the previous section we were building a LeNet model which takes in images and outputs whether they are those of a parasitized or an uninfected cell, whereas in this example we're taking in images and our model has to decide whether each is that of an angry, a sad, or a happy person. We have 3 different classes, and because of that we change this from 1 to 3. We run the cell and that's fine. We also have this other configuration: the learning rate, the number of epochs, the dropout rate, the regularization rate, the number of filters, the kernel size, the number of strides, the pool size, the number of units in this dense layer, and the number of units in this other dense layer. With that, we just run this cell; note that you can always get back to the previous sessions to understand each and every parameter here, as we've discussed them already, and we're simply replicating the LeNet model we've seen previously. Run that cell and you see the kind of output we get: we have about 6 million parameters, with our different layers going from the conv layers to the fully connected layers. The next point is this activation right here. In the previous case study this activation was the sigmoid, and that was because we were deciding whether an output belongs to one class or the other: it was a binary classification problem, in which we could use this kind of sigmoid activation function. With the sigmoid activation function, if you recall, we have an input and an output, so let's say x goes in and y comes out, and on the output axis we have 0, then 0.5, and then 1.
What goes on here is that as we take in higher values of x, the sigmoid function approaches 1, while for large negative values of x it approaches 0, so the range of output values is simply 0 to 1. This is logical, since in the binary classification setting we wanted to output a 0 for one class and a 1 for the other; intermediate values like 0.7 are assigned to one of the two classes depending on our threshold. The role of the sigmoid is to make sure that, whatever the input, the output always lies between 0 and 1, as we can see here: we always stay in this range from 0 to 1. Now, in a case where we have a multi-class problem like this one, with three different classes, what we don't want is outputs like, say, 0.1 here, 0.7 here, and 1 there. We don't want this kind of output, since we're dealing with a multi-class, single-label problem: here, our output cannot be two of those classes at the same time. In some kinds of problems we may have a situation where the person is maybe sad and angry at the same time, and in those situations you could allow this kind of output, where these two classes have high values. Nonetheless, in the kind of problem we're trying to solve, we want the model to choose or pick only one out of the three different classes, not two. And because we're going to pick only one, what we'll do is make sure that the outputs sum up to one. So instead of this kind of output, we'll have an output which sums up to one, such that the class with the highest value is considered the class which the model has selected. We could have, say, 0.1 here and 0.2 there, but then, since we already have 0.1 and 0.2, we must have 0.7 for the remaining class, because we want the sum of all these values to be one: 0.1 plus 0.2 is 0.3, and 0.3 plus 0.7 gives one. So we make sure our values lie between 0 and 1 and sum to one, and an activation function which we can use to achieve this is the softmax function. So here, instead of the sigmoid, we now have the softmax. To better understand how the softmax works, so that you clearly see the difference, let's take this example from the Analytics Vidhya website. In this example, instead of the 10 classes shown originally, we consider just these three classes at the output. After applying the softmax we get outputs, just as applying the sigmoid to some input gives an output: for example, if you pass 5 through the sigmoid you get a value like 0.99, very close to 1, whereas if you pass in negative 10 you get a value very close to 0, say 0.001. That's how the sigmoid works; the softmax function is quite different. Here, what goes on is that we make use of the following formula.
For a particular class c, the softmax output is e to the power of x_c, divided by the sum of e to the power of x_c' over all the classes c'. This formula shouldn't scare you, as we're going to explain in detail how it works. We have an input, just as with the sigmoid, coming from the output of the dense layer; here we have three inputs, because we have three different classes. Note the values: 2.33, negative 1.46, and 0.56. Once we have these, we simply apply the formula, and the way it's applied is as follows. You take the value 2.33 and compute its exponential, e to the power of 2.33, for that particular class, and divide it by the sum of the exponentials over all the different classes, that is, e to the power of 2.33 plus e to the power of negative 1.46 plus e to the power of 0.56. That's basically what the formula means: for the denominator, you sum e to the power of the value each class takes. Once you have that, you obtain this value for the first class. For the next class you have e to the power of negative 1.46 divided by the same sum, so the denominator is the same everywhere and only the numerator changes; for 0.56 we have this other numerator and this value. Notice how, as the input goes towards large negative values, the output approaches 0, while as the input goes towards large positive values, the output approaches 1: if we replaced this with a value like 10, we would get an output very close to 1. That's basically how the softmax works, and it means that at any given point, when you sum up all these outputs, you get a total of 1. You can check: 0.83 plus 0.01 is about 0.84, working in two decimal places, plus 0.14 gives roughly 0.98, and the remaining parts add up to give 1. So basically, what the softmax does is take the inputs for, say, 3 classes and share out a total of 1 among them: this class gets a fraction of that 1, in this case 0.83, and each of the other classes gets its own fraction, such that in the end all of them sum to 1.
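A few lines reproducing the softmax arithmetic above for the logits 2.33, -1.46, and 0.56.

```python
# Softmax by hand: exponentiate each logit and normalize by the sum.
import numpy as np

logits = np.array([2.33, -1.46, 0.56])
softmax = np.exp(logits) / np.sum(np.exp(logits))
print(softmax)        # roughly [0.84, 0.02, 0.14]
print(softmax.sum())  # 1.0
```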
From here we're going to get straight into the training and start by defining our loss function, which is going to be the categorical cross-entropy. So unlike before, where we had the binary cross-entropy, we now have the categorical cross-entropy. We have it right here in the documentation: tf.keras.losses.CategoricalCrossentropy, and one of its arguments is from_logits, which by default is false. As the documentation says, from_logits indicates whether y_pred, the output of the model, is expected to be a logits tensor or not; by default we assume that y_pred encodes a probability distribution, that is, that when we sum the model's outputs over our three classes we get a value of 1. So with the default from_logits=False we're saying that what goes into this loss function is already a probability distribution, and that's our case here, because we have the softmax activation. If we didn't have the softmax activation, then we would need to set this to true, so that what gets into the categorical cross-entropy is treated as a logits tensor. Now that that's settled, let's go ahead and test the example here: we copy it and paste it out, we have cce, the categorical cross-entropy, and we print the result. We run that and get a value of about 1.17. This value tells us how close the model's predictions are to the true values of y. Now let's modify the model's predictions so that they're very close to the true values: the first prediction, 0.05, 0.95, 0, is already close to its label, so we modify the second prediction to something like 0.05, 0.1, 0.85, keeping the values summing to 1. You see the predictions are now very close to the labels. With this, we run the cell again, and what do we notice? The value drops by almost a factor of 10, which shows that the two are very close to each other. Now let's change the predictions to exactly 0, 1, 0 and 0, 0, 1: we run that and get a value which is practically 0, because the predictions and the true values match exactly. If we change this again and make the values completely different from each other, so that the actual label is at one position while the model predicts a different position, the model fails on each example; running this again, you see how high the value becomes. This shows us how the categorical cross-entropy loss actually works.
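A sketch of the experiment just described, starting from the values in the Keras docs example.

```python
# Categorical cross-entropy: far-off predictions give a large loss, perfect ones give ~0.
import tensorflow as tf

cce = tf.keras.losses.CategoricalCrossentropy()

y_true = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y_pred = [[0.05, 0.95, 0.0], [0.1, 0.8, 0.1]]
print(cce(y_true, y_pred).numpy())           # ~1.18, dominated by the poor second prediction

y_pred_close = [[0.05, 0.95, 0.0], [0.05, 0.1, 0.85]]
print(cce(y_true, y_pred_close).numpy())     # ~0.11, roughly a factor of 10 smaller

y_pred_perfect = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(cce(y_true, y_pred_perfect).numpy())   # practically 0
```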
To gain an even deeper understanding of how this works, let's consider the following formula: the categorical cross-entropy loss is simply the negative of the sum of y_true times log(y_pred). Let's take our example and modify it so that the second prediction matches its label while the first one doesn't. For the first example, y_true is 0 at the first position, so using the formula we have 0 times log(0.01); then we add the next term, where y_true is 1, giving 1 times log(0.05); and then 0 times log(0.96). The terms multiplied by 0 obviously cancel, so after this computation we're left only with log(0.05), which (taking logs in base 10 for this hand calculation) is approximately negative 1.3, and with the leading negative sign the loss is about 1.3. Now take the example where there is a match: we have 0 times log(0.1), which doesn't count, another 0 term, and then 1 times log(0.7), which is approximately negative 0.15, so the loss is about 0.15. So we see that where there is a match the loss is small, and where there is no match the loss is larger. With that, we simply take the from_logits argument off, since the default is what we want. Before continuing, let's also consider the case where our labels, that is, our dataset, are designed such that instead of this categorical, one-hot kind of output we have integers. So instead of 0, 1, 0 we simply have 1, because the 1 is at position one (counting 0, 1, 2), and 0, 0, 1 is translated as 2, since the 1 is at position two. We've seen this already in the previous session. Getting back here, let's modify this: instead of the one-hot vectors, let's put 1 and 2. We're saying that your dataset is designed this way; recall from the dataset loading that if the label mode is int, this is exactly the kind of design you get. If you run this as it is, you see it doesn't work. What we do in the case where we have this kind of output is use the sparse categorical cross-entropy: instead of the categorical, we have the sparse categorical, and you see that when you run it you get the exact same answer you would have gotten with the categorical version. So we could make use of the sparse categorical cross-entropy; it depends on how we created our dataset. Here we have the sparse categorical, and we'll just comment this out since we're not making use of it; let's take this off now.
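Before moving on to the metrics, a small sketch of the integer-label variant just described, using the same prediction values as above.

```python
# The same labels expressed as integers rather than one-hot vectors;
# the sparse variant of the loss gives the same value as the categorical one.
import tensorflow as tf

y_true_int = [1, 2]                              # class indices instead of one-hot vectors
y_pred = [[0.05, 0.95, 0.0], [0.1, 0.8, 0.1]]
scce = tf.keras.losses.SparseCategoricalCrossentropy()
print(scce(y_true_int, y_pred).numpy())          # same ~1.18 as the categorical version
```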
Then we move on to our metrics. The metrics we'll be using here are the categorical accuracy, which we give the name 'accuracy', and the top-k categorical accuracy, for which we first give the value of k, which will be 2, and then the name 'top_k_accuracy'. Before we move on, let's explain this top-k categorical accuracy metric. With the plain categorical accuracy, if we have, for example, these four predictions, where the values in blue are what the model predicts and those in red are what it was expected to predict, accuracy is computed as follows. We start with the first one: the highest predicted value here is 0.5, and its position doesn't match the position of the expected class, so since there is no match we add 0. We move to the next one: again the positions of the highest values don't match, so we add 0, because class 0 was picked whereas the expected class was class 1. For the third one, class 0 is predicted by the model and class 0 is also the actual output, so we add 1, because we got this one correct. For the fourth one we have the same situation: the highest value is 0.7, and its position matches the expected one, so we add another 1. Because we have 4 different examples, we divide by 4 and multiply by 100, which gives an accuracy of 50 percent. Now let's compute the top-k categorical accuracy for the same case. For each example we keep the model's two highest-scoring classes, since we've selected k equal to 2. With the top-k categorical accuracy, we're no longer requiring that the single highest prediction matches the expected class; what we're interested in is whether any of the two highest predictions matches it. If either of the model's top two predictions matches the class the model should have predicted, we consider that a correct prediction. In this first case, the top two do not include the expected class, so this is a wrong prediction and we add 0. For the next one, the expected class matches the model's second-highest prediction, so we add 1: this is now considered correct, unlike before. For the third one, the expected class is among the top two (in fact the highest matches), so we add 1, and for the last one the highest prediction matches as well, so we add another 1. That gives 3 divided by 4 times 100, so the top-2 categorical accuracy in this case is 75 percent, unlike the categorical accuracy, which is 50 percent. With that, we go ahead and compile the model: we have the Adam optimizer with the learning rate we specified in the configuration, the loss function, and these metrics. We compile our model and then set it to train: we paste this out, and instead of a hard-coded value we use the configuration's number of epochs, then obviously our training dataset and our validation dataset. Let's run this and see what we get: scroll down and you see our losses dropping, the accuracy increasing, and the top-k accuracy, which is clearly higher than the plain accuracy.
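A sketch of the compile-and-fit step just described; the metric names mirror the ones discussed, while the learning rate, epoch count, and variable names are assumptions standing in for the course's configuration values.

```python
# Compile with categorical cross-entropy, accuracy, and top-2 accuracy, then train.
import tensorflow as tf

metrics = [
    tf.keras.metrics.CategoricalAccuracy(name="accuracy"),
    tf.keras.metrics.TopKCategoricalAccuracy(k=2, name="top_k_accuracy"),
]

lenet_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),  # value taken from the configuration
    loss=tf.keras.losses.CategoricalCrossentropy(),
    metrics=metrics,
)

history = lenet_model.fit(
    training_dataset,
    validation_data=validation_dataset,
    epochs=20,   # CONFIGURATION number of epochs; exact value assumed
    verbose=1,
)
```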
Training is now complete; let's plot out the loss curves for the validation and the training. As you can see, both the validation and training losses drop together, while the accuracies for validation and training also increase up to this point, almost reaching a value of one. You can check this here: the accuracy is 97.8 percent and here 98.19 percent, so the model is performing quite well. We can evaluate this model on our validation data by calling the LeNet model's evaluate method and passing in the validation dataset: we run this and we get a loss of 0.35, an accuracy of 98.33 percent, and a top-k, or better still top-2, accuracy of 99.88 percent. Note that given that the model isn't overfitting and its metrics keep increasing, what you could do here is increase the number of epochs and train for longer to get even better results. Now we're ready to test this model on some image from our test dataset. We create a test image, which we read using the OpenCV library with cv2.imread; let's open the test folder, pick, say, a happy image, copy its path, and paste it here. Then we convert this into a tensor: im equals tf.constant on the test image, and we specify the data type as float32. Printing the shape, we see the image is 90 by 90 by 3, and we can pass it into the model directly, because as we've designed the model, the resizing and the rescaling are done inside the model, so we don't need to do them outside. Before calling the model, though, we need to add one dimension, since we pass inputs to the model in batches, so we add the batch dimension with tf.expand_dims on the image along axis 0. Once we have this, we call the LeNet model on that image and print out what it gives us. We get an error telling us there's an incompatibility between the input image and what the model expects; getting back to the model, we notice we didn't actually put in the resizing, so we take the resize-rescale layer we've already built and put it in place of the plain rescaling, making sure we both resize and rescale. The next thing to modify is the input shape: previously we assumed the input would be 256 by 256, but now we set the spatial dimensions to None, so the input can be of any size, since the resizing is done by the resize-rescale layer inside the model. Let's run this: now we have 256 by 256 by 3, because the input has actually passed through our resize-rescale layers. So let's go ahead and retrain the model, plot the training and validation curves for the loss and accuracy, and with this we can go ahead and test our image.
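A sketch of the single-image test walked through in the next steps; the file path is an assumption, so point it at any image from the test split.

```python
# Read one test image, add a batch dimension, and get the softmax scores and class name.
import cv2
import tensorflow as tf

test_image = cv2.imread("dataset/Emotions Dataset/Emotions Dataset/test/happy/example.jpg")  # assumed path
im = tf.constant(test_image, dtype=tf.float32)   # note: cv2 loads BGR; the course feeds it as-is
im = tf.expand_dims(im, axis=0)                  # add the batch dimension

probs = lenet_model(im)                          # e.g. [[0.0, 0.99, 0.01]]
class_index = int(tf.argmax(probs, axis=-1).numpy()[0])
print(CLASS_NAMES[class_index])                  # e.g. 'happy'
```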
Let's run that, and this is what we get: as you can see, we have 0.99 and then values that are almost 0, which shows clearly that the image is of class 1, because this is class 0, this class 1, and this class 2, and it's correct, because this is a happy image. So basically you've seen how to create this image array from the file path, convert it into a tensor, and pass it into the model without any separate preprocessing. Another thing we want to do is print out the class itself. What we can do here is wrap the LeNet model's output in tf.argmax, to look for the class with the highest probability, which in this case is this one. With tf.argmax we specify the axis; if you're new to this, you can check our previous sessions where we treat these kinds of functions in detail. Let's run this: we use negative 1 for the last axis, and you see it picks out this class. From here we convert it to numpy, and then we use the class names to get the class's name. Running that, we first get a list, so we take the zeroth element, pass it into the class names, and we get the name: happy. With that, let's do one last test, this time with a sad image, and you'll see how easy it is to carry out such tests. Let's pick out an image here; you can actually view it, so let's take this one, copy its path, scroll up, and simply paste it here. This is the path, and we're trying to find out from the model what kind of image it is: you see it's a sad image, and it's from the test set, so make sure you do this kind of testing with data the model has never, ever seen. Our model is performing quite well. We can now do something similar to what we had before, with the difference that instead of just displaying the labels, we'll display not only the labels but also what the model predicts. Let's copy this code, get back to our testing, and paste it out here. Again, we're not going to use the train dataset but the validation dataset, so we have our validation dataset and this plot, and at the level of the title we plot the true label, then move to the next line and add the predicted label. We'll get the predicted label from the LeNet model, which takes in the image: we select a particular image and expand its dimensions with axis equal to zero, pass it into the LeNet model, and get the output. Now we do something similar to what we had before: we copy that code and reuse it here, since all we're trying to do is pass the image into the model and get its class, so we can compare it with the actual label.
Here we have the class names and the string concatenation for the title, and that should be fine. With this, let's run the cell and see what we get. Scrolling through, here's what we get: this is supposed to be the true label, but the model here doesn't seem to perform well, unlike what we had in the evaluation, so let's check our code and see if there are any errors. Scrolling this way, you see we have 'image': that's why the predicted label is always sad. The predicted label is always sad because we had picked just one fixed image, so the prediction isn't dynamic. So here, from those images, we select the particular image and run again. Now we get an error because we didn't add the batch dimension, unlike before, where we added the batch dimension before passing the image in, so let's get back and replace that with this code here; note that we've treated this already in previous sessions, so you can always check them out if you're new to methods like expand_dims. We run that and see what we get: happy, happy, angry, happy; true label happy, predicted happy. Let's see if there are any errors: the model does quite well, with no errors here, so it's almost 100 percent, in fact 100 percent on this small sample, although the evaluation above shows 98.33 percent. The next thing we'll do is plot out the confusion matrix. We're going to go through our validation data: for each batch of images and labels in our validation dataset, we append the label to this labels list, and we also append the LeNet model's output on the images to the predicted list, so we have what the model predicts and what it should predict. We create the predicted list and the labels list, run the cell, and before moving on we convert these to numpy format so we can easily manipulate them; we run this again and that's fine. Then we can print out the labels: there you see the different labels. Now let's try to flatten out all these labels. If you scroll up, you see they come in batches of 32: here is one batch, then the next batch, and so on. But this output format isn't exactly what we want: what we want is the classes, the class with the highest score, so instead of the one-hot representation we want the integer representation. What we'll do is use the argmax: let's print out the argmax of this, specifying the last axis, and run it. We get an error. Let's select everything up to the last batch: this error comes about because we have batches of 32, but the dataset size isn't a multiple of 32, so the last batch is smaller. Suppose you have a dataset of 98 items: split into batches of 32 you get 32, 32, 32, which is already 96 elements, and then the last batch has 2, since 96 plus 2 gives 98.
Because of that smaller last batch we get the error, so what we can do is print everything up to, but not including, the last batch. Let's run that: we go from the first batch right up to the batch before the last one, and that works fine. We could even print out just the first two batches to see what that looks like: you have this batch and this one. From here you can actually flatten, so that you get everything in a single one-dimensional list, which is what we want. Moving on, we go back to taking everything up to the last batch, run it, and there we go: we've flattened out all these elements. If you print out the length, you see we have 6784 elements. Now, basically, we want to compare these labels with the model's predictions, so we repeat the same process for the predicted values: run that, and you see we have these two lists, and already you can see they're quite similar, although this one misses in a few places. So we have one list for the predicted values and one for the labels. Now that we have this set up, let's define pred to be the flattened predictions, so these are the different predictions by the model, while labels is what the model was supposed to predict. We run that cell and then get back to the code for the confusion matrix which we had previously. In the previous sessions we defined a threshold because we had a binary classification problem; here, since we have several classes, we won't define one, and we'll simply pass in the predictions and the labels directly. Let's copy this out and paste it here: we've seen this already, so we take the threshold off, and here we have the labels and the predictions. We print out this confusion matrix with the figure, and that's it: let's run this and see what we get. There we go: we have the confusion matrix, and one thing you can notice straight away is that the leading diagonal holds the elements with the highest values. This is normal, because when you have a confusion matrix like this (let's redraw it), this is class zero, so let's say class zero is angry, then happy and sad, along both axes. Whenever what the model predicts matches what it was supposed to predict, one is added to the corresponding diagonal cell. We simply go through all the model's predictions for the validation data, and you see that 1472 times the model predicted angry when the image was actually angry, so those are correct. For happy, you see 2890 times the model predicts happy when the
actual label was happy, and here on the diagonal we have the number of times the model predicts sad when it is actually sad. Then we have 21 times where the model predicts angry when the image was happy, 26 times where it predicts angry when it was sad, 20 times where it predicts happy when it was angry, and 27 times where it predicts happy when it was sad. Here we have the cases where the model predicts sad when the image was angry, and a hundred times where the model predicts sad when it was instead happy; that's the highest score among the wrong predictions, which means the model has a tendency to predict sad when the image is actually happy. You can also observe this on the plot: the lighter the color, the higher the score, and the darker the color, the lower the score, so as we move towards the higher values the cells get lighter. Obviously, the ideal case would be a matrix where the diagonal cells are purely white and all the off-diagonal cells are completely dark, with values of zero. So, with that, we've seen how to obtain the confusion matrix, which is an important evaluation tool, and we could also change this to the training data if we wanted. Now we still have to deal with that last batch which we did not take into consideration. What we can do is some concatenation: we copy the earlier code, but this time we take only the last element, the last batch, flatten it out separately, and then concatenate it with the previous elements; we saw that joining everything together directly would cause an error, which is why we do it this way. We do the same for the predicted values, so let's change this to predicted. We run this and get an error, add what's missing here, run again, and that's fine; the same goes for the predicted values. So now the last batch has been added, and we simply copy this and paste it here, so that instead of having only the values before the last batch, we now have all the batches together, both for the labels and for the predicted values. Let's run this again, and there we go: we have slightly different answers now. And that's it: we've seen how to plot out the confusion matrix for multi-class classification.
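Putting the whole workflow together, a slightly tidier sketch than the step-by-step version above: np.concatenate handles the smaller final batch directly, so no special casing is needed. It assumes the earlier variable names and uses sklearn and seaborn for the plot itself.

```python
# Collect validation labels and predictions, then plot the confusion matrix.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
from sklearn.metrics import confusion_matrix

labels, predicted = [], []
for ims, labs in validation_dataset:
    labels.append(tf.argmax(labs, axis=-1).numpy())                 # one-hot -> class index
    predicted.append(tf.argmax(lenet_model(ims), axis=-1).numpy())  # model's predicted class

labels = np.concatenate(labels)
predicted = np.concatenate(predicted)

cm = confusion_matrix(labels, predicted)
plt.figure(figsize=(8, 8))
sns.heatmap(cm, annot=True, fmt="d",
            xticklabels=CLASS_NAMES, yticklabels=CLASS_NAMES)
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()
```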
Before we move on, though, we notice that we've made a very big mistake here: for the validation dataset we actually passed in the train dataset. Let's fix this, and make sure you never make this kind of error, because if you do, you'll think your model is performing well when in fact you're validating on the training data. We run this again, training is complete, and we see that the model wasn't performing as well as we thought: you see here the validation loss drops and then at some point starts increasing, while the training loss keeps dropping, and the training accuracy goes towards one as we had previously, whereas the validation accuracy plateaus at around 75 percent, as you can see in these values. The highest validation accuracy we get is around 75 percent, although the top-k accuracy is about 90 percent. So we see clearly that the model isn't performing that well, and in the next sessions we'll see how to improve its performance. We can also rerun the evaluation: it now shows about 75 percent, with a loss of one. Running the test, the model correctly classifies this image; next we try the grid of predictions, and whereas previously we got essentially 100 percent because we were testing on the same training data, here, starting from the top, you see wrong prediction, right, wrong, wrong, right, and so on: out of the 16 different predictions we have 10 out of 16 right, which is about 62 percent on this little sample. We can also plot the confusion matrix: running it, you can see clearly that the model isn't performing as well as it seemed to when we were making that error. Hello everyone and welcome to this new session, in which we are going to treat data augmentation. Previously we saw how to load our dataset from this dataset directory, and then we trained a LeNet model which performs very well on the training data but didn't perform as well on the validation data, and we were able to evaluate this model with different evaluation metrics like the accuracy, the top-k accuracy, and the confusion matrix. In this session we're going to focus on data augmentation: we're going to see the effect of augmenting our data artificially, without actually adding any new element to the dataset directory, and see how this affects our model's performance. We'll see how to augment data like the data we have here, and how this technique helps in making the model even more performant. We looked at data augmentation previously, but if there's one thing to note about data augmentation, it's simply the fact that it promotes diversity in your dataset. So if you have data like this one here, let's open it up and consider it the original data, the data we actually gathered; we then modify this image's brightness and obtain another data point, and we rotate it and obtain yet another data point. You see it's the exact same image which has been modified, so the model doesn't only get used to seeing the original image: it can now also see this version or that one. Data augmentation is thus a technique for promoting robustness in models, and hence fighting overfitting, since the model now sees different versions of certain data points. So let's close this up and get back to the code. Let's get back up here to data augmentation; we looked at this already in the previous section, so you can go back and understand exactly how it is carried out. Previously we saw that we could carry out data augmentation by using these kinds of TensorFlow image methods.
could rotate the image, flip it left-right, or adjust the saturation; you could also use Keras layers, and we saw how to use the Albumentations library, an amazing library which lets us carry out data augmentation very easily, and not only for classification, so you can check out that video. That said, let's copy this code and paste it here: we have our augment_layers with a random rotation and a random flip, and let's also add a random contrast layer. Feel free to go back to the documentation on the TensorFlow Keras preprocessing layers and check out the different augmentation strategies: random brightness, contrast, crop, flip, height, rotation, translation, width, zoom, together with how to use each of them, in case you have any doubts. Now, getting back to the code, we have these three layers. What is generally done is to run one augmentation, test whether it helps the model perform better, then add another and see whether it helps, and so on. It's not a fixed recipe where you place some fixed set of augmentation strategies in a given order and it always works magically; generally you have to test the different strategies and see which ones work well for the data you're working with. With that set, let's run this cell; we get a "not defined" error, so we first run the cell that defines the layers, then run this one again, and that's fine. Now, at the level of the dataset preparation, we include this by doing the mapping: we map our augment_layers over the training dataset only, not the validation dataset, and we specify num_parallel_calls, which we get via AUTOTUNE. Before we move on, we copy this out and paste it here: we create an augment_layer function which takes the image and the label and outputs the augmented image together with the label; we don't need the extra argument here, so we take that off and keep just the image. So that's it: we have the augmentation layers, we define this function as augment_layer, and with this we're now set to train, so let's go ahead and retrain our model.
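Before retraining, here is a minimal sketch of the augmentation pipeline just described; the names train_dataset and the exact factor values are assumptions, and only the training pipeline is mapped.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Keras preprocessing layers grouped into one callable augmentation block.
augment_layers = tf.keras.Sequential([
    layers.RandomRotation(factor=0.25),     # rotation as a fraction of a full turn
    layers.RandomFlip(mode="horizontal"),
    layers.RandomContrast(factor=0.1),
])

def augment_layer(image, label):
    # training=True keeps the random layers active inside the map
    return augment_layers(image, training=True), label

# Augment only the training data; the validation data stays untouched.
train_dataset = (
    train_dataset
    .map(augment_layer, num_parallel_calls=tf.data.AUTOTUNE)
    .prefetch(tf.data.AUTOTUNE)
)
```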
Training is now complete and we can see that the model doesn't perform as well as it did without augmentation: with this augmentation the model actually performs worse than when there was no augmentation. Running the evaluation and comparing with the previous results, we go from about 75% accuracy down to 54%, and from about 90% top-k accuracy down to 83%. To understand the reason for this drop in performance, let's look at the visualization of our dataset. After carrying out the rotation we have images rotated at very unusual angles: this image, for example, sits at an angle that looks nothing like the kind of data we would have in our test or validation set. So we have to ensure that when carrying out this random rotation we limit the angle of rotation. If we have an input image of a face, we should limit the rotation so the image cannot be rotated by, say, 180 degrees; we want rotations where the face is only slightly tilted, not these extreme rotations. To solve this, we go back to the documentation for RandomRotation and look at the factor argument: a factor of 0.2, for example, means rotations sampled between negative 20% of 2 pi and positive 20% of 2 pi, and since 2 pi corresponds to 360 degrees, a factor of 0.25 means 0.25 times 360, which is 90 degrees, as a quick calculation in a cell confirms. This means an image which was already somewhat tilted could end up in a very unusual position where the face is almost lying sideways. So we limit the rotation by going from a factor of 0.25 down to 0.025, that is, the range from negative 0.025 to 0.025, which corresponds to about 9 degrees, since 0.025 times 360 is 9. Limiting the factor this way means that after rotation the face can only be tilted up to about 9 degrees in either direction, so at the extreme we go from the original position to a slightly tilted face, which is a perfectly usual pose when taking a photo. This is unlike the 90-degree case, where a face which is already tilted could end up lying on its side after rotation, a very unusual position for a photo, so the images in our validation or test dataset wouldn't look like that.
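As a small check of this arithmetic, a sketch like the following (the replacement rotation layer and its exact bounds are assumptions) shows how the factor maps to degrees and how the limited layer would be defined.

```python
from tensorflow.keras import layers

# RandomRotation's factor is a fraction of a full turn, so degrees = factor * 360.
print(0.25 * 360)    # 90.0  -> far too aggressive for face images
print(0.025 * 360)   # 9.0   -> a mild, realistic tilt

# Limited rotation layer, assumed to replace the earlier RandomRotation(0.25).
limited_rotation = layers.RandomRotation(factor=(-0.025, 0.025))
```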
That's why we limit the rotation here. Let's get back to our code, rerun the training data preparation, and visualize the dataset again: as you can see, we no longer have images which are upside down as before. We now retrain the model, and after training for over 20 epochs the validation accuracy goes up to about 78 percent at its highest; when we run the evaluation we get 77.8 percent accuracy and about 91 percent top-k accuracy, an improvement compared to what we had before with the overly aggressive augmentation. Now we are going to use another data augmentation strategy, CutMix. CutMix isn't like the other augmentation strategies where we modify a single image: as we saw in the previous sections, it actually combines two images. So we simply copy that code and bring it into this codebase; if you're new to CutMix, you can check out the previous sessions where we treat it in detail. We add a cell for the CutMix augmentation and paste in the code, and we also paste the part where we create train_dataset_1 and train_dataset_2 and then combine them into one mixed dataset. We apply the augment_layer mapping to the two datasets separately, and we don't need to shuffle again since shuffling was already done, so we just do the mapping; once we have the two datasets we combine them into a single mixed dataset. From there we build our training dataset, and we comment out the previous dataset preparation; the validation dataset stays as it is. For CutMix you could always try mixing it with the other augmentation strategies and see how that improves your model's performance. We run this and get a "not defined" error for the configuration, so we run the configuration cell first, rerun the CutMix cells, and now everything is fine; we run the training and validation dataset cells as well. So let's go ahead and retrain the model. After training for 20 epochs we notice that, unlike previously, where the training accuracy went to about 99% while the validation accuracy was about 77%, here the training accuracy is about 80% and the highest validation accuracy is about 78%. This shows clearly that the model isn't overfitting, because the training and validation accuracies are evolving in a similar manner.
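Before looking at the curves, here is a hedged recap of the CutMix dataset plumbing described above; the cutmix function itself is assumed to be the one defined in the earlier CutMix session, and the dataset names are assumptions.

```python
import tensorflow as tf

# Two independently augmented copies of the training data, zipped together and
# passed through the cutmix() function from the previous session.
train_dataset_1 = train_dataset.map(augment_layer, num_parallel_calls=tf.data.AUTOTUNE)
train_dataset_2 = train_dataset.map(augment_layer, num_parallel_calls=tf.data.AUTOTUNE)

mixed_dataset = tf.data.Dataset.zip((train_dataset_1, train_dataset_2))
training_dataset = (
    mixed_dataset
    .map(cutmix, num_parallel_calls=tf.data.AUTOTUNE)   # assumed cutmix helper
    .prefetch(tf.data.AUTOTUNE)
)
```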
Let's scroll down and look at the accuracy plot. Before, the training curve went up toward one while the validation curve stayed well below it; now the two curves evolve in a similar manner, even though we sometimes get these peaks, so clearly our model isn't overfitting. Given that, we'll train for more epochs: we modify the number of epochs and, keeping the weights in place, simply keep training for 20 more epochs. Once training is complete, we see that the validation accuracy starts to stagnate around this value, and after evaluation we obtain an accuracy of 79.71%. What's interesting to note here is that the training accuracy is still around 86.25%, unlike previously, when the model was overfitting and the training accuracy was about 99%. Testing on these images, 14 out of 16 are predicted correctly, and here is the confusion matrix; so clearly this model performs better than all the previous ones. Hello everyone and welcome to this new and exciting session in which we are going to treat TensorFlow Records. TensorFlow Records help us build more efficient data pipelines: they let us store the data we use to train our models more efficiently, and they also help in parallelizing the reading of the data, hence speeding up the overall training process. So in this section, together with the TensorFlow datasets we've already seen, we are going to implement an efficient training pipeline with TensorFlow. In the previous sections we've been working with TensorFlow datasets; in this session we'll see how to convert a TensorFlow dataset into TensorFlow Records, and then take those records and convert them back into a TensorFlow dataset to be used for training. Now, the very first question you may ask yourself is: given that we've already carried out the training process successfully without any problems, why do we need TensorFlow Records? There are two major problems TensorFlow Records solve, or two advantages of working with them. The first is that we can store our data more efficiently. Notice that every time we create this dataset, we load it from a directory made up of many little files of a few kilobytes each; if we open one of these folders and check, this file is about 17.26 kilobytes. Having to deal with these kinds of files means we cannot always load the data very efficiently. It's true that working with TensorFlow datasets already brings some efficiency, but what if we also store the data itself more efficiently, that is, instead of storing many 17-kilobyte files like this, we store, say, 10-megabyte or even 100-megabyte files, so that every time we want to read, we read from a single large file rather than from many small ones. Another good thing about working
with TensorFlow Records stored in this form is that you can carry out pre-processing before storing the data. You could augment your data and store the augmented data as TensorFlow Records, so that instead of keeping the raw images and carrying out the augmentation or other pre-processing every single time before training, you store data which has been augmented already. Suppose you have your initial data; it passes through some pre-processing and becomes augmented data. Instead of going through that pre-processing each and every time, we go through it once, store the data in its augmented form, and the next time we want to train the model all we need to do is make use of this augmented data directly. Apart from this, it should be noted that sometimes a section of a model is fixed. Suppose our model is made of section 1 and section 2, and section 1 is fixed, meaning we are not going to train that part but only section 2. This means that when you pass data through section 1, the output you get is the same each and every time, since its weights are fixed. So instead of storing the raw data, we can store these outputs, which we could call embeddings, and then train the part of the model which is actually trainable directly on these embeddings. So this time around we have the data, the pre-processing, the augmentation, and then the embeddings computed from this data, and it is these embeddings that we store as TensorFlow Records. You see that this gives us a lot of flexibility in what we can store, and we can retrieve the stored data and make use of it as we wish.
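To make the embeddings idea concrete, here is a hedged sketch; the frozen backbone (MobileNetV2 here), the input sizing, and the dataset name are assumptions, and in practice you would match the backbone's expected preprocessing.

```python
import tensorflow as tf

# Section 1: a frozen backbone whose outputs never change for a given input.
backbone = tf.keras.applications.MobileNetV2(include_top=False, pooling="avg")
backbone.trainable = False

def to_embedding(image, label):
    # Run the fixed section once per image and keep only its output (the embedding).
    features = backbone(tf.expand_dims(image, 0), training=False)
    return tf.squeeze(features, 0), label

# These embeddings, rather than the raw images, could then be written to TFRecords.
embedding_dataset = train_dataset.map(to_embedding,
                                      num_parallel_calls=tf.data.AUTOTUNE)
```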
The second major advantage of working with TensorFlow Records, apart from efficient storage, is that they encourage parallelized reading of the data. This means that if we have to train our model on several hosts, say four different machines, we can create shards of our TensorFlow Record data: we take the complete dataset and break it up into several parts so that each host takes care of a subset, say two parts each in this drawing, with this host training on these parts, the next one on those, and so on. As a rule of thumb, it is generally advisable to make sure that each host gets about 10 packs of 10 to 100 megabytes each. So suppose we have a 10-gigabyte dataset: what we want is for each host to take care of at least 10 packs, or shards, of the TensorFlow Records we create. So it's no longer two per host; we have to create 10 shards per host, and given that we have four hosts, we will have 4 times 10 shards to create. We therefore break the 10-gigabyte dataset up into 40 different parts, 40 shards of our TensorFlow Records, and each of them will be approximately 250 megabytes: taking a gigabyte to be a thousand megabytes, 10 gigabytes is 10,000 megabytes, and 10,000 divided by 40 gives 250 megabytes per shard. This leads to reasonable gains, because we can now train the model on this parallelized data, and we can also prefetch large chunks of the dataset, 250 megabytes at a time to be precise, so that once the model is ready it simply consumes data which has already been prefetched; you can check out the previous sessions where we talked about prefetching with TensorFlow datasets. Now, what we actually store in these TensorFlow Records are protocol buffers, and the way TensorFlow manages this is with the tf.train.Example class, defined in the documentation as a standard proto storing data for training and inference. So if we have to convert our data into these proto messages, we have to make use of this tf.train.Example class and understand its representation. In our case we are dealing with an image and its corresponding label: say we have the image of a person smiling and the label 1; we have to convert this data into this format before creating our TensorFlow Records. The documentation tells us that an Example contains a key-value store, features, where each key, which is a string, maps to a Feature. Note the difference between Features with an s and Feature without: the Features message is a combination of individual Feature messages. Clicking through to the Feature documentation, we see that its content can be one of three types: a BytesList, which is generally what we'll use for raw content like images, a FloatList, or an Int64List, and you pick the type depending on the kind of data you have. The int feature, the float feature, and the bytes feature are each of type Feature, and all of them combined form a Features message,
which is of type Features; the difference is just the s, one being a single Feature and the other the plural. That said, if we want to create our TensorFlow Records, that is, convert our data into TensorFlow Records, we have to take this formatting of our data into consideration. Back in the code we add two imports: from tensorflow.train we import BytesList, FloatList, and Int64List, which are the types our features can take, and from tensorflow.train again we import Example, Features, and Feature. We run this cell and get back to our code, where we now handle the TensorFlow Records. We'll start by unbatching our data; note that we've taken off the prefetching, as we will not need it, and we just take the data as it is. We have our training dataset, which has been augmented, and you could carry out any pre-processing you want before storing this data; the validation dataset remains as it is. We run the cells, unbatch the training dataset, and do the same for the validation dataset; if you inspect the datasets you'll see that the batch dimension has been taken off. Then we get back to the documentation: recall that before creating the TensorFlow Records we need to put our data in the required format, more specifically we need to create proto messages, and we do this with tf.train.Example, which is built from the Features we define. The documentation gives an example combining ints, floats, and bytes, which we can copy and paste into our code. Since we are not going to use the float feature we take it off; what we're interested in is the bytes feature and the int feature, because we have the image and the label: the image of the happy person will be the bytes feature and the label, say 1, will be the int feature. So let's get this done: we put the bytes first, then the int, and we create an Example from them; we've imported everything already, so we have the Features, the BytesList, the Feature, and the Int64List. We take off what we don't need and rename the keys: we'll call the bytes feature "images" and the int feature "labels". Once we have this, the next thing to do is put in the correct values, so instead of the placeholder values we're going to create a method which we'll
call create_example. This create_example method takes in the image and the label, and what it returns is our serialized example: we build the Example we just created and call its SerializeToString method, passing in our image and our label instead of the placeholder values. We run this, and the next thing we do is define the number of shards, here 10, and the path: we create a new folder, tf_records, and the files will carry a shard-specific name with the .tfrecord extension. Then we go to the tf.io documentation and find TFRecordWriter, where we can see how to write to a TFRecord file: it takes the path as argument and writes serialized examples, so we copy the snippet for writing records to a file and paste it in. With the TFRecordWriter we specify the path, and through the file writer we write our information into that file. We want each file to get a different name, so we use path.format and pass in a given shard number, that is, a given part of our data. Recall that when we create our TensorFlow Records we could create them as one block and later shard it, breaking it up into different parts; here we have exactly 10 shards, so we break the data up into 10 parts, and we want each shard to have a different file name, which is why the formatting passes the shard number into the path. So for each shard number in range of the number of shards, with the number of shards equal to 10, we loop and create one file per shard. Now, once this is in place, recall that we had created the create_example method: if you look at the documentation snippet, it does exactly the same thing, creating the different features, combining them into an Example, and calling SerializeToString at the end. So all we need to do here is call create_example, passing in the image and the label, take off the redundant code, and loop over each image and label in our TensorFlow dataset.
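Putting these pieces together, a minimal sketch of the writing side might look like this; it assumes the image already arrives as JPEG-encoded bytes and the label as an integer, and names like NUM_SHARDS, the path pattern, and encoded_dataset are assumptions.

```python
import tensorflow as tf
from tensorflow.train import BytesList, Int64List, Example, Features, Feature

NUM_SHARDS = 10
PATH = "tf_records/shard_{:02d}.tfrecord"

def create_example(image, label):
    # Bytes feature for the encoded image, int64 feature for the label.
    features = Features(feature={
        "images": Feature(bytes_list=BytesList(value=[image])),
        "labels": Feature(int64_list=Int64List(value=[label])),
    })
    return Example(features=features).SerializeToString()

# One file per shard; Dataset.shard(n, i) keeps every n-th element starting at i.
for shard_number in range(NUM_SHARDS):
    sharded = encoded_dataset.shard(NUM_SHARDS, shard_number)
    with tf.io.TFRecordWriter(PATH.format(shard_number)) as file_writer:
        for image, label in sharded.as_numpy_iterator():
            file_writer.write(create_example(image, label))
```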
The dataset we are working with now is our training dataset, so for each element of it we write the image and label into the file being created, that is, into our TensorFlow Record file. Given that we have to write one shard at a time and not the full dataset, we work with a sharded dataset: sharded_dataset is the training dataset on which we call shard. You can look up the shard method in the tf.data.Dataset documentation: it takes the number of shards and a specific index, so here we have 10 shards, and for each index we pass in the value dynamically; this creates the different packs, or parts, of our dataset. So we specify the number of shards and the shard number, and that's all we need to create one part of the dataset. Once we have this, we run it and get an error: we passed in a tensor, but a value of type bytes was expected. The next question is how to convert these tensors to bytes, so a quick search for converting an image to bytes in TensorFlow turns up decode_image, but what we'll actually use to convert the image into bytes is encode_jpeg, since we're dealing with JPEG images. Using it is quite simple: we just pass in the image and keep all the default values. So we create an encode_image method which takes in the image and the label, replaces the image with its encoded version via tf.io.encode_jpeg, and returns the image and the label. We run the cell and create a new encoded dataset: encoded_dataset is the training dataset mapped through encode_image, so that whatever we pass to the file writer is in the form of bytes, matching the bytes feature defined in our create_example method. We run this and are told that the image has type float32, which doesn't match the expected type uint8. So we first need to convert our image to unsigned int: we look up tf.image.convert_image_dtype, which converts our float32 to unsigned int; as the documentation shows, you pass the tensor and specify the dtype, and we'll take the default value for the saturate argument.
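Putting the conversion and encoding together, a sketch of the resulting encode_image might look like this; the dataset name is an assumption and the images are assumed to be float32 in [0, 1].

```python
import tensorflow as tf

def encode_image(image, label):
    image = tf.image.convert_image_dtype(image, tf.uint8)  # float32 [0,1] -> uint8
    image = tf.io.encode_jpeg(image)                        # uint8 tensor -> JPEG bytes
    return image, label

encoded_dataset = train_dataset.map(encode_image,
                                    num_parallel_calls=tf.data.AUTOTUNE)
```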
Back in the code, before the encoding we convert the image: image equals tf.image.convert_image_dtype of the image with dtype tf.uint8. We run this and run the mapping again, and the encoded dataset now works fine. Then, in the writing loop, instead of the raw training dataset we use this encoded_dataset, run it, and get another error: value must be iterable, so we convert the dataset into an iterable with as_numpy_iterator. Running again, we get "255 has type int but expected one of: bytes"; this error comes from our create_example method, and the fix is that the feature value must be wrapped in a list, so we add the list and run again. That's it, the creation of the different files is now complete: if you open the folder you'll see the 10 different shards. At this point we could save these files to Drive so that we can use them next time for training. Let's now see how to make use of them in the training process. What we want to do here is convert this back to a TensorFlow dataset: we had our TensorFlow dataset, we converted it into these different TensorFlow Record files, and now we want to reconstruct the dataset from them. To get this reconstructed dataset we pass the different file names into TFRecordDataset; what we pass is a list of all the file names, so the list, call it L, is our path formatted with the variable p, for p in range of the number of shards. Printing L shows the list made of all the different files. We copy this list comprehension, use it in place of the hard-coded list, run this, and get our reconstructed dataset. But then we need to parse this TFRecord dataset so that we recover the original data, which was an image in the form of an array and a label which was an integer. For that we have the parse_single_example method, which takes an example, which is basically what is contained in our reconstructed dataset, and permits us to split it into the image and the label. So right here we take the example and get the image, then use the decode_jpeg method: previously we encoded into JPEG bytes, and now we convert from bytes back to unsigned integers; we pass in the example's image and specify the number of channels to be three. What we're left to do is specify the feature description, which mirrors what we had in create_example: a dictionary made of the images and the labels together with their respective data types.
We pass this feature description into the parse_single_example method, and all that's left is to return our output: we take the example's images and the example's labels. So what this function does is take an input example and break it into the images and the labels, while converting the images from bytes back to unsigned int. We run the cell, then build our parsed dataset: parsed_dataset is the reconstructed dataset mapped through this parse_tfrecords method. We run that, then take a single value from the parsed dataset and print it out; after fixing a small error in the loop we see that we indeed have our input and its output label. The next thing is to specify the batch size, so we add the batch configuration, and then we do some prefetching with AUTOTUNE. Running this again, we now have batches of 32 elements; looking at the parsed dataset, we see the four dimensions of the image batch and the output labels, so we now have our parsed dataset and are ready to train. As we've said already, it's important to save these record files somewhere, for example in Drive, so that the next time you want to train, all you need to do is start from here: reconstruct the dataset, parse it, and you're good to go with this parsed dataset, without having to load all the individual images again. Before we move on, note that while encoding the image and the label, we could also take the argmax of the label. If we have an input image with a one-hot output label, say 0 1 0, then instead of keeping the label in that form we keep the position with the highest value: the positions are numbered 0, 1, 2, so 0 1 0 becomes 1, and 0 0 1 becomes 2, since the last position holds the highest value. This is another way of encoding our labels, and that's what we're going to do, so we rerun the cells, recreate our TensorFlow Records, and rerun the parsing; you can see that the parsed data is now different from before: we have the images and then the integer labels. Then we can go ahead and set up the model, with the loss function now defined as the sparse categorical cross-entropy. This is simply because instead of the one-hot notation we represent each output as a single integer: 1 0 0 becomes 0, 0 1 0 becomes 1, and so on, and we've seen already that with these kinds of outputs we use the sparse categorical cross-entropy, together with the sparse categorical accuracy as the metric.
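For reference, here is a hedged sketch of the reading side end to end; the file pattern, batch size, and the assumption that all stored images share the same size are mine, not fixed by the course.

```python
import tensorflow as tf

NUM_SHARDS = 10
filenames = ["tf_records/shard_{:02d}.tfrecord".format(p) for p in range(NUM_SHARDS)]
reconstructed_dataset = tf.data.TFRecordDataset(filenames)

feature_description = {
    "images": tf.io.FixedLenFeature([], tf.string),
    "labels": tf.io.FixedLenFeature([], tf.int64),
}

def parse_tfrecords(example):
    example = tf.io.parse_single_example(example, feature_description)
    image = tf.io.decode_jpeg(example["images"], channels=3)  # bytes -> uint8 image
    return image, example["labels"]

parsed_dataset = (
    reconstructed_dataset
    .map(parse_tfrecords, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(32)                          # assumes identically sized stored images
    .prefetch(tf.data.AUTOTUNE)
)

# With integer (argmax) labels, the matching loss and metric are the sparse ones.
loss = tf.keras.losses.SparseCategoricalCrossentropy()
metric = tf.keras.metrics.SparseCategoricalAccuracy()
```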
Now let's go ahead and start the training: you can see that training has begun and proceeds as it usually does. That's it for this section on TensorFlow Records; see you in the next section. Hello everyone and welcome to this new and exciting session, in which we are going to look at other state-of-the-art convolutional neural network based models. Talking about state-of-the-art models: ten years ago, in the ImageNet visual recognition challenge, the AlexNet convolutional neural network beat all state-of-the-art solutions. Before AlexNet, state-of-the-art methods achieved a top-5 error rate of about 26.2%, but AlexNet dropped this error rate to 15.3%. This breakthrough led to the widespread adoption of convolutional neural networks for recognition tasks like this one, and although today we would hardly use this exact model, we are going to discuss AlexNet because it was a precursor to most of the modern ConvNets we have today, like the MobileNets, DenseNets, and EfficientNets. That said, let's see what made the AlexNet model so powerful. AlexNet was first published in the paper entitled "ImageNet Classification with Deep Convolutional Neural Networks"; just from the title you can tell this was one of the first times ConvNets were used for the ImageNet challenge. The paper is by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton. In the abstract they start by presenting the results: on test data they achieved top-1 and top-5 error rates of 37.5% and 17%, considerably better than the previous state of the art. The model is a roughly 60-million-parameter model composed of conv layers and max-pooling layers with three final fully connected layers, as we shall see in the model section shortly; the output is a 1000-way softmax because there are 1000 classes. They also try to reduce overfitting by implementing strategies like dropout and data augmentation. They then discuss the dataset used, which is obviously ImageNet, a dataset of roughly 15 million labeled high-resolution images, although they finally used about 1.2 million training images, 50,000 validation images, and 150,000 for testing; it's also worth noting that the training images were downsampled to 256 by 256. As for the overall architecture, we have conv layers followed by max-pooling layers, sometimes conv, max-pool, conv, max-pool, then several conv layers, another max-pooling layer, and finally three dense layers, the last one with the 1000-way output. Another point to note is that at the time the non-linearity mostly used was the tanh or the sigmoid; scrolling back up, the paper discusses the non-linearity used and finds that the ReLU, which we saw in a previous section, works much better. The ReLU is simply the function which takes a value x and returns 0 if x is negative and x itself if x is positive: f(x) = 0 if x < 0, and f(x) = x if x ≥ 0. So this is our ReLU
function. What they discovered was that the ReLU permitted them to train the model much faster than the previously used non-linearities like tanh: as you can see in the figure, after just a few epochs, say five, the ReLU network reaches a training error rate that the tanh network takes far longer to reach. And it's important to note that to this day most ConvNets we build make use of this ReLU non-linearity. Another thing they did to speed up training was to use multiple GPUs, two in this case, devising a method of communication between the two GPUs to speed up computation. From here, the authors make use of a normalization strategy for regularization known as local response normalization, used alongside the ReLU non-linearity. Let's take this schematic from a post by Aqeel Anwar, which shows the idea clearly: we have some input and we normalize each value based on its surroundings. He explains that there is inter-channel and intra-channel local response normalization: intra-channel normalization is carried out between pixels, or neurons, of a given channel, while inter-channel normalization is carried out between pixels of different channels. The exact mathematical formula is the one in the paper: we take a given neuron and divide its value by a summation involving the squares of neighboring values. In the authors' terms, this sort of response normalization implements a form of lateral inhibition, inspired by the type found in real neurons, creating competition for big activities among neuron outputs computed using different kernels. This means that if we consider three neighboring neurons from three different channels and we normalize one particular neuron which is surrounded by a neuron whose value is relatively high, then because of the squared term in the denominator the normalized value becomes very small, hence the term lateral inhibition. So for a neuron to maintain a relatively high value after going through this local response normalization layer, it has to have one of the highest values among its surrounding neurons. Nonetheless, local response normalization, compared to other normalization techniques like batch normalization, layer normalization, and group normalization, hasn't proven to be very effective at regularizing a neural network, and hence is not used by modern ConvNets.
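For completeness, the paper's formula is b = a / (k + alpha * sum of squared neighboring activities)^beta, with the sum taken over n adjacent channels and the paper using k=2, n=5, alpha=1e-4, beta=0.75. TensorFlow ships this operation, so a hedged sketch of applying it (on a dummy activation map, with depth_radius=2 approximating the paper's n=5 window) might look like this.

```python
import tensorflow as tf

x = tf.random.normal([1, 56, 56, 96])   # dummy batch of activation maps
y = tf.nn.local_response_normalization(
    x, depth_radius=2, bias=2.0, alpha=1e-4, beta=0.75)
print(y.shape)   # same shape as the input; each value normalized by its neighbors
```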
We saw in previous sessions that pooling layers permit us to downsample information from the inputs, so that as we go deeper in the neural network we have a reduced number of features. In this paper they make use of the pooling layer, more specifically the max-pooling layer, and the way it works is quite straightforward. Suppose we have this kind of input and a 3 by 3 max-pooling window, initially with a stride of 1, and let's fix some values in the input, say a 1, a 2, and so on, with 11 as the largest. To carry out the max-pooling operation with a stride of 1, we start at the top-left position: since this is max pooling, we pick the maximum of the values under the window, which here is 11, so the first output is 11. Next we shift the window one step to the right, still covering 3 by 3 pixels, take the maximum, again 11, and shift once more to get another 11. Then we move one step downward and repeat: the maximum is again 11, and continuing across and down the input, practically every output ends up being 11, so that is the output we get from this input after the max-pooling operation. Now, when we change the stride from 1 to 2, as illustrated in the paper, instead of moving one step each time we move two steps: starting at the same position we get 11, then we move two steps to the right and again get 11, then two steps downward and take the maximum, still 11, and two steps to the right again, 11. This is what the paper calls overlapping pooling: a 3 by 3 window with a stride of 2, just as we've described, and the authors found it gives an improvement in the results, though the improvement isn't very large. In practice, modern ConvNets generally use classical, non-overlapping max pooling with a 2 by 2 window and a stride equal to the window size, rather than the 3 by 3 windows used here. Getting back to the general architecture, we can see that the very first conv layer has a kernel size of 11 by 11, and although these kinds of kernels permit the network to capture a much larger spatial context, they are computationally much more expensive compared to kernels with a smaller filter size.
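A tiny sketch of the same pooling experiment (the toy 4 by 4 input values are assumptions) shows the effect of the stride; with these values, every window contains the 11, so every output is 11.

```python
import tensorflow as tf
from tensorflow.keras import layers

x = tf.constant([[1., 2., 3., 4.],
                 [5., 6., 11., 7.],
                 [8., 9., 10., 0.],
                 [2., 3., 4., 5.]])
x = tf.reshape(x, [1, 4, 4, 1])                      # (batch, height, width, channels)

stride_1 = layers.MaxPooling2D(pool_size=3, strides=1)(x)   # 2x2 output, all 11s
stride_2 = layers.MaxPooling2D(pool_size=3, strides=2)(x)   # 1x1 output, also 11
print(tf.squeeze(stride_1).numpy())
print(tf.squeeze(stride_2).numpy())
```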
As we'll see in subsequent sections, the ConvNets developed after this didn't use these kinds of large kernel sizes, as they were able to use smaller filters and still capture the large spatial context that the 11 by 11 filters capture. To overcome overfitting, the authors make use of data augmentation and the dropout technique; you can check out our previous sessions where we discussed these two techniques. Then there are the training details, and one very interesting advantage of working with 11 by 11 kernels is that we can produce visualizations like this: because the kernels are large enough, we can visualize them directly, and we see clearly how the first conv layer captures low-level features: here a slanted line, here many slanted lines, a vertical line, horizontal lines, a checkerboard pattern, and color blobs, sometimes two colors, sometimes a single color. So we see how the first conv layers permit us to capture low-level features. The paper then presents the record-breaking results: the top-1 error rate, which at the time was about 45.7 percent, dropped to 37.5 percent with this ConvNet model, and the top-5 error rate dropped from 25.7 to 17 percent. They also developed a variant with an even better top-5 error rate of 15.3 percent, so we move from the previous method, SIFT plus Fisher Vectors, which had a 26.2 percent top-5 error rate, to the CNN, which has 15.3 percent. In the section on qualitative results we see different inputs, the correct labels, and the model's top-five predictions: the model does well here and here, this prediction is correct, this one is wrong, it predicts convertible when the image is actually a grille, this one is wrong too, and here the correct label doesn't even appear among the top five predictions. So that's it for this breakthrough model; we'll look at other ConvNet models in the next sections. Hello everyone and welcome to this new and exciting session in which we are going to discuss the VGG model. VGG stands for Visual Geometry Group, and it was presented in the paper by Karen Simonyan and Andrew Zisserman entitled "Very Deep Convolutional Networks for Large-Scale Image Recognition". In this session we are going to discuss the different methods the authors of the VGG paper used to drop the top-1 validation error rate from 38.1 to 23.7, where 38.1 was achieved by the breakthrough ConvNet model, AlexNet. In the previous session, in which we treated the AlexNet model, we saw the power of working with ConvNets for solving recognition tasks, and one thing we can notice very clearly about that model is that it is quite shallow, so Simonyan and Zisserman go much deeper with the VGG model. In this paper the authors investigate the effect of the convolutional network depth on its accuracy; notice the words depth and accuracy. So unlike AlexNet, where the depth is relatively small and the network is actually shallow, here the authors use a deeper convolutional neural network and make use of smaller convolution filters. Recall that with AlexNet the very first layer already had 11 by 11 filters, and we argued that this helped in
capturing large spatial dependencies. Now we'll explain how it's possible to make use of smaller, more economical convolutional filters while still capturing large spatial dependencies, as the bigger 5 by 5 and 11 by 11 filters do. To better understand why it's better to work with two 3 by 3 conv layers rather than a single 5 by 5 conv layer, let's consider the following examples. Start with the 5 by 5 case: kernel size of 5, input size of 10, no padding, dilation of 1, and stride of 1. The output is 6 by 6, and the way each pixel of the output is obtained is quite simple: the kernel is passed over a particular patch of the input, the kernel values are multiplied by the input values under it, and the products are summed to give the output value. In this 5 by 5 case the receptive field is quite large: each output value captures the information in a 5 by 5 patch of the input. For the number of parameters, we simply count: 5 times 5 gives 25 parameters. As for the learning capability, it is quite limited, because if we pass an input through a single conv layer, which ends with a non-linearity, in our case the ReLU, we don't capture information as complex as we would with two conv layers stacked: a 3 by 3 followed by another 3 by 3, each with its own non-linearity, can capture much more complex information from the input data than one single conv layer. Now let's look at the receptive field span of the two stacked 3 by 3 layers. Take the same input size of 10, but a kernel size of 3 instead of 5, with the same padding, dilation, and stride; the output is now 8 by 8 instead of 6 by 6. Since we're stacking two 3 by 3 conv layers, this 8 by 8 output becomes the input of the second 3 by 3 layer, which in turn produces the final output. The point to notice is that
this 8 by 8 output is now an input, so instead of an input size of ten we now have eight, with the same kernel size of 3, same padding, dilation, and stride, and the resulting output is 6 by 6. This means that, if we follow it through, each value in this final 6 by 6 output captures the same information as a single 5 by 5 layer would capture, so we can confidently say that the receptive field span is just as large. Now for the number of parameters: we have 9 here and 9 there, and 9 plus 9 is 18, so this model is clearly cheaper than a model which uses a 5 by 5, let alone an 11 by 11. As for the learning capability, we've already seen that because we stack two layers we are able to capture much more complex information from the inputs. So in all, we see that it is better to use conv layers with smaller kernel sizes. We want to thank Edward Z. Yang for providing the convolution visualizer, which you can find at ezyang.github.io. At this point we've understood why the authors of the paper prefer to work with smaller, 3 by 3 convolutional layers, and with these smaller conv layers they were able to push the depth to between 16 and 19 layers, where the 16-layer version is VGG16 and the 19-layer version is VGG19. In this table we have the summary of these models; focus on the 16 and 19 weight-layer models. For the 16 weight layers we start with two conv layers and a max pool, then two conv layers and a max pool, then three conv layers and a max pool, three conv layers and a max pool, and three more conv layers; from there we have the max pool, a flatten layer, and three fully connected layers ending with a softmax, since we are dealing with a multi-class classification problem. The authors also noted that the use of LRN, the local response normalization we saw in AlexNet, did not improve performance but instead led to increased memory consumption and computation time. The paper then describes the training process and the testing, and we get to the results. Here we see the top-1 validation error, and we notice the ConvNet configuration A; the configurations A through E can be read off the table, with A-LRN being A plus the local response normalization. Looking at the results, the configuration with local response normalization seems to have, if anything, the highest error, which is why the authors did not make use of this normalization technique. We get the best results with VGG19: 25.5 for top-1 and 8 for top-5. If you are new to the notions of top-1 and top-5 validation error, you can check out the previous section where we discussed top-1 and top-5 accuracy. Then we have the comparison with the state-of-the-art solutions of the time: AlexNet, OverFeat, GoogLeNet (Inception), MSRA, the model by Clarifai, and the Zeiler and Fergus model.
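Before moving on, to make the 3 by 3 stacking concrete, here is a hedged Keras sketch of a VGG-style block; the channel count, input size, and the use of "same" padding are assumptions for illustration, not the paper's exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Two stacked 3x3 convolutions (9 + 9 = 18 weights per input/output channel pair)
# cover the same 5x5 receptive field as one 5x5 convolution (25 weights),
# with an extra ReLU non-linearity in between.
vgg_style_block = tf.keras.Sequential([
    layers.Conv2D(64, kernel_size=3, activation="relu", padding="same"),
    layers.Conv2D(64, kernel_size=3, activation="relu", padding="same"),
    layers.MaxPooling2D(pool_size=2, strides=2),
])

single_5x5 = layers.Conv2D(64, kernel_size=5, activation="relu", padding="same")

x = tf.random.normal([1, 32, 32, 3])       # dummy input
print(vgg_style_block(x).shape)            # (1, 16, 16, 64) after pooling
print(single_5x5(x).shape)                 # (1, 32, 32, 64)
```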
Hello everyone and welcome to this new and exciting session in which we are going to discuss the ResNet model. This model was first introduced in the paper entitled Deep Residual Learning for Image Recognition by Kaiming He et al., and to date, about seven years later, it is still widely used. The high performance obtained when working with the ResNet comes from the fact that it relies on the residual block shown here, which permits us to get even better error rates, as we can see, compared to the VGG and GoogLeNet models. In this section we are going to focus on understanding how this residual block works and how the ResNet model is constructed from it. The curve here depicts the limits of models like the VGG which are based on just stacking up conv layers. To best understand this plot, recall that with AlexNet we had a small number of layers, then we moved on to the VGG16 and then the VGG19, and we would expect that if we keep increasing the number of layers, the error rate should keep dropping; but what actually happens is the opposite. You can see that this 20-layer network has a lower error rate than this 56-layer network. We would expect the 56-layer curve to be lower, since we've stacked more conv layers, but the opposite happens. The same phenomenon is witnessed on the test set: there is a test curve and a train curve, and for the test set too the 20-layer network performs better than the 56-layer network. So it's clear that blindly stacking up conv layers doesn't help in dropping the training and test errors, even though the deeper models are more expensive. This is why the ResNet introduces residual learning, which is based on the residual block we've seen already. Note that the weight layers here are simply convolutional layers. Unlike before, where we would just keep stacking weight layers one after the other, what we do now is create a connection between the input and the output, followed by an addition: we take the output of the stacked weight layers and add it to the input to produce the new output. So if we suppose that the input is x, and what goes on inside the block is f(x), then the output can be given as h(x) = f(x) + x, that is, the block's output plus its input. But to better understand why we need this residual block, we first need to understand why models based on just stacking up conv layers, like the VGG, actually underfit when you keep increasing the number of layers. The reason for that is exploding and vanishing gradients.
So let's explain what it means for gradients to vanish. Recall that in the gradient descent process, a weight is updated by taking its previous value minus the learning rate (let's denote it alpha) times the partial derivative of the loss with respect to that weight. During training, in order to compute this partial derivative efficiently, one of the most common methods used is backpropagation. The way backpropagation works is that you have a model, let's call it M, with some input, and the model produces an output, let's call it y hat, while y is what the model is expected to output. It is the difference between the two that produces the loss, and we then find the partial derivative of this loss with respect to each and every weight that makes up the model. If we split the model into different layers (note that each layer is composed of several weights), one point to note is that during backpropagation, to obtain the partial derivative of the loss with respect to the weights of an early layer, we make use of the partial derivatives of the loss with respect to the weights of the layers that come after it. Concretely, the gradient for a weight in an early layer is a product of factors coming from all the later layers, for example from layer 6, layer 7, and so on. The problem is the following: if, while computing those factors, we obtain values very close to zero, say 0.00001, then the products become very small, and if these partial derivatives are too small we get essentially no change in the weight, because the new weight equals the previous weight minus a very small value. So there will be little or no change in the weights. That is why, even though we keep increasing the number of layers, we cannot achieve better performance: due to this vanishing gradient problem, the model finds it very difficult to update its weights in a way that decreases the training error, since the gradients are vanishing, that is, going towards zero.
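To make the effect of those multiplied factors concrete, here is a tiny illustrative Python sketch (an assumption for illustration, not the lecture's code): the gradient reaching an early layer is modeled as a product of per-layer factors, and as soon as those factors are below 1 the product collapses towards zero as the depth grows.

```python
def early_layer_gradient(num_layers, per_layer_factor, final_grad=1.0):
    """Toy model: the gradient at the first layer is the product of the
    per-layer factors accumulated during backpropagation."""
    return final_grad * per_layer_factor ** num_layers

for depth in [5, 20, 56]:
    print(depth, early_layer_gradient(depth, per_layer_factor=0.5))
# 5  -> 0.03125
# 20 -> ~9.5e-07
# 56 -> ~1.4e-17   (the update lr * gradient is effectively zero)
```

With 56 layers the update applied to the earliest weights is numerically negligible, which is exactly the behaviour seen in the degradation plot.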
So we've just seen that making our network deeper, or increasing the number of layers, makes it difficult to propagate information from one far end to the other, and so what the authors suppose is that if the added layers can be constructed as identity mappings, a deeper model should have a training error no greater than its shallower counterpart. This means that if we have a shallower model and a deeper model, and we construct the deeper model such that its first part is identical to the shallower model and the remaining added layers are identity functions (or a group of identity functions stacked together), then the training error of the deeper model, although deeper, shouldn't be greater than the training error of the shallower model. And so, if we want to pass information from one point to another, and the weights in between dampen this information, then there is this skip path which permits us to simply copy the input information to the output, and this is exactly the identity function. So if, after passing through say 20 layers, the values we get are almost zero, such that the information passing through would also be practically zero, then there is this path which at least restores the exact same input. Just as the authors of the paper supposed in the example we took, if we make our neural network deeper by adding these residual blocks, there should be no increase in the error rate, and in practice this instead leads to a decrease in the error rate, which is exactly what we want. One other argument which accounts for the fact that these residual blocks help improve the performance of the model is that, since we have several paths, the residual model now looks like a combination of several shallow models: it looks like we're combining one shallow model with another and another, producing what we call an ensemble of models, or an ensemble of shallow models to be more precise, which helps make the overall model much more performant compared to when we just have a single path.
Another way you could look at this is that, for the first shallow model, information passes through one set of blocks and skips the rest; in another path the information takes a different route through the blocks; and yet another path goes straight through, giving us these different shallow models. Later, in the paper entitled Visualizing the Loss Landscape of Neural Nets, Li et al. produced a visualization which shows a ResNet without skip connections and a ResNet with skip connections, and it shows how much easier it is for the weights to reach the optimal values which minimize the loss when the skip connections are present. It should also be noted that the addition in the residual block is an element-wise addition, so we have to ensure that the dimensions of the input match the dimensions of the output for this operation to be valid; in the case where the two aren't equal, we need to make some modifications in the skip connection. Now, when we look at the three models compared, we have the VGG19, the 34-layer plain convolutional network, and the 34-layer residual network, and we find that the residual network has these skip connections, that is, our residual blocks. So now, instead of just stacking conv layers as we do with the VGGs, we stack residual blocks. Sometimes the skip line is solid and sometimes it is dotted. When it is dotted, it is simply because there is going to be a change in dimension: every time we have a dotted line, the number of channels changes, for example from 64 channels to 128. Since we're getting a 64-channel input and we want to match it with the 128-channel output, we need to make some adjustments. As described in the paper, there are two ways of making these adjustments, option A and option B. For A, the shortcut still performs an identity mapping, with extra zero entries padded to increase the dimensions; so to get from 64 to 128 we pad with extra zero entries. Otherwise we take option B, the projection shortcut presented in equation two, which is used to match the dimensions, and this is done using 1x1 convolutions. To better understand how 1x1 convolutions work, let's take this example where we have a 10 by 10 input. Since the kernel is 1x1, we have just one weight which slides over each and every pixel value, and notice that the output has the same spatial shape as the input. Now if, for example, the input is made of two channels and we want an output of, say, four channels, all we need to do is use a 1x1 convolution with four of these kernels (if it were 3x3 we would have four 3x3 kernels; here we have four 1x1 kernels). Each kernel produces one output channel, so the four kernels give us the four output channels, and that's how we can move from two channels to four channels.
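Here is a minimal TensorFlow sketch of that idea (the shapes are illustrative, not the lecture notebook): a 1x1 convolution only mixes channels, so it keeps the spatial size and lets us choose the number of output channels, which is exactly what the projection shortcut needs.

```python
import tensorflow as tf

x = tf.random.normal((1, 10, 10, 2))       # a batch of one 10x10 input with 2 channels

# 1x1 convolution with 4 filters: spatial size unchanged, channels 2 -> 4
project = tf.keras.layers.Conv2D(filters=4, kernel_size=1)
print(project(x).shape)                    # (1, 10, 10, 4)

# In a ResNet shortcut we would pick 128 filters (and a stride of 2 whenever the
# main path also downsamples) so the skip branch matches the main branch
shortcut = tf.keras.layers.Conv2D(filters=128, kernel_size=1, strides=2)
print(shortcut(tf.random.normal((1, 56, 56, 64))).shape)   # (1, 28, 28, 128)
```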
So in the case of the ResNets, where we get, say, a 64-channel input and we want it to match a 128-channel output, we can add a 1x1 convolution with the desired number of filters, here 128, so that the skip branch matches up with the output and we can carry out the element-wise addition. Another difference with the VGG and other previous ConvNets is that, instead of using max pooling (in the VGG you can see the pooling layer with a pool size of 2), what is done here is to use a 3x3 convolutional layer with a stride of 2, and this permits us to downsample the feature maps. We can check this in the visualizer: if we take a 3x3 kernel and increase the stride to 2, the output is downsampled, and if we set the padding to 1 the output is half of the input. The authors also make use of augmentation and of batch normalization, applying the batch normalization layer right after each convolution and before the ReLU activation. Batch normalization is a technique for accelerating deep neural network training by reducing internal covariate shift. To better understand batch normalization, we'll start by explaining the notion of covariate shift. Let's suppose that we're trying to build a model which says whether an input image is of a car or not. If you build this kind of system by creating batches of toy-car images and passing them through the model, the model learns to look at such an image and know that it's a car, and to look at some other image and know that it's not a car. But later on, when you take a car from another distribution and pass it into the system, it becomes difficult for the weights of the model to adapt to this change in distribution, even though the inputs are all cars. To visualize this, consider this plot: we have points for car and points for not-car, and we build a classifier which separates a car from an image which is not a car with, say, this function. When we bring in some other distribution, the car and not-car points land somewhere else, and we would need a different function to separate the cars from
the images which are not cars. This makes it difficult to have a single function which separates images of cars from those which are not cars when those images come from two different distributions. This shift is known as covariate shift, and that's why, most times, before passing an image into the model we normalize the input: if we have an input x, we generally carry out some normalization in order to account for this covariate shift. After normalization, all those images, whether from the first distribution or the other one, will have been normalized to reduce the effect of the shift, and so we can now separate the cars from the non-cars with a single function, with much more ease. That said, what if this kind of covariate shift happens instead in the hidden layers, that is, the layers which make up the model? Suppose we have conv layers stacked with their activation functions, and the activations flowing between the layers come from different distributions; in that case we have an internal covariate shift, and to remedy this situation we make use of batch normalization. The algorithm for batch normalization is described in the paper: given a mini-batch, we obtain its mean, that is, the average of the values over the mini-batch, and its variance, sigma squared (the standard deviation being sigma). It is these statistics that we use to normalize: we take every value, subtract the mean, and divide by the standard deviation, adding a small epsilon to avoid having a very small number or zero at the denominator. That is how the batch normalization step goes. It should be noted that there are other normalization techniques, like layer normalization and group normalization, which are similar but differ in the sense that, with batch normalization, the mean and standard deviation are calculated over a given mini-batch. After getting the normalized value x hat, we multiply it by gamma and add beta. These gamma and beta are trainable parameters, so when working with batch normalization in, say, TensorFlow or PyTorch, you'll notice that the batch norm layer also has its own parameters. The role of gamma and beta is to scale and shift; these parameters are learned along with the original model parameters and restore the representation power of the network.
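As a small illustration of that normalize-then-scale-and-shift step (a sketch of the formula, not the lecture notebook), here is what one batch-normalization step looks like on a toy mini-batch, with gamma and beta playing the role of the trainable parameters a BatchNormalization layer would learn:

```python
import tensorflow as tf

x = tf.constant([[1.0], [2.0], [3.0], [4.0]])    # a toy mini-batch of 4 values
eps = 1e-3

mu = tf.reduce_mean(x, axis=0)                   # mini-batch mean
var = tf.reduce_mean((x - mu) ** 2, axis=0)      # mini-batch variance
x_hat = (x - mu) / tf.sqrt(var + eps)            # normalize

gamma = tf.Variable([1.0])                       # trainable scale
beta = tf.Variable([0.0])                        # trainable shift
y = gamma * x_hat + beta                         # batch-norm output

print(y.numpy().ravel())
```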
In fact, the paper notes that when we set gamma to the square root of the variance (plus epsilon) and beta to the mean of x, we can recover the original activations, if that were the optimal thing to do. Essentially, what they're saying is that if it is instead optimal not to normalize, the network can adapt the values of gamma and beta so that we get back the original value of x. The way this works is simple: we multiply x hat by the square root of the variance plus epsilon, which cancels out with the denominator, and we are left with xi minus the mean; then, if beta equals the mean, adding beta cancels the subtracted mean and gives us back xi, the original value of x. So if we set gamma to the square root of the variance plus epsilon and beta to the mean, we retrieve the original value of x, and that's why these two parameters are trainable: so that we get the most suitable values for gamma and beta. Then there is also the training setup: the network is trained from scratch, stochastic gradient descent is used with a mini-batch size of 256, and the learning rate starts from 0.1 and is divided by 10 when the error plateaus. So basically, when we get to the point where the error starts to plateau, we update the learning rate from 0.1 to 0.01, and if it drops and plateaus again we repeat the same operation, and so on. Training runs for up to 60 x 10^4 iterations, weight decay and momentum are used, and there is no use of dropout. For testing, different scales are used and averaged: we pass the image at these different scales and record the average scores. Now, before we move on to check out some results, it's important to note that after the last conv layer we do not carry out flattening; instead, what is done here is global average pooling. To better understand how global average pooling works, let's consider this example from peltarion.com. Here we're supposing that this is the output of the final conv layer. Instead of just flattening, that is, picking all those values, laying them out, and passing them to a fully connected layer, what we do is the following: for each and every channel (here the number of channels is three, together with the height and the width), we compute the average value. So for one channel the average value is 8, for another it is 3, and for the last one it is 5. If you want a thousand such values, then you should have a depth of a thousand. With this, you see that the result looks quite similar to a flatten layer, as you now have single values which can be passed into a fully connected layer.
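Here is a quick TensorFlow sketch of that difference (illustrative shapes only): flattening keeps every spatial position as a separate feature, while global average pooling reduces each channel to a single average, so the output length equals the number of channels.

```python
import tensorflow as tf

feature_map = tf.random.normal((1, 7, 7, 512))   # output of a final conv layer

flat = tf.keras.layers.Flatten()(feature_map)
gap = tf.keras.layers.GlobalAveragePooling2D()(feature_map)

print(flat.shape)   # (1, 25088)  -> 7 * 7 * 512 positions kept
print(gap.shape)    # (1, 512)    -> one average per channel
```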
Now it should be noted that for certain tasks, like the classification we are doing, global average pooling works well because the exact position of the pixels doesn't really matter. Since each channel is reduced to its average, the notion of one output value being next to another, as in the case of flattening, is lost; but since in our case we're only interested in saying whether a person is angry, happy, or sad, the positions of those output values don't matter as much as they would if we were dealing with an object detection or, say, an object counting problem, where the particular position of the person or of whatever we are trying to detect actually counts. To explain this again, consider two examples. For classification, all we're interested in is detecting that the person is happy, so whether the face is on one side of the image or the other, the position doesn't really matter. For object detection, the exact position of the person matters, and so the exact positions of those neurons matter, and employing global average pooling for such tasks isn't a great idea. In summary, if you have a task where position doesn't matter that much, you can use global average pooling; if not, you're advised to use the flatten layer. From here you can see the different variants of the ResNet: ResNet18, ResNet34, ResNet50, ResNet101, and ResNet152. With the plain networks, that is, without the residual block, the 18-layer network performs better than the 34-layer one, but once we add the residual block the 34-layer network performs better than the 18-layer one, meaning we can now go deeper. We also have this table which shows the VGG model, GoogLeNet, the PReLU net, and then the different plain and residual networks, and it shows clearly that the ResNet-152 performs best, even though it is deeper than its 101- and 50-layer counterparts. And before we move on, also note that the ResNet block is composed of two conv layers for the 34-layer version, while from the 50- to the 152-layer versions the blocks are composed of three conv layers.

Hello everyone and welcome to this session in which we are going to implement the code for ResNet34 in TensorFlow 2. Here is the ResNet34 model; there are other variants like the 50, 101, and 152, and after going through this section you'll be able to implement those other variants too, and you'll also be able to get results like these, where we can clearly see an improvement in the accuracy of our model. We are going to construct our ResNet34 model while making use of model subclassing, so you can check out the previous section to better understand how model subclassing is implemented in TensorFlow 2. That said, we have the ResNet34 model, and the first layer is a convolutional layer with 7x7 filters, 64 of them, with a stride of 2 (you can get all of this from the paper: 7x7, 64, stride 2), followed by a 3x3 max pool with a stride of 2. From there we get into the residual blocks: the paper shows three of these first residual blocks, because this is the 34-layer version we're implementing (if you were implementing a 50-layer ResNet, each block would have a 1x1, a 3x3, and a 1x1 conv, but for the 34 version we have a 3x3 and a 3x3). These are repeated three times, and each one of
them is a residual block. Our residual block is the block shown here, which we are going to implement later on; for now, in the code we simply use our residual block with the number of filters set to 64, 64, and 64, exactly as we have it in the paper, where each of these blocks has 64 filters. If we looked inside the residual block we would see exactly how it's implemented, with its two 3x3 conv layers, but for now let's just consider that it has been implemented. Another reason for implementing it this way is that if you later want to convert this to a ResNet50, all you need to do is update the code for the residual block, since that is the only thing that differs. Next we have four residual blocks: again 3x3 and 3x3, but now the number of channels is 128, and there are four of them. Because we are changing the number of channels, we need to take into consideration the number of strides, as this permits us to downsample our features. Getting back to the code, we have the downsampling and then we continue with 128, 128, 128; then again we have downsampling and we go to 256, exactly as in the paper, where the summary shows six of these in a row; and then 512, three of them, as in the paper. From there we have the global average pooling (GlobalAveragePooling2D) and the final fully connected layer with a softmax activation, as we've seen previously. In the call method, we simply call all those layers we just created, passing in the input: the input x goes through each and every layer and we get the output. That said, let's move on to the residual block itself. The residual block is a layer, unlike the full model, so we have this residual layer, and you'll see that we have a dotted variable, which is True when the number of strides is different from one. Let's run this: we create the block with the number of strides equal to 1 and print out dotted, and we get False; when we set the stride to 2, it turns to True. So basically that's what dotted does, and if you look back at the diagram,
you'll notice that the skip connection we selected is a full (solid) line, so there our dotted variable will be False, and when we get to a transition it will be True. Getting back to the code, after that we have our two convolutional layers, which we define with a CustomConv2D layer; we'll break that down shortly, but for now just understand that CustomConv2D represents a conv layer, one for each of the two 3x3 convolutions in the block. Notice that the number of channels and the number of strides have been passed in, which is exactly what was done at the model level. Since by default the number of strides equals 1, when we don't pass anything the stride is simply 1, but where we have the transitions the number of strides equals 2, so the value changes. So we define this conv layer with the given number of channels, a kernel size of 3 as in the paper, the number of strides, and padding 'same', so that the height and width of our input features remain unchanged. Also notice that the stride for the second conv layer is always equal to 1; getting back to the paper, that's because even at the transitions, where we go from 64 to 128 channels and the striding (not max pooling) does the downsampling, the stride value changes for only one of the conv layers and not both. That's why only the first one changes while the other one remains fixed at 1. Then we have the activation layer, and then, if dotted is true, we have the 1x1 conv layer on the skip connection, to ensure that the number of channels of the input and of the output actually match up; that's the role of this layer, which we've seen already. Back in the code, the kernel size here is 1, unlike the main path where it is 3, and we also specify the number of channels so that it matches what we expect; the stride here is taken from the block's stride, so if it's 2 you'll have 2 and if it's 1 you'll have 1. From here everything is set and we can write the call method; again, please check out the previous sessions where we treat model subclassing so you understand exactly what's going on. In the call method, the input goes into the first conv layer, the output of the first goes as input to the second conv layer, and we get the output. Now, suppose we have a normal (solid) skip connection; in that case the input is added to the output directly, so
we have the Add layer: TensorFlow takes the output of the conv layers and adds it to the input, we get x_add, it goes through the ReLU activation, and that's our output. In the case where we have the 1x1 convolutional layer, that is, when we are at a dotted connection, we take the input and modify it before adding it to the output. To redraw it: we have the residual block's output, we have the addition, and then we have the 1x1 conv layer on the skip path which comes and adds up with the output; this 1x1 convolution is exactly what we defined with a kernel size of 1. So we add this up and get our output x_add: if dotted is true we take the projection path, else we go through the normal path, and that's why the 1x1 conv is applied only sometimes. Now that we've understood how this works, let's look at the CustomConv2D layer. The CustomConv2D layer is basically a usual Conv2D layer together with a batch norm layer; remember, the ResNet paper makes use of batch normalization, so instead of writing the conv and the batch norm every time in our code, we just combine the two. That's it; we run the cells and then define our ResNet34 model, which we've just seen. We ask for the ResNet34 summary, run it, and we get an error: we need to build our model. What we do is quite simple: we take the resnet_34 model and call it, that is, we pass some input into it, here tf.zeros of shape 1 by 256 by 256 by 3. We run that and now we have our model summary, with about 21 million parameters. Next we move on to the training, but this time we're going to include some checkpointing, which we covered in our previous session on model checkpointing. Here we ensure that as we train, we save our best model weights. So we have this checkpoint callback (again, you can check back on our previous sessions where we treat these callbacks), which permits us to store our best-performing weights: we set the monitor to the validation accuracy and save_best_only to True, and we run this cell along with our loss function.
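For readers following along without the notebook, here is a condensed sketch of what such a CustomConv2D and ResidualBlock could look like with Keras subclassing; the class names, arguments, and defaults follow the description above but are reconstructed, so treat this as an approximation rather than the notebook's exact code.

```python
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, BatchNormalization, Layer, Activation, Add

class CustomConv2D(Layer):
    """A Conv2D layer followed by batch normalization, as used throughout the ResNet."""
    def __init__(self, n_filters, kernel_size, n_strides, padding="valid"):
        super().__init__()
        self.conv = Conv2D(n_filters, kernel_size, strides=n_strides, padding=padding)
        self.batch_norm = BatchNormalization()

    def call(self, x, training=True):
        return self.batch_norm(self.conv(x), training=training)


class ResidualBlock(Layer):
    """Two 3x3 convs plus a skip connection; a 1x1 projection fixes the skip
    branch whenever the block downsamples (stride different from 1)."""
    def __init__(self, n_channels, n_strides=1):
        super().__init__()
        self.dotted = (n_strides != 1)        # dotted skip line in the paper's figure
        self.conv_1 = CustomConv2D(n_channels, 3, n_strides, padding="same")
        self.conv_2 = CustomConv2D(n_channels, 3, 1, padding="same")
        self.activation = Activation("relu")
        self.add = Add()
        if self.dotted:
            # projection shortcut: 1x1 conv so channels and spatial size match
            self.conv_3 = CustomConv2D(n_channels, 1, n_strides)

    def call(self, inputs, training=True):
        x = self.activation(self.conv_1(inputs, training=training))
        x = self.conv_2(x, training=training)
        shortcut = self.conv_3(inputs, training=training) if self.dotted else inputs
        return self.activation(self.add([x, shortcut]))
```

A ResNet34 would then stack these blocks as 3 blocks of 64 filters, 4 of 128, 6 of 256, and 3 of 512, with each transition block created with n_strides=2 so the feature maps are downsampled.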
Before we move on, let's get back to the training parameter. We have the CustomConv2D layer, which we've seen already, with its batch norm layer, and it should be noted that with the batch norm layer we have to specify whether we are in training mode or in inference (testing) mode. The reason is that the parameters of the batch norm layer behave differently in these two modes: during training, the layer normalizes its inputs with the mean and variance of the current batch of inputs, as we've seen already, while when training equals False, that is, in inference mode, the layer normalizes the inputs using the mean and variance of its moving statistics learned during training. This simply means that the layer has some parameters, let's call them p, and during training the layer updates these statistics, but during inference we do not want to update them, as they were learned during training, and so we have to pass training=False when we are evaluating or testing the model. What this means for the code is that we add a training argument: we pass training into the CustomConv2D call, and by default we can set training to True, so by default we are in training mode. We run the cell, then do the same in the residual block, passing training down to the custom conv layers it calls, and finally in the complete network. So now, when we are not in training mode, we can set the training parameter so that the batch norm layers' statistics aren't modified. We run this with training set to True (making sure the default is True), and we could also set it to False, or pass no value at all, in which case the default applies. With that set, let's get back to the loss function. We were at the point where we have our metrics: we run the metrics and compile the model, but this time we use a higher learning rate. One thing you could also do, as described in the paper, is decrease the learning rate as soon as the model starts plateauing: in the paper the starting learning rate is 0.1, although here we're going to start with 0.01. So you have this learning rate, and when the loss starts plateauing you drop it by a factor of 10, and when it plateaus again you drop it again, and so on; this is what they propose in the paper, and you could always implement it, as we did in a previous section with the callback which permits us to schedule the learning rate. That's it; we get back to the code, the earlier cells have been run already, and we can start the training. We're going to train for 60 epochs and include the callbacks, passing in the model checkpoint callback which we defined earlier.
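As a rough sketch of that training setup (the model, dataset, and metric names, the loss, and the checkpoint path here are illustrative assumptions, not the notebook's exact values), the compile, checkpoint, and fit calls could look like this:

```python
import tensorflow as tf

resnet_34.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
    loss=tf.keras.losses.CategoricalCrossentropy(),
    metrics=["accuracy", tf.keras.metrics.TopKCategoricalAccuracy(k=2)],
)

# Save only the weights that achieve the best validation accuracy.
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    "checkpoints/resnet_34_best",     # a directory: saved in the TensorFlow format
    monitor="val_accuracy",
    mode="max",
    save_best_only=True,
)

history = resnet_34.fit(
    train_dataset,
    validation_data=val_dataset,
    epochs=60,
    callbacks=[checkpoint_callback],
)
```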
We now run this cell, launch the training, and get this error: saving the model to the HDF5 format requires the model to be a functional or sequential model; it does not work for subclassed models like ours, because such models are defined via the body of a Python method, which isn't safely serializable. It suggests saving to the TensorFlow SavedModel format by setting save_format to 'tf', or using save_weights. So we're going to save these models in the TensorFlow format: all we need to do is specify a folder path, and that's fine. The mode is max, since we want to store the weights with the highest validation accuracy. We run this again; training is now complete and we achieve a best accuracy of 83.6%. We have our accuracy plot, and then we can evaluate the model: here we get 82.3% accuracy and 94.6% top-k accuracy. Now, what if we load our best model? The model we have in memory is the latest one, the very last. So let's load the best model and evaluate it: we add a code cell and load our best weights with resnet_34.load_weights, passing the path as a string, since it's the folder where we stored the weights. We run this, evaluate the model, and get an accuracy of 83.6% and a top-k accuracy of 95.6%. We then test it, and here are some results: happy, angry, sad, happy... we miss one, then another, then a third, so we miss three, meaning we have 13 out of 16 images correct, that is, 81.2%. So we've gone from 79% to 83% by changing our model. Finally, let's plot the confusion matrix and see what we get; the results are much better than what we have had so far.

Hello everyone and welcome to this new and amazing section in which we are going to treat the MobileNet architecture. MobileNet was first developed by Google researchers in 2017 with the MobileNet version 1; then came the MobileNet version 2, and after this there was a MobileNet version 3, but here we are just going to focus on the MobileNet version 2 paper, entitled MobileNetV2: Inverted Residuals and Linear Bottlenecks. Just by looking at the title we can guess the type of environments for which this model was built: in fact, the MobileNets have been built for environments with low compute resources, like mobile and edge devices. So in this section we are going to focus on what permits this model, the MobileNetV2, to perform quite well in terms of speed while producing high-quality results. There are two major techniques which make the MobileNet version 2 very powerful, that is, which permit us to work at higher speeds while still maintaining reasonable quality: the depthwise separable convolutions and the inverted residual bottleneck. Here are the separable convolutions: this is a regular convolution, and this is the separable convolution, and we'll start by explaining what a depthwise separable convolution is. A depthwise separable convolution is simply a combination of a depthwise convolution and a pointwise convolution. The pointwise convolution is no different from a normal convolution layer, but with a kernel size of one, so a 1x1 convolution; and before it we
have the depthwise convolution, and the depthwise separable convolution is these two put together sequentially. To understand what a depthwise convolution actually is, let's get back to the demo where we saw how the usual convolution operation works. As you can see here, the input is three-dimensional: channels 0, 1, and 2, so three channels. What goes on during a convolution operation is that we have a kernel, in this case a three-channel 3x3 kernel, and the reason it has three channels is because the input has three channels; if we added a fourth channel to the input, the kernel would also need four channels. What happens during the convolution is exactly what we see here: the kernel is placed at a given position on the input, each kernel channel is matched with the corresponding input channel, and we multiply the kernel values by the input values underneath. For the first channel, for example, everything at the top cancels to zero, then 1 times 1 gives 1, 1 times 0 gives 0, negative 1 times 1 gives negative 1, and negative 1 times 2 gives negative 2, which sums to negative 2. Then we move to the next channel: most products are zero, negative 1 times 2 gives negative 2 and another negative 1 times 2 gives negative 2, so this channel contributes negative 4. For the third channel, most values are zero; 1 times 1 gives 1 and negative 1 times 1 gives negative 1, which sums to 0. So for each channel we multiply and add, and then we add up the per-channel contributions: negative 2, negative 4, and 0 give negative 6, and because we have a bias of 1 we add 1, which gives negative 5. That's how we obtain this output value.
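That per-patch computation is easy to reproduce; here is a tiny NumPy sketch (with made-up numbers, not the exact demo values) that computes one output pixel of a regular convolution as the sum over all channels of the element-wise products, plus the bias, and contrasts it with the per-channel sums a depthwise convolution would keep separate:

```python
import numpy as np

patch = np.random.randint(0, 3, size=(3, 3, 3))    # one 3x3 input patch with 3 channels
kernel = np.random.randint(-1, 2, size=(3, 3, 3))  # one 3-channel 3x3 filter
bias = 1

# Regular convolution: multiply element-wise across ALL channels, then sum everything
regular_value = np.sum(patch * kernel) + bias

# Depthwise convolution: each channel is handled on its own, giving one value per channel
depthwise_values = [np.sum(patch[..., c] * kernel[..., c]) for c in range(3)]

print(regular_value)      # a single number for this position
print(depthwise_values)   # three numbers, one per channel
```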
We repeat the same process for all positions: we slide the kernel across and down until we have the full output. So we have an input with 3 channels, a kernel with 3 channels, and an output with 1 channel. If we want the output to have, say, 2 channels, we need to increase the number of filters: we would need 2 of these three-dimensional filters, each with its own weights, and because we now have 2 of them, we no longer get an output with 1 channel but an output with 2 channels. Notice how the first filter gives us the first output channel and the second filter gives us the second output channel. That's how the convolution operation works for a normal convolution. Now, the depthwise convolution is different from this method. For the depthwise convolution, as the name says, the computations are done depth-wise, per channel: each output is obtained only from the interaction of one input channel with one kernel channel. We take that single channel, pass the corresponding kernel over it as usual, and obtain a new output for that channel. If the kernel is 3x3, we obtain a grid of values, but these values differ from before, because the way we compute them is different. Before, we took every channel's contribution and added all the resulting sums to obtain a value like negative 5, as we saw. Here, the first value is computed from the first channel alone: most products are zero, 1 times 1 gives 1, and then we have negative 1 and negative 2, which gives negative 2. So the value here is negative 2, and we then move on to the next position, and so on up to the last value. Before, we took the contributions of the different channels and added them up to get one value; here, each channel directly produces its own output. This means that, because each channel of the filter produces its own output, the three input channels produce three different outputs.
So here we already have three outputs, unlike before, where we had two outputs, and those two outputs were controlled by the number of kernels we used: because we used two kernels, we had two outputs. Here that doesn't apply; the number of input channels dictates the number of outputs, so with three input channels we get three outputs. Now, since we have these three outputs and we want to be able to control the number of output channels, what we do after the depthwise convolution, which is what we've just explained, is to add a 1x1 convolution, that is, a pointwise convolution. With the 1x1 convolution we specify the number of channels we want, and that permits us to go from a certain number of channels, in this case three, to any given number of channels, say two. So after getting the three-channel output from the depthwise step, we pass it to the pointwise convolution and get just two channels. To better understand this, let's take this depthwise convolution image from Papers with Code. Here a kernel is passed over the input, and notice how each channel is now responsible for its own output: the orange channel gives one output, the red gives another, the yellow gives another, unlike previously, where we would pass all of them and then add everything together; now the addition is carried out at the level of each channel, and each channel gets its own output. Let's now see why the depthwise separable convolution is more efficient, by calculating the number of parameters. We want the number of parameters needed to go from this 7 by 7 by 3 input to this 3 by 3 by 2 output. For the regular convolution, the number of parameters is the kernel size times the kernel size times the number of input channels times the number of output channels: the input is 7 by 7 by 3, where the 3 is the number of channels; the kernel is 3 by 3, so 3 times 3 times 3 weights for a single filter; and because we want an output with two channels, we multiply again by 2. Omitting the biases, that's 3 x 3 x 3 = 27, times 2, which gives 54 parameters. This also means that if we instead want 16 output channels, we get 432.
Now let's consider the depthwise convolution plus the pointwise convolution, which together form the depthwise separable convolution. For the depthwise part, first note that the multiplication by the number of output channels is no longer needed: what we have is 3 by 3 by 3, where the last 3 comes from the number of input channels. We don't multiply by the number of output channels, because the output of the depthwise convolution here is a three-channel tensor: since we have three input channels, we get a three-channel output. We've seen this already: we take each channel and its kernel, multiply, and get that channel's output. Once we have this, we add the weights for the pointwise convolution. The pointwise convolution uses a 1x1 kernel, so it's quite cheap compared to 3x3 kernels; its parameter count follows the same formula as the usual convolution, kernel size times kernel size times the number of input channels times the number of output channels, so 1 x 1 x 3 x 2 = 6. Adding the two, we have 27 plus 6, which gives 33. One interesting point is that if we instead want 16 output channels, we change the 2 to 16 and the answer becomes 27 plus 48, which gives 75. So we increased the number of output channels from 2 to 16 and the number of weights only went to 75, whereas when we did that with the regular convolution the number of weights went to 432. Clearly, the depthwise separable convolution is much cheaper than the normal convolution, and in the paper the authors argue that this kind of convolution permits us to reduce the computational cost by eight to nine times compared to a standard convolution, with only a small reduction in accuracy. From the diagram, you should now understand why, when representing a regular convolution, the authors draw a filter with depth going into the input, whereas when representing the separable convolution the depthwise filter is drawn without any depth: there is no inter-channel computation in that step. After it we have the pointwise convolution, which is a regular convolution, so it has depth again, but it is smaller in size because it's just a 1x1 filter.
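You can verify those counts directly in TensorFlow; in this small sketch (illustrative shapes, biases disabled so the numbers match the hand calculation) we compare a regular Conv2D with a DepthwiseConv2D followed by a 1x1 Conv2D:

```python
import tensorflow as tf
from tensorflow.keras import layers

inp = tf.keras.Input(shape=(7, 7, 3))

# Regular convolution: 3*3*3*16 = 432 weights
regular = tf.keras.Model(inp, layers.Conv2D(16, 3, use_bias=False)(inp))

# Depthwise separable: 3*3*3 = 27 weights, then 1*1*3*16 = 48 weights -> 75 in total
x = layers.DepthwiseConv2D(3, use_bias=False)(inp)
separable = tf.keras.Model(inp, layers.Conv2D(16, 1, use_bias=False)(x))

print(regular.count_params())    # 432
print(separable.count_params())  # 75
```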
The next improvement we should look at is the inverted residual block. It's called inverted in comparison to the residual block. With the residual block, you have a relatively large number of channels at the input, a smaller number in the middle, and a large number again at the output, together with the skip connection from input to output; so the data goes from wide to narrow to wide. With the inverted residual block it's the opposite: we come in with a relatively small number of channels, the number of channels is increased inside the block, and the output again has a small number of channels, hence the term inverted residual block. Now, in addition to using depthwise separable convolutions instead of normal convolutions, the fact that relatively low-dimensional data enters and leaves the block means that we can transport low-dimensional tensors throughout the MobileNet: low-dimensional data in, low-dimensional data out, and inside we have the expansion layer, which permits us to capture as much information as possible from the input features. One thing to also note is that we're using a ReLU6. The ReLU6 differs from the usual ReLU as follows: with a ReLU, for all x less than zero the output is zero, and for all x greater than zero the output is x, so we have the y equals x line. With the ReLU6, the output is clipped from the value six onward: it stays equal to x up to six, and for all x greater than six the output remains at six. That's the ReLU6. Another important point is that because we're projecting from high-dimensional data down to low-dimensional data, a ReLU non-linearity at that point would generally cause us to lose too much information, and that is why there is no ReLU activation on the final layer of the block. You can see this in the summary table: we have the input, then the expansion factor t, which is a hyperparameter we can tune for better results. The block comes in with k channels, expands to tk channels, and then projects down to k prime channels; the first two operations use ReLU6, but the last one has no activation. That said, here is the summary of MobileNetV2: you have the different bottleneck blocks, then a conv2d, an average pool, and a final conv2d. This figure also shows how MobileNetV2 outperforms MobileNetV1, ShuffleNet, and NASNet: if you pick, for example, these two points, the number of operations, that is, the computational cost, is almost the same, but there is a large difference in accuracy, with MobileNetV2 clearly outperforming MobileNetV1.
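As a quick recap before moving on, here is a minimal sketch of an inverted residual block in Keras, following the layer ordering just described; the exact settings (initializers, batch norm parameters, channel counts) are simplifications rather than the paper's reference implementation:

```python
import tensorflow as tf
from tensorflow.keras import layers

def inverted_residual_block(x, expansion=6, out_channels=None, stride=1):
    # Expansion -> depthwise conv -> linear projection, with a skip
    # connection when the input and output shapes match.
    in_channels = x.shape[-1]
    out_channels = out_channels or in_channels

    # 1x1 expansion: few channels in, t times more channels inside the block
    y = layers.Conv2D(expansion * in_channels, 1, use_bias=False)(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU(max_value=6.0)(y)              # ReLU6

    # 3x3 depthwise convolution in the expanded space
    y = layers.DepthwiseConv2D(3, strides=stride, padding="same", use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU(max_value=6.0)(y)              # ReLU6

    # 1x1 linear projection back down: no activation here, to avoid losing
    # information in the low-dimensional output
    y = layers.Conv2D(out_channels, 1, use_bias=False)(y)
    y = layers.BatchNormalization()(y)

    if stride == 1 and in_channels == out_channels:
        y = layers.Add()([x, y])                   # inverted residual connection
    return y

# Usage sketch: a block going from 24 channels in to 24 channels out
inputs = tf.keras.Input(shape=(56, 56, 24))
outputs = inverted_residual_block(inputs, expansion=6)
model = tf.keras.Model(inputs, outputs)
```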
Then apart from classification, MobileNetV2 has been used for other tasks like object detection, semantic segmentation, and other computer vision tasks where compute resources are limited. Hello everyone, and welcome to this session in which we treat the modern convolutional neural network architecture known as EfficientNet. In the EfficientNet paper, the authors propose a more controlled way of designing convolutional neural networks so that they suit our demands in accuracy and speed. As you can see in these plots, we can choose suitable parameters to increase accuracy while keeping track of how this affects speed. That said, in this section we'll see how Mingxing Tan and Quoc Le built a system for scaling convolutional neural networks much more efficiently and automatically. ConvNets are commonly developed at a fixed resource budget and then scaled up for better accuracy when more resources are available. In the case of ResNet, we had ResNet-34, then ResNet-50, then ResNet-152, and depending on the setting we pick the ResNet that runs without latency problems while maintaining reasonable accuracy. This means that in a high-compute environment we can afford the larger model, whereas in a low-compute environment we have to work with the model that has fewer conv layers. Now, in this paper the authors propose a more systematic study of how this model scaling can be done. Unlike other methods, where we scale only by increasing the depth, here they propose scaling by increasing the depth, the width, that is the number of channels, and the resolution, that is the size of the input image. So they propose a new scaling method that uniformly scales all dimensions of depth, width, and resolution using a simple yet highly effective compound coefficient. You can see the results right here: take the ResNet-50, for example, and compare it with the EfficientNet-B4, which has fewer parameters. Despite having fewer parameters than the ResNet-50, its top-1 accuracy on ImageNet is much higher: the EfficientNet-B4 is at about 83 percent top-1 accuracy, while the ResNet-50 is only at about 76 percent. In this figure we see a baseline model (you could think of it as something like a ResNet-18, with a depth-scaled version being, say, a ResNet-50). The baseline here is obtained by carrying out an automatic neural architecture search, and we can see the different layers it is made of. The baseline has a certain depth, and when we carry out depth scaling we add many more layers to it. When we do width scaling, we increase the number of channels, so the baseline's smaller channels are widened. Then we also have resolution scaling, which has to do with the input: we have an input of a given height times width, and after resolution scaling this resolution is increased.
This means that we might work with a baseline resolution of 224 by 224 and, after scaling, get to say 640 by 640. Then we also have compound scaling, which is what is used in this paper: instead of focusing only on the width, the depth, or the resolution, we scale all of them systematically to achieve the best possible results while maintaining reasonable speed. We can see from these different plots that when you increase the width, that is the number of channels, the accuracy starts to plateau at some point; the same happens when you increase only the depth, and also when you increase only the input resolution. That is why the authors propose a technique that combines all three, so that we get even better results. And there we go, we can see the effect of compound scaling: when the depth is 1 and the resolution is 1, we get this blue curve with the worst results, whereas when we double the depth and increase the resolution by 1.3, we get the best results. That said, let's now dive a bit deeper and look at the compound coefficient they mentioned at the very beginning. Going down to equation 3, we have three formulas: the depth d equals alpha to the power phi, the width w equals beta to the power phi, and the resolution r equals gamma to the power phi. Phi is a user-specified coefficient that controls how many more resources are available for model scaling; it's a sort of scaling coefficient. Alpha, beta, and gamma are constants designed such that alpha times beta squared times gamma squared is approximately equal to 2, with alpha, beta, and gamma each greater than or equal to 1. So we carry out a grid search for the best values of alpha, beta, and gamma, fix them as constants, and then vary phi so that scaling is done in a systematic manner. To find the values of alpha, beta, and gamma, the authors fixed phi equal to 1 and obtained alpha equals 1.2, beta equals 1.1, and gamma equals 1.15, all satisfying the constraint above. They then fixed alpha, beta, and gamma as constants and scaled up the baseline network with different values of phi, as we've just explained, and it's from these different values of phi that we obtain the different versions of EfficientNet, going from B1 to B7.
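As a quick illustration of that formula, here is a minimal sketch (not the paper's code) that computes the scaling multipliers for a given phi, using the constants reported in the paper:

```python
# Minimal sketch of EfficientNet-style compound scaling.
# alpha, beta, gamma are the constants the paper reports from its grid
# search at phi = 1; phi is the user-chosen compound coefficient.
alpha, beta, gamma = 1.2, 1.1, 1.15

def compound_scaling(phi):
    depth_mult = alpha ** phi        # d = alpha^phi  -> more layers
    width_mult = beta ** phi         # w = beta^phi   -> more channels
    resolution_mult = gamma ** phi   # r = gamma^phi  -> larger input images
    return depth_mult, width_mult, resolution_mult

# The constraint alpha * beta^2 * gamma^2 ~= 2 means FLOPs grow roughly as 2^phi.
print(round(alpha * beta**2 * gamma**2, 3))  # ~1.92
print(compound_scaling(2))                   # multipliers for a larger variant
```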
Before moving on, it's important to take note of the EfficientNet-B0, which is the baseline network, the one we're going to scale so that we get better results while working under compute constraints. Scrolling down to the table, you see this baseline model, EfficientNet-B0. You'll notice first that the resolution is 224 by 224, meaning that we start with image sizes of 224 by 224; note that different image sizes can be used for the different models, although the resolution best adapted to each model should preferably be used. Here you see we have a usual conv layer and then this MBConv block. Before getting to the MBConv, also note that after carrying out the neural architecture search, the authors found that 5 by 5 kernel filters can also be used, so unlike what we discussed in previous sessions, 5 by 5 filters are still very useful. Coming to the MBConv, its main building block is the mobile inverted bottleneck from Sandler et al., the MobileNetV2 paper we've seen already, to which they also add the squeeze-and-excitation optimization. The squeeze-and-excitation optimization was also added in MobileNetV3, so here we basically have the MobileNet inverted residual block we've already seen, and if you check out the MobileNetV3 paper, which you can feel free to look at, you'll find this squeeze-and-excitation component. Let's zoom into this: here is the MobileNetV2 bottleneck with its residual connection, the low-dimensional input going in, the expansion, and the low-dimensional output produced in the final layer. Now, to better understand the squeeze-and-excitation layer, we can get back to how conv layers actually work. To produce a given output value, we carry out multiplications and additions for each channel of the input with the filter channel that corresponds to it, and then all of these per-channel results are added up with equal weights. So if the result from the first channel is alpha, from the second beta, and from the third gamma, the output value, say the negative one at this position, is simply alpha plus beta plus gamma. What the squeeze-and-excitation layer brings in is a set of weights on that addition: instead of each channel contributing with a weight of one, we add parameters so that certain channels influence the output more than others, so instead of one, one, one, we could have weights A, B, and C. Getting back to the paper, the way this is done is as follows: we start by carrying out pooling, and the result of this pooling is a 1 by 1 by C output, where C is the number of channels, exactly the same C as the feature map we started from, while the height and width become 1 by 1. Once we have this, we pass it through two fully connected layers, the first with a ReLU activation and the second followed by a hard sigmoid activation. The output again has the same number of channels C, and it is multiplied by each and every channel of the feature map, so it serves as the weights: this output is actually the A, B, and C we described, supposing the channel size C is equal to three. If we break the feature map up into three chunks, the first value multiplies the first chunk, the second value multiplies the next chunk, and the third value multiplies the last chunk, and now we have channels whose contribution to the output is weighted. That said, also note that the expansion factor here is six.
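Here is a minimal sketch of such a squeeze-and-excitation block in Keras; the reduction ratio and exact activation choices are illustrative assumptions rather than EfficientNet's exact settings:

```python
import tensorflow as tf
from tensorflow.keras import layers

def squeeze_excite(feature_map, ratio=4):
    """Minimal squeeze-and-excitation sketch: learn one weight per channel
    and rescale the feature map with it."""
    channels = feature_map.shape[-1]

    # Squeeze: global average pooling gives one value per channel (1x1xC)
    s = layers.GlobalAveragePooling2D()(feature_map)

    # Excite: two fully connected layers produce the per-channel weights
    s = layers.Dense(channels // ratio, activation="relu")(s)
    s = layers.Dense(channels, activation="hard_sigmoid")(s)

    # Reshape to (1, 1, C) and multiply every channel by its weight
    s = layers.Reshape((1, 1, channels))(s)
    return layers.Multiply()([feature_map, s])

# Usage sketch on an 8x8 feature map with 32 channels
inputs = tf.keras.Input(shape=(8, 8, 32))
outputs = squeeze_excite(inputs)
model = tf.keras.Model(inputs, outputs)
```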
Getting back to the results, we see how the EfficientNets perform better than other ConvNets with a similar or even larger number of parameters. Here, for example, the EfficientNet-B0 outperforms the ResNet-50 despite the large difference in parameter count, since the EfficientNet has far fewer parameters than the ResNet-50. Compare the EfficientNet-B1 with the ResNet-152: 60 million parameters versus 7.8 million, yet the B1 is more accurate than the ResNet-152. You can check this right up to the EfficientNet-B7: compared to GPipe, the accuracies are comparable, around 97 percent top-5, but GPipe has 557 million parameters while the B7 has only 66 million. We can also look at the floating point operations: the EfficientNet-B0 needs fewer floating point operations while still reaching higher accuracy. We also see that even if we scale the MobileNets and the ResNets, they still don't match the EfficientNets, which shows the power of the neural architecture search used to obtain the baseline. Further down we have these results with the class activation map, a visualization technique that lets practitioners understand what portions of the input helped in producing the output. It shows clearly that when we use compound scaling, the map is much more focused on the relevant regions, compared to the baseline model and to the models scaled only in depth, only in width, or only in resolution. Hello everyone, and welcome to this new and exciting session in which we are going to treat transfer learning and fine-tuning. Transfer learning can be applied in several domains like computer vision, natural language processing, and speech. To better understand the usefulness of transfer learning, we have to note that deep learning models work best when given a lot of data, and so if you have a dataset of only a hundred data points, you're most likely going to end up with a poorly performing model.
But what if we told you that it's possible to train a model on, say, a million data points and then adapt that model so that you can train it on this very small dataset and still get very good results? This is exactly what transfer learning makes possible, and that's what we shall be treating in this section. At this point, one question that may be going through your mind is: how is it possible to train a model on, say, a million data points and then use that same model on a much smaller dataset, obviously different from the original one, with about a hundred data points? The answer lies in this figure. Notice how we have this image of a truck and then a model which takes this input and produces some outputs. The kind of model we use for image tasks is generally the ConvNet, as we've seen already in this course, and with ConvNets we generally have two main sections. The first section is the feature extractor, and towards these final layers we have the classifier. The very first thing the ConvNet does is extract low-level features, and as we go towards the end it focuses on extracting more high-level features. Notice how, with this feature map, we pick up edges; for other feature maps we're filtering out or extracting other low-level features from the input. Then, as we get towards the end, we get more high-level features, for example whole portions of the image such as the tire. After this, we generally have a classifier, which permits us to pick, from a set of options, the one the model thinks the image actually is. Now, because the ConvNet works this way, it means that if we have two datasets which are similar, we can train a feature extractor, that is a model, on the very large dataset, and because its weights have been trained to extract features correctly, when we pass in the very small dataset this section of the model will still do its job, namely extracting useful features from that data. Since the two datasets are similar, it's going to do a great job, so we won't need a very large dataset in order to extract features from this small, hundred-example dataset. So in fact, what we're saying is: we have a model with a feature extractor unit and a classifier unit, as we've seen already. The classifier generally starts after the flattening or the global average pooling and is made up of some dense layers, while the feature extractor is a ConvNet, that is, convolutional layers with max pooling and batch norm layers. What we do is pass our small dataset through the feature extractor and collect the outputs at that point. And since we generally pre-trained this model, and we use the word pre-trained because, although the term is new in this course, it simply means we did the training beforehand, this training was done on a large or relatively large dataset like, for example, ImageNet.
Let's suppose we pre-trained on some large dataset with a billion images; this feature extractor unit has learned to extract features from whatever image you give it. So when you now come with just a hundred images, it extracts those features, and since at the level of the classifier we had a different setup during pre-training, we have to modify this classifier. This means that if before, after the global pooling, we had say a hundred-unit dense layer followed by a thousand-output dense layer, then in our case, where we have just three outputs, what we do is replace all of that: we may pass the features directly to a three-output dense layer, or first to say a 128-unit dense layer and then to the three-output dense layer. From here we see that this new part of the model focuses on the classification while the previously pre-trained part takes care of extracting the features from our data. It should be noted that we generally use this concept of transfer learning when we have a very small dataset. Since deep learning models perform best with large datasets, we want to get the best out of them, so we use transfer learning when the dataset is small, as we've said, and when we have a model that has been pre-trained to extract useful features from similar kinds of images; this simply means the two datasets should be similar. Another advantage of working with transfer learning is that you gain in training compute cost: the pre-trained model may have been trained for, say, three days, and now all you need to do is take it and apply transfer learning to your own specific and smaller task. So when you're running on a limited budget, working with pre-trained models is going to be really helpful. Now, apart from transfer learning we also have fine-tuning, which is quite similar, with one difference: with transfer learning the feature extractor's weights are fixed and, during training, we only update the weights of the classification section, whereas with fine-tuning we can also update the weights of the feature extractor. Generally we start fine-tuning from the top, so if we consider the input to be at the bottom and the final layers at the top, we start fine-tuning from the final layers and move towards the initial layers. In this fine-tuning process we keep the earlier layers fixed, so their weights aren't updated during training, and we update the later weights. Of course, we also update the classifier weights, but the difference is that the classifier weights were initialized from scratch, that is, randomly, whereas the feature extractor weights are initialized from the pre-trained model but were not trained on our data. Then, depending on the results we're getting, that is if fine-tuning only the final layers already gives better results, we could keep going and keep increasing the section of weights we update in the feature extractor unit.
So we would take off more layers and make that part trainable too, keeping the rest untrained, and it's even possible to go ahead and train the whole model. But if there's one thing you need to note when carrying out fine-tuning, it's that you have to use a very small learning rate. The reason is to avoid disrupting the weight values which took so much time to attain; as we fine-tune we update these weights, but very slowly, and by very slowly we mean we choose a very small learning rate and then observe how this affects the model's performance. At this point, let's get straight to the code and look at some pre-trained models. Under tf.keras.applications you have the ConvNeXt models, DenseNet, EfficientNet, EfficientNetV2, which we've seen already, Inception, MobileNet, which we've seen, MobileNetV2, V3, NASNet, the famous ResNets, which we've seen already, the VGGs, and Xception. You have the choice of picking out any one of these; we're going to go straight to the EfficientNet, so you could pick the EfficientNet or the EfficientNetV2 versions, but here we'll pick the EfficientNetB4, since it has slightly fewer parameters than the ResNet-50 while outperforming it by a very large margin. And if there's one great thing about TensorFlow, it's that you can use these models without having to code them out from scratch. So we have tf.keras.applications.efficientnet.EfficientNetB4 with its arguments, and with this we define our EfficientNet model; we paste it right here and call it the backbone, as we've seen already. Now, we're not going to include the top: recall that we have this whole model, and what we're interested in is the feature extractor unit, so we set include_top to False. The weights have been pre-trained on the ImageNet dataset, so we set weights to 'imagenet'. We won't use the input_tensor argument, but we set the input_shape from our configuration's image size. Since we're not including the top, we don't take the classes argument into consideration, and the classifier_activation is also not needed. We can also decide whether or not to include a pooling layer; if we wanted one, we would have to specify whether to work with average or max pooling, but here we'll leave it off and add our own pooling later. So we have our backbone; we run the cell, and then, to freeze the backbone, which is what we call ensuring its weights aren't updated during training, we simply set its trainable attribute to False: backbone.trainable = False. And that's all we need to do to freeze our model.
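Here is a minimal sketch of that backbone setup; the image size is a hypothetical stand-in for the course's configuration dictionary:

```python
import tensorflow as tf

IM_SIZE = 256  # hypothetical value standing in for the configured image size

# Pre-trained feature extractor: no classifier head, ImageNet weights
backbone = tf.keras.applications.efficientnet.EfficientNetB4(
    include_top=False,
    weights="imagenet",
    input_shape=(IM_SIZE, IM_SIZE, 3),
)

# Freeze the backbone so its weights are not updated during training
backbone.trainable = False
```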
Now that the model is frozen, the next thing to do is to add the other layers. We define the input with the image size, pass it through the backbone, which has been defined with its parameters frozen, and from there we add a global average pooling layer. Then we have a dense layer; in the configuration we set the first dense layer to 1024 units and the second to 128. After the first dense layer we add a batch normalization layer, then the second dense layer, and finally a dense layer with a softmax activation and the number of classes as its size. We take this off, run the cell, fix the small error about how this argument has to be specified, and now we have our model: total parameters about 19 million, trainable parameters just 1.9 million, and non-trainable parameters 17.6 million. This means the backbone itself accounts for the 17.6 million and the additional layers bring the remaining 1.9 million. With this, our model is already set up with minimal code, and we can go ahead and start the training: we compile the model and run the training process. Once training is over, we can evaluate the model: we get close to 85 percent accuracy and 95.3 percent top-k accuracy, which is slightly better than the previous models we had worked with. Let's go ahead and test it. We change this to our new model and run, and we get an incompatible shape error, so we have to resize the image before passing it into the model. We use OpenCV to resize the test image to the configured image size, run again, and this time we get the output. Checking the predictions: one miss here, no miss, no miss, then a second miss; in total, out of the 16 images we have two misses, that is 14 out of 16, about 87.5 percent accuracy on this small batch taken from the validation dataset. We then check the confusion matrix and get even better results. One thing to note, though, is that our dataset was not that small, so we may not see the difference between training from scratch and using transfer learning. So what we'll do is take just 10 batches, that is 320 data points, and then compare training from scratch with training from a pre-trained model. For the from-scratch model we'll use something simple like the LeNet; scrolling down, the training loss function and the metrics stay the same, we build the LeNet model, and we run that.
So we'll train on this small subset and validate on the full validation dataset, for just 20 epochs. After training the LeNet for 20 epochs on that very small dataset, you can see the model doesn't perform well: it doesn't even reach 50 percent validation accuracy while the training accuracy keeps increasing, so the model is overfitting. From here we switch to the pre-trained model. We run the cell again to reinitialize the parameters, and let's rename this to pretrained_model; we get the pre-trained model's summary, compile it, and start the training again. Just note that previously the validation accuracy stayed below 50 percent, and now we'll check the validation accuracy when working with the pre-trained model. Already after two epochs, in fact even from the first epoch, the validation accuracy is greater than 50 percent, which shows you the power of working with pre-trained models, as we're now making use of the pre-trained features to get a much better performing model. The accuracy keeps increasing, and once training is done you can see that this model, which previously couldn't cross the 50 percent validation accuracy mark, now reaches a validation accuracy of about 71 percent while training on just 320 data points, and it was above 50 percent from the very first epoch. What we can conclude is this: first, get as much clean data as possible, and if you can't, try some data augmentation; then, if your dataset is still very small, apply transfer learning. But if you have a relatively large dataset, applying transfer learning may be needless, as training from scratch should normally get you good results. We can now evaluate the model: our pre-trained model, trained on just 10 out of 213 batches, produces 71.3 percent validation accuracy. We now move on to fine-tuning. We've looked at transfer learning, and before getting into fine-tuning we first convert this code, which was built with the sequential API, into the functional API. Here's the converted model; it's basically the same thing: we have the input, the backbone which takes in this input and produces an output, then the average pooling, a dense layer, a batch norm layer, another dense layer, and the final dense layer. This is our fine-tune model; we run the code cell and view a summary, which is meant to be identical to what we had with the pre-trained sequential model. But the parameter count here doesn't quite match what we expect, and looking back we notice that we did not include the batch norm layer, so we add it, run this again, and now we have a summary which is exactly the same as that of the model we previously built with the sequential API.
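For reference, here is a minimal sketch of that functional-API model; the layer sizes stand in for the course's configuration values, and the activations, optimizer, and loss are illustrative assumptions rather than the exact course settings:

```python
import tensorflow as tf
from tensorflow.keras import layers

IM_SIZE, NUM_CLASSES = 256, 3   # hypothetical configuration values

backbone = tf.keras.applications.efficientnet.EfficientNetB4(
    include_top=False, weights="imagenet", input_shape=(IM_SIZE, IM_SIZE, 3))
backbone.trainable = False      # transfer learning: frozen feature extractor

inputs = tf.keras.Input(shape=(IM_SIZE, IM_SIZE, 3))
x = backbone(inputs, training=False)         # keep BatchNorm in inference mode
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(1024, activation="relu")(x)
x = layers.BatchNormalization()(x)
x = layers.Dense(128, activation="relu")(x)
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

finetune_model = tf.keras.Model(inputs, outputs)
finetune_model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-3),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
```

The training=False call on the backbone is discussed in detail just below.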
Now we want to fine-tune our model, that is, make all those layers which were frozen, and hence not trained, trainable again. So we get back to the code and simply set backbone.trainable to True, and here we're going to set training to False. Recall that when we were building the ResNet-34 model we had a training parameter we made use of; take note that trainable is different from training. While we set this training argument to False, you'll find that it's the batch norm that takes in this training parameter, and the reason we need it, especially for the batch norm, is that the batch norm works differently during training and inference. During training, the batch norm layer normalizes its outputs using the mean and the standard deviation of the current batch of inputs, whereas during inference it normalizes its outputs using a moving average of the mean and standard deviation of the batches it has seen during training. Since, when fine-tuning, we do not want the batch norm to take the mean and standard deviation from the current batch of inputs, but rather to use the statistics it gathered during training, this training parameter is very important: training set to True means training mode, and training set to False means inference mode, which is what we want here for fine-tuning. Before we move on, note that setting a layer's trainable attribute to False is different from setting training to False. When you set a layer's trainable attribute to False, it simply means we do not want to update its weights when training; when we set training to False, it means the layer runs in inference mode. In the case of the batch norm, gamma and beta are trainable parameters, so layer.trainable = False means they won't be updated during training; on the other hand, the mean and variance are not trainable parameters, they are statistics that adapt to the training data, and that is why in inference mode, that is when training is set to False, we do not want to disrupt the mean and variance values obtained during training from the training inputs. As we saw already, at inference these will simply be the moving averages of the mean and standard deviation of the batches seen during training. So clearly, the concept of setting weights not to be trainable is different from that of running in inference mode. Nonetheless, it should be noted that in the special case of the batch norm, setting trainable to False on the layer means the layer will subsequently run in inference mode, even though, as we've seen, the two settings don't mean the same thing in general. Also note that setting trainable on a model containing other layers will recursively set the trainable value of all inner layers, and that if the value of the trainable attribute is changed after calling compile on a model, the new value doesn't take effect for that model until compile is called again.
That said, you can also check out the dropout layer, which doesn't have any trainable parameters. Remember how dropout works: you have some inputs, and when they pass through the dropout layer some of them are not taken into consideration, so a dropout rate of 0.5 simply means that half of the inputs move forward and the other half are dropped. At inference, that is when we're actually testing the model, we do not want to drop any of these neurons, and so the dropout layer also takes in this training parameter: when training is True we are in training mode and values may be dropped, whereas when training is False we are in inference mode, so nothing is dropped and the inputs pass through without any modification. You can also see clearly that layer.trainable doesn't really apply here, because dropout has no trainable parameters, whereas the training argument still decides whether we are in training mode or inference mode. So, in summary, we have our model with the backbone and the head for classification. We apply transfer learning by freezing the whole backbone, so that no backbone parameter is updated during training. Then we move on to fine-tuning, where we want to update the backbone parameters with a very small learning rate, while also avoiding a situation where the mean and variance statistics obtained during pre-training get upset during fine-tuning. The batch norm is therefore a special kind of layer: even during fine-tuning, where we've set trainable to True because we want to update the weights, we do not want to modify or upset the batch norm's mean and variance, so we set training to False so it still behaves as if it were in inference mode. Getting back to the code, we have our training argument set to False, and we could start training the model again. But one point to note is that we fine-tune a pre-trained model which has already been through transfer learning, so we won't start the fine-tuning directly: we first set the backbone's trainable to False, with training set to False, and repeat the transfer learning process before then applying fine-tuning. Each time you want to apply fine-tuning, make sure you do this first. So let's run this again: here's our fine-tune model, which achieves a best validation accuracy of about 70 percent. Now that we're done with transfer learning, we apply fine-tuning by setting trainable to True; all we need to do is set this to True, without rerunning everything, since it's the same model, and that's exactly the idea: we start with a backbone which is not trainable and later on make it trainable.
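Here is a minimal sketch of that two-phase recipe, continuing from the functional-API sketch above; backbone, finetune_model, and the dataset variables are assumed from earlier, and the learning rates and epoch counts are illustrative:

```python
import tensorflow as tf

# Phase 1: transfer learning with a frozen backbone. The model was built
# with backbone(inputs, training=False), so every BatchNorm layer stays in
# inference mode in both phases.
backbone.trainable = False
finetune_model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-3),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
finetune_model.fit(train_dataset, validation_data=val_dataset, epochs=20)

# Phase 2: fine-tuning. Unfreeze the backbone, then recompile (a trainable
# change only takes effect after compile) with a much smaller learning rate.
backbone.trainable = True
finetune_model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-5),   # learning rate divided by ~100
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
finetune_model.fit(train_dataset, validation_data=val_dataset, epochs=10)
```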
When we do that, we simply recompile the model afterwards; don't forget to recompile, so that the fact that some parts of the model are now trainable is taken into account. So let's get back, recompile the model, and see what this gives us. As we start training, we notice that the validation accuracy isn't what we expect: before fine-tuning we already had a model with a validation accuracy of about 70 percent, but now we're getting around 33 percent. The simple reason is the learning rate: we kept the same learning rate instead of reducing it before fine-tuning. So we stop the training, set trainable back to False, restart the whole process, retrain and get back to an accuracy of about 70 percent, and then set trainable to True again, this time making sure the learning rate is divided by 100, so we're using a very small learning rate. Once that's done, we compile the model again and restart the training. When training completes, you can see that the validation accuracy increases up to 72.2 percent, so we make an extra gain of about 2 percent after fine-tuning. This makes sense, since fine-tuning permits us to squeeze some extra juice out of the backbone, which this time around is actually trainable. And with that, we've just completed the section on transfer learning; thanks for getting to this point, and see you in the next section. Hello everyone, and welcome to this new and exciting session in which we are going to visualize a convolutional neural network's feature maps. One very important part of building robust deep learning models is understanding how these models work, that is, what goes on in the different hidden layers, and so in this section we'll take a model which has already been pre-trained and generate its feature maps, so we get to see exactly what goes on under the hood. The pre-trained model we'll be using is the VGG16, so we simply copy it and paste it into our code. We're not going to take the top, so we set include_top to False, we define the input shape from our configuration's image size, we drop the input_tensor argument, and we give it the name vgg_backbone. We can check out its summary: it has about 14.7 million parameters. We now move on to the next step, where we create another model which will permit us to visualize the feature maps. To explain how this works, recall that the VGG takes in an input image and produces a single output; if we have not included the top and we feed it a 256 by 256 by 3 input, that single output is 8 by 8 by 512.
Since we have only this single output and we're interested in visualizing the hidden layers, that is, what goes on inside the VGG model, what we'll do is create a new model which instead has many different outputs, and these different outputs will come from the different hidden layers. So each hidden layer's output, that is each feature map, now becomes an output of the model: we have the same input as before, but instead of a single output we have all these outputs, about 18 in total. You may also decide to pick only specific outputs, for example only the outputs of the conv layers, omitting the max pool layers; it all depends on you, and we'll see how to do this. Let's get back to the code and build our list of feature maps: we collect layer.output for each layer in vgg_backbone.layers, starting from the first layer after the input, since we don't want to include the input itself. From there we build this new model, which we'll call feature_map_model, as a Keras Model whose inputs are the vgg_backbone inputs, the same input as before, and whose outputs are this feature_maps list, so that all the hidden layer outputs become part of our output. We build the new model and view its summary with feature_map_model.summary(). It looks similar to what we had, though if we had picked only, say, layers one to four, it would be shortened, since that's all the new model would need; because we're going right up to the end, we actually go through the whole VGG model, but with the difference that we now have many outputs and still just one single input, unlike before where we had one input and one output. So we have this newly designed model; we run that again, and now let's head on to passing an input through it. What we want to do is take an input image and pass it into the model, and since the model outputs the different feature maps, that is the different hidden layer outputs, we'll be able to visualize what's going on inside the VGG. To get this output, we use something similar to the testing we've seen already, where we read an image and pass it to the model; the only difference is that the model we're working with now is our newly created feature_map_model. So we take our test image, resize it, and pass it through the feature map model.
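Here is a minimal sketch of that multi-output model; the image size is an assumed configuration value and the random tensor merely stands in for a real test image:

```python
import tensorflow as tf

IM_SIZE = 256  # assumed configuration value

# Pre-trained VGG16 feature extractor (no classifier head)
vgg_backbone = tf.keras.applications.VGG16(
    include_top=False, weights="imagenet", input_shape=(IM_SIZE, IM_SIZE, 3))

# One output per hidden layer: every conv/pool feature map becomes an output
# of the new model (we skip index 0, the input layer).
feature_maps = [layer.output for layer in vgg_backbone.layers[1:]]
feature_map_model = tf.keras.Model(inputs=vgg_backbone.input, outputs=feature_maps)

# Passing one image now returns a list of feature maps, one per layer
image = tf.random.uniform((1, IM_SIZE, IM_SIZE, 3))  # stand-in for a resized test image
fmaps = feature_map_model(image)
print(len(fmaps), fmaps[0].shape)
```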
When we run this, we check out the feature maps: we loop, say for i in range(len(fmaps)), and print each feature map's shape. At first we get the error that a list has no shape attribute, because we need to index it as fmaps[i]; we fix that, run again, and there we go. You see that the outputs start from the first conv layer rather than from the input, since we decided not to include the input as part of our outputs, which is logical, and they run right up to the very last layer. At this point we've picked both the conv layers and the max pool layers, so we can visualize all of these different feature maps. Note the length: we have 18 different outputs. Let's get back and modify this so that we only attach a layer's output if is_conv(layer.name) is True, meaning we only take the conv layers as part of our output. We then define is_conv, which takes in a layer name; this is where you see why it's important to always give your layers names, because the name is what we use here to differentiate between the different types of layers. So we say that if 'conv' is in the layer name we return True, else we return False. We run that, run the model construction again, and it looks similar to before, but the length is now reduced: we've gone from 18 to 13, taking off the five max pool layers, since 'conv' isn't in their names. We could also decide to take only the max pools by checking for 'pool' instead, and then we'd get just five; but let's get back to 'conv'. So now we've run this and we have the different shapes. To carry out the final visualization, we go through each and every feature map: for i in range(len(fmaps)) we create a figure and specify the figure size. Since we're going through each feature map, it's important to get its feature size, so we take fmaps[i] and read its shape at index 1; the shape indices are zero, one, two, three, and index 1 gives the spatial size of the feature map. Then we get the number of channels, n_channels, as fmaps[i].shape[3]. With that set, we want to visualize each feature map such that all its channels are lined up on a single row.
For the later layers we have, say, 512 channels of 16 by 16, while for the earlier ones we have, for example, 64 channels of 256 by 256. Let's suppose we're looking at one of those 256 by 256 feature maps, of which we have 64, one per channel. What we want to do is take the first one, put it here, take the next one and place it right beside it, then the next, and so on, so we can visualize all of them on one line up to the very last one. To do this, we create another array, which we'll call joint_maps, initialized with np.ones. For its size we use the feature size: the height is simply f_size, since in the 256 by 256 case the height is 256, but the width is no longer just 256; it becomes f_size times the number of channels, in this case 256 times 64. With that, we have our joint_maps array initialized to ones, and the next step is to fill in the feature values. Now that we understand how joint_maps was created, we go ahead and fill in all the different features. We loop through the channels, for j in range(n_channels), and fill joint_maps keeping the height fixed: we take all elements along the height, and along the width we fill in steps of f_size as we go from one channel to the next, that is, from f_size times j to f_size times (j plus 1). So when j equals zero, zero times f_size is 0 and one times f_size is 256, meaning we fill the region from 0 to 256 in the width while covering the full 256 in the height. When j goes to one, one times f_size is 256 and two times 256 is 512, so we skip 256 steps and fill the next slot; when j equals two we go from 512 to 768, and that's how we keep filling the different positions. Once this is set, we need the data to put in each slot: we index our feature maps with i to pick the particular feature map we're working on, and then we select the particular channel.
So when j equals zero, for example, we take the zero channel: we take all the values along the height and width and pick out channel zero, and the same goes for every other value of j. With that, our joint_maps array has been created, and we're now ready to plot the image: we call imshow and pass in joint_maps. If we pass in everything it's going to be very RAM-consuming, so we take all the height and select only part of the width, say from 0 to 512. Before running this, we set up the different axes: we call plt.subplot with the length of the feature maps as the number of rows, one column, and i plus 1 as the position. Once this is set, we run it; it takes a while, and once it's complete we can visualize the results. Scrolling down, you see that for the initial layers we have low-level features which have been extracted, and as we go further or deeper into the network, the features being extracted become more and more high-level: here, for example, we pick out a whole region of the image, unlike before where we were more focused on edges. We keep going deeper and seeing the outputs we get, and that's it, we've just visualized a pre-trained model's feature maps. Another thing we could do is set the weights to None at the beginning, so we don't use the pre-trained weights, run everything again, and check what the model produces: scrolling through, you'll notice that as we go deeper, not much information is extracted from the input.
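Here is a minimal sketch of that tiling-and-plotting step, continuing from the feature_map_model sketch above (fmaps is the list it returns); the figure size and the width limit are illustrative choices:

```python
import numpy as np
import matplotlib.pyplot as plt

plt.figure(figsize=(30, 30))
for i in range(len(fmaps)):
    fmap = fmaps[i].numpy()            # shape: (1, f_size, f_size, n_channels)
    f_size = fmap.shape[1]
    n_channels = fmap.shape[3]

    # Lay every channel side by side on a single row
    joint_maps = np.ones((f_size, f_size * n_channels))
    for j in range(n_channels):
        joint_maps[:, f_size * j : f_size * (j + 1)] = fmap[0, :, :, j]

    # One row per layer; only show the first slice of the width to save memory
    plt.subplot(len(fmaps), 1, i + 1)
    plt.imshow(joint_maps[:, :512], cmap="viridis")
    plt.axis("off")
plt.show()
```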
As we can see in Figure 2 from the paper, we are given an image and a class of interest, for example "tiger cat", or any other type of differentiable output; here the task is image classification, so we'll focus only on that branch. We forward-propagate the image through the CNN part of the model; remember this CNN has already been trained. We then go through the task-specific computations, specific in our case to image classification, to obtain a raw score for the category. The gradients are set to zero for all classes except the desired class, and the signal is then back-propagated to the rectified convolutional feature maps of interest. You can see that when we forward-propagate through the CNN we obtain these rectified conv feature maps, pass them through the fully connected layers, and obtain the outputs; from that output we back-propagate all the way to the feature maps, and the result of this back-propagation is what's shown in the figure, with its own coloring compared to the feature maps. In other words, we compute the derivative of the output, let's call it y_tc for "tiger cat", with respect to the feature maps: for each and every position in the feature maps, we take the derivative of the output with respect to that value, and by doing this we reconstruct maps of the same shape, but filled with derivatives instead of activations. From here, we obtain the mean of each of these gradient maps, one mean value per channel. Each of these mean values is then multiplied by its corresponding feature map, all of the weighted maps are added up, and the result is passed through a ReLU, which produces our Grad-CAM. That's how it works in theory; that's how we obtain the Grad-CAM.
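Written out compactly, this is the formulation from the Grad-CAM paper, where $y^c$ is the score for class $c$, $A^k$ is the $k$-th feature map, and $Z$ is the number of spatial positions:

$$\alpha_k^{c} = \frac{1}{Z}\sum_{i}\sum_{j}\frac{\partial y^{c}}{\partial A_{ij}^{k}}, \qquad L_{\text{Grad-CAM}}^{c} = \mathrm{ReLU}\!\left(\sum_{k}\alpha_k^{c}\,A^{k}\right)$$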
That said, let's dive into the code. First things first, we have the EfficientNetB5 model which we had already trained, so we'll just load its weights; there are quite a lot of them, and there we go. From here we can test the model: we take the test image, expand its dimensions, and pass the image array into the pretrained model. Looking at the output, the maximum is at index 1, which is logical because it's a happy image, and we can see the person is indeed happy. Remember we have three classes, angry, happy, and sad, at indices 0, 1, and 2, so an output at index 1 means the prediction is correct. Now, recall from the diagram that this model is conceptually broken into two parts: a first part that generates the feature maps, and a second part that takes in the feature maps and outputs the class of the input image. But given that when defining our model we built it as a single block, not as two separate entities, we're going to look at how to separate it, or rather how to create one model made of only the first part and another model made of the second part. One thing you should note is that the way we created this model is a bit different from what we had seen so far. Previously, with the pretrained models in the transfer learning section, we defined a backbone and then stacked the backbone, the global average pooling, and so on; if you look at that summary, the backbone appears as a single entry and you don't get the details of what it contains. But when you look at this Grad-CAM model's summary, all the details of the backbone are listed. The reason is that here the input passed to the global average pooling is simply backbone.output, and the model's inputs are backbone.input; specifying things this way gives us a full summary with all the layers, and the separate Input layer we had before becomes useless and can be taken off. So, that said, let's go ahead and build the two sub-models out of this full model. To create the last-conv-layer model, which is essentially the part whose output is the rectified conv feature maps, we define a Keras Model that takes as input the pretrained model's inputs and whose output is the output of the last conv layer, the layer whose name is "top_activation". You'll notice this is the layer where we have the 8 by 8 by 2048 feature maps, and from there the network moves on to the global average pooling,
then the dense layers, dense, dense_1 and dense_2, and then we have our output. So the part from the top activation upwards is the convolutional part of the network, and everything after it is the classifier unit. Now that we have the name of this last conv layer, top_activation, we make use of it to produce our output: last_conv_layer is simply pretrained_model.get_layer(last_conv_layer_name), where last_conv_layer_name is "top_activation", so we understand where that name comes from. With the last conv layer in hand, we can build the last-conv-layer model, the initial CNN model, which takes the image as input and the feature maps as output. Let's run this and check: calling summary on last_conv_layer_model, you'll see the model's input is 256 by 256 by 3, the image, and its output is the feature maps. So we've created the first model out of our overall model; let's now build the classifier model. For the classifier model, the input is going to be the feature maps, since the output of the first model is exactly the input of the second. Given that input, we simply pass it through each and every layer that makes up the classification part of the pretrained model. If we scroll back up to the summary, you'll notice we had the global average pooling, then dense, dense_1, and dense_2, which together make up the classifier path; those are the names listed in classifier_layer_names. We then loop: for every layer name in classifier_layer_names, we pass the running value through that layer, so the classifier input goes into the global average pooling, the result x goes into dense, then into dense_1, then into dense_2, and the final x is our output. So the classifier model's input is the feature maps and its output is what we get after passing through those layers. With the conv block and the classification unit both defined, we can move on.
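A minimal sketch of this model-splitting step, assuming the trained model is available as pretrained_model and that the classifier layer names match the ones read off the summary in the lesson (they are assumptions here):

import tensorflow as tf

last_conv_layer_name = "top_activation"
classifier_layer_names = ["global_average_pooling2d", "dense", "dense_1", "dense_2"]

# 1) image -> rectified conv feature maps
last_conv_layer = pretrained_model.get_layer(last_conv_layer_name)
last_conv_layer_model = tf.keras.Model(pretrained_model.inputs, last_conv_layer.output)

# 2) feature maps -> class scores
classifier_input = tf.keras.Input(shape=last_conv_layer.output.shape[1:])
x = classifier_input
for layer_name in classifier_layer_names:
    x = pretrained_model.get_layer(layer_name)(x)
classifier_model = tf.keras.Model(classifier_input, x)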
The next step will be to compute the partial derivatives of the output with respect to the feature maps. We can compute these gradients using the gradient method, which takes in the top class channel and the last conv layer output. To obtain that last conv layer output, the feature maps, we simply pass the input image into our last-conv-layer model; from the feature maps we then pass through the classifier model to obtain the predictions, and from those predictions we get the top prediction index. This top prediction index depends on the example: given that the person is happy, we expect it to equal 1, and we can print it out to check. To obtain the actual score, we simply pick it out of the predictions by indexing with top_pred_index (or, equivalently here, with 1); this is the score we get when we pass the input through the classifier model, and it should match the value we saw earlier. So we're computing the partial derivative of that output score with respect to the feature maps, and the gradient method does that for us. Running this, the gradients have exactly the same shape as the feature maps. Once we have the gradients, the next thing is to obtain the mean value per channel, which we do with reduce_mean while specifying the spatial axes. Printing the pooled grads shape gives 2048: a vector where each position is the mean of a single channel. Remember we had 2048 channels, each of them 8 by 8 (or 1 by 8 by 8 with the batch dimension), and we've reduced each one to a single value, leaving a 2048-dimensional vector. The next step is to multiply these means by the feature maps. To carry out that multiplication we loop for i in range(2048): for each position we take the corresponding channel of the last conv layer output, that is the i-th feature map, and multiply it by its corresponding pooled gradient, which as we've seen is the mean of that channel's gradients. Note that we index the zeroth element of the batch, because the tensor is 1 by 8 by 8 by 2048 and doing so leaves us with a tensor of shape 8 by 8 by 2048. We run this and obtain the weighted last conv layer output, which has the same shape, as you can check. To obtain the heatmap, we then sum the values at each spatial position across all the channels: if we had, say, three channels, then for each position we would add its three values together, and that's how we go from an 8 by 8 by 2048 tensor to an 8 by 8 heatmap, summing over the channel axis. We run this and we can now visualize our heatmap, but notice that there's also a ReLU to apply first.
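Here is a minimal sketch of this gradient and heat-map computation, assuming the two sub-models from the previous sketch and an expanded test image img_array of shape (1, 256, 256, 3); the use of tf.GradientTape is my reading of what the "gradient method" in the lesson refers to:

import numpy as np
import cv2

with tf.GradientTape() as tape:
    last_conv_layer_output = last_conv_layer_model(img_array)
    tape.watch(last_conv_layer_output)            # we want grads w.r.t. this tensor
    preds = classifier_model(last_conv_layer_output)
    top_pred_index = tf.argmax(preds[0])
    top_class_channel = preds[:, top_pred_index]  # score of the predicted class

# d(top class score) / d(feature maps): same shape as the feature maps
grads = tape.gradient(top_class_channel, last_conv_layer_output)

# one mean value per channel -> vector of length 2048
pooled_grads = tf.reduce_mean(grads, axis=(0, 1, 2)).numpy()

# weight each channel of the feature maps by its pooled gradient
last_conv_layer_output = last_conv_layer_output.numpy()[0]
for i in range(pooled_grads.shape[-1]):
    last_conv_layer_output[:, :, i] *= pooled_grads[i]

# sum over channels, apply ReLU, normalize, and resize back to the image size
heatmap = np.maximum(np.sum(last_conv_layer_output, axis=-1), 0)
heatmap /= (heatmap.max() + 1e-10)
heatmap = cv2.resize(heatmap, (256, 256))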
So we apply the ReLU, and then we can visualize the heatmap: there it is, and we then resize it using OpenCV's resize method, going from 8 by 8 to 256 by 256. The reason we do this is that we want to superimpose it on the image, so that the output shows us clearly the regions of the image with the highest contribution to the output, which here is that of a happy person: the model predicted that this person is happy, and now we know which parts of the image contributed the most to that prediction. If we now try with a different image, modifying the code slightly, we get a different result: previously it was a small region, and now it's this zone with the wrinkles, together with this other zone, that influences the prediction. With this, we've just implemented the Grad-CAM method. Thank you for getting right up to this point, and see you in the next section. Hi there, and welcome to this new and exciting session in which we shall be using the transformer network to solve problems in computer vision, more specifically the task of image classification. Up to this point we've seen different convolutional neural networks like the LeNet, the VGG, the ResNet, the MobileNet, and the EfficientNet, and now we'll be looking at the vision transformers. Vision transformers were first developed in the paper entitled "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale". In this section we'll take a deep dive into how this whole architecture is constructed, how it works, and how and why transformers perform as well as their convolutional neural network counterparts. The very first point to note is that the use of transformers for computer vision tasks is quite recent, as you can see from the date this paper was published. The authors say that while the transformer architecture has become the de facto standard for natural language processing tasks, its application to computer vision remains limited: in vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure in place. They show that this reliance on convolutional neural networks is not necessary, and that a pure transformer, without any convolutions, applied directly to sequences of image patches can perform very well on image classification tasks. They even go ahead and tell us that when pre-trained on large amounts of data and transferred to multiple mid-sized or small image recognition benchmarks like ImageNet and CIFAR, the ViT, that is the vision transformer, attains excellent results compared to state-of-the-art convnets like the EfficientNet, while requiring substantially fewer computational resources to train. Now, it's possible that you've never heard the term "transformer", or maybe you come from an electrical engineering background and have only heard of it when it comes to stepping electric power up and down. We're going to explain terms like the transformer, and the attention mechanism that was mentioned, right away. To better understand the transformer and the role it plays in the ViT architecture, we have to go back in time to
understand where it came from. In 2017, the paper entitled "Attention Is All You Need" by Vaswani et al. introduced the transformer architecture, and it has turned out to be one of the most influential papers of the modern deep learning era. At the heart of this transformer architecture we have self-attention, and more specifically, in this paper, the scaled dot-product attention you can see here. As we said, the domain these networks were built for was natural language processing, so the question is: how does this work in natural language processing? To understand how and why attention, and the transformer, are used in NLP, we'll take the example of translation, which we already do with Google Translate. Here we put in "the weather today is amazing" and translate it to French: "le temps aujourd'hui est incroyable". Initially, the kinds of deep learning techniques used for these problems, that is taking us from one language to another, were the recurrent neural networks, and the way they work is quite simple. We start with the text, here our example from Google Translate, and we add these extra blocks: recurrent neural network blocks. RNNs are among the first deep-learning-based models used in natural language processing tasks like the translation task we have here. The way this works is we have our source text, the English sentence "the weather today is amazing", and we have the target which we want to generate; we train on such input-output pairs, and later on, when we pass in some new input, we expect a reasonable output. The text is structured into tokens: each word here is a token, so we have a first token, a next token, and so on. These tokens are converted into vectors and passed into the RNN blocks, which carry out some simple computations, like multiplications and additions, and some information is passed from one block to the next, hence the term recurrent neural network. The importance of passing information from one block to another is that the computations for a given token depend on the previous tokens. Once we've passed this information from block to block up to the last one, we take the final state and hand it to the decoder RNN: the first part is the encoder, where we encode the information, and the second part is the decoder, where we decode it. In the decoder a similar process is repeated: the computations produce an output, that output is fed into the next block to produce the next output, and so on, up to the final output. But then the problem with this
method is that, first of all, if we have a very long text, it becomes difficult for information to flow from the first blocks to the final blocks, and given that even as humans we know the importance of taking previous context into account when carrying out a task like translation, this leads to very poor results. Another problem is that during training the information has to be passed from one block to the next sequentially, and because of that it's difficult to implement parallelization efficiently, which makes training these kinds of networks very slow. To tackle the issue of long-term dependencies, attention networks were developed. Instead of depending on just the final vector coming out of the encoder to relay information from the source to the target language, for each decoder RNN block we take into consideration the outputs of every encoder block. All of those are fed into an attention layer, which processes the inputs from the different source RNN blocks and produces an output vector that is passed as input into the decoder block. So given the source and the target, we pass in the source, combine the outputs from every encoder RNN block through the attention layer, feed that into the first decoder block, and get an output, in this case "le". We then take that output and feed it into the next step, but once we shift to the time frame where we want the second output, we have another attention computation which again takes in all the encoder outputs, carries out some computations depending on the type of attention we're implementing, and produces a vector that's passed in together with the previous output. We repeat the same procedure at every step, so each decoder block pays attention to each and every input. From this we can even build an attention map, with the English text "the weather today is amazing" on one side and the French output on the other; after training this kind of model, we can see how much attention each output word pays to each input word. Logically, "le" pays the most attention to "the", "temps" to "weather", "aujourd'hui" to "today", "est" to "is", and "incroyable" to "amazing". If we look at the paper entitled "Neural Machine Translation by Jointly Learning to Align and Translate", the famous Bahdanau et al. paper, we can see some of these attention maps. Let's look at one: the English sentence reads
"The agreement on the European Economic Area was signed in August 1992", aligned against its French translation, and in the map we can see clearly which words attend most to one another. The accompanying figure shows exactly what we described previously: we have the inputs, and to produce the output y_t we attend to each and every one of them. At this point we move on from attention to self-attention, and to better explain self-attention we'll consider a different problem, that of sentiment analysis. Note that we still use self-attention in translation problems, but the concept will be easier to grasp in the context of sentiment analysis. Here we have the sentence "the weather today is amazing" and we want a model that tells us whether this is a positive or a negative statement. For the self-attention layer we no longer need the recurrent hidden states; in fact we can drop the RNN blocks completely, because what we pass into the self-attention model, which we'll see in a minute how it works, is just a set of vectors: one vector per word. If we combine them we have a sequence length of five, one, two, three, four, five, and an embedding dimension, let's say three, so we have a five by three matrix which we pass into the self-attention layer. These embeddings are designed so that words which are similar end up close to each other, while words which are opposites end up far apart. Since we're working in three dimensions, each word has three values; for example, the word "happy" can be represented by a vector that plots close to a word like "smile", while words like "sad" or "angry" will be far away from those two, because they're actually opposites. In our sentence, the words "the" and "is" would likely be very close to each other. Getting back to the model, we have this five by three input passed into the self-attention layer, so let's write out the matrix: the word "the" has its own embedding, with a value in each of its three components, supposing we're working in a
three-dimensional embedding. Then "weather" has its own three values, "today" has its own, say 2.31, 0.5, negative 5.1, whatever the values turn out to be, then "is", and finally "amazing", so each of these five words has its own embedding and these are the five rows of our matrix. At this point we implement a special type of attention known as dot-product attention. We take this matrix, which we'll call the query, and multiply it by the transpose of a matrix with the same shape, which we'll call the key; the key transposed is three by five, since the key has the same shape as the query, and so the product of the five by three query with the three by five key transpose gives us a five by five matrix. We then pass this five by five matrix through a softmax. We've looked at the softmax in previous sessions; what you should note here is that this five by five matrix plays the role of the attention map we saw before, with "the weather today is amazing" along one side and the same sentence along the other, and words which are most similar to each other in a given context take the highest values. For instance, if we replaced "weather" with "happy", giving "the happy today is amazing", which admittedly doesn't quite make sense, then the entry relating "happy" and "amazing" would be relatively higher than the surrounding values, because after training the model the attention values are adjusted so that words which are similar to one another take high values while words which are not take very small values. From here, this five by five matrix is multiplied by another five by three matrix, which we call the value, to give a five by three output. So we have the query, the key, and the value: the input that came in was five by three, and the output is five by three as well. This output is then passed through some fully connected layers, ending with a single-neuron output layer which tells us whether the input statement is positive or negative. As you've seen, we've gotten rid of the recurrent neural network blocks completely, and we're just making use of self-attention to extract information from our inputs. One of the first papers, if not the first, to use only attention while getting rid of the RNNs was the "Attention Is All You Need" paper, one of the most influential papers in modern-day deep learning, and it presents the transformer network.
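This isn't code from the lesson, just a small sketch of the dot-product attention computation described above on a made-up five by three embedding matrix (I've also included the 1/sqrt(d_k) scaling that the scaled dot-product version, discussed next, adds):

import tensorflow as tf

# made-up embeddings: sequence length 5 ("the weather today is amazing"), embedding dim 3
x = tf.random.normal((5, 3))

# in self-attention the query, key and value all come from the same input
q, k, v = x, x, x
d_k = tf.cast(tf.shape(k)[-1], tf.float32)

scores = tf.matmul(q, k, transpose_b=True) / tf.sqrt(d_k)  # (5, 5) attention map
weights = tf.nn.softmax(scores, axis=-1)                   # each row sums to 1
output = tf.matmul(weights, v)                             # (5, 3) output

print(weights.shape, output.shape)  # (5, 5) (5, 3)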
A single block that makes up the transformer model is the multi-head attention, and inside the multi-head attention you have the scaled dot-product attention, which is the self-attention we just talked about: you can see the query, the key, and the value. Since it's self-attention, all three come from the same input, which is split into Q, K, and V, query, key, and value. This resembles, or is analogous to, data management systems where data is stored in key-value pairs, just like a Python dictionary: the data is stored as key-value pairs, and when you want a particular piece of information you pass in a query; the query matches a particular key, and once the key is selected we obtain the value, which is the data itself. It's quite similar to what's happening here. Note that before the information is passed into the scaled dot-product attention, the Q, K, and V are each passed through their own linear layers, which means that even though they start from the same input, they end up being projected into three different representations. With Q, K, and V we then carry out Q times K transpose, that's the matmul, followed by the scaling you can see in the attention formula, dividing by the square root of d_k; from there we take the softmax of all this, and finally we multiply by V, exactly as in the example we worked through previously. Now for the "multi-head" part: it simply means that the same Q, K, and V are fed into several of these scaled dot-product attention blocks in parallel. Suppose we have three heads; then we have three of these scaled dot-product attention blocks stacked side by side, one in red, one in blue, and one in green.
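For reference, the two formulas being described here, as written in the "Attention Is All You Need" paper:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\,W^{O}, \qquad \mathrm{head}_i = \mathrm{Attention}(QW_i^{Q},\, KW_i^{K},\, VW_i^{V})$$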
Each head has its own set of linear layers: we pass V, K, and Q through a separate linear projection for each head, and each projected triple then goes into its own self-attention block, one for the blue head, one for the green, one for the red. The outputs we get at the end of these three self-attention blocks are then concatenated and passed through a final linear layer, which is like our Dense layer in TensorFlow. That gives us the multi-head attention block. From there, we take the block's input, add it onto its output, and pass the sum through a layer normalization; then we pass the result through a feed-forward network, that is a fully connected or dense layer, and again repeat the addition and normalization, a little similar to what we have with the ResNets. This whole encoder block can then be repeated N times. You'll notice the decoder side is similar, except that it has two multi-head attentions, one of them masked; we're not going to get into those details, because what's important is for you to understand how the encoder works, and a small sketch of this encoder block is given below.
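To make the encoder concrete, here is a minimal sketch of one encoder block with TensorFlow's built-in MultiHeadAttention layer; this is not the lesson's implementation, and the head count, key dimension, and MLP width are arbitrary placeholder values:

import tensorflow as tf
from tensorflow.keras import layers

def transformer_encoder_block(x, num_heads=3, key_dim=64, mlp_dim=128):
    # multi-head self-attention: query, key and value all come from x
    attn = layers.MultiHeadAttention(num_heads=num_heads, key_dim=key_dim)(x, x)
    x = layers.LayerNormalization()(x + attn)    # add & norm

    # position-wise feed-forward network (the original transformer uses ReLU; ViT uses GELU)
    mlp = layers.Dense(mlp_dim, activation="relu")(x)
    mlp = layers.Dense(x.shape[-1])(mlp)
    x = layers.LayerNormalization()(x + mlp)     # add & norm
    return x

# e.g. a batch of sequences of length 5 with embedding dimension 3
inputs = tf.keras.Input(shape=(5, 3))
outputs = transformer_encoder_block(inputs)
encoder = tf.keras.Model(inputs, outputs)
encoder.summary()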
With the encoder understood, let's get back to the ViT paper, "Transformers for Image Recognition at Scale"; you should now be able to make sense of the transformer encoder block presented there, so let's get into the part where the image is broken into patches. To understand how and why we use patches, don't forget that what the transformer encoder takes in is a sequence: initially we had words, each represented by an embedding vector, and those vectors combined were passed into the transformer. Since our input is an image, to represent it that way we have to break it up somehow. What we might think of at first sight is this: suppose the image is 256 by 256 by 3 channels; omitting the channels for now, we could have one vector per pixel of the 256 by 256 image. But don't forget that unlike before, where we had only five words, we would now have 256 times 256 "words", which is more than 65,000 different vectors to pass in, and whereas before our attention map was five by five, we would now have a 65,000 by 65,000 attention map. Working with that kind of matrix in memory isn't feasible, so instead of going pixel by pixel the authors work patch by patch: the image is broken up into patches, and each patch now plays the role of a word. The authors choose to work with 16 by 16 pixel patches, so each patch contains 256 pixels, and each patch is therefore represented by a 256-dimensional vector, unlike the words example where each word was represented by a three-dimensional vector. That doesn't mean NLP models generally use three dimensions; that was just to make the example easier to follow. Now, when working with the transformer we may not want to stick with these 256-dimensional vectors; maybe we want to work with, say, 512-dimensional vectors. In that case we apply the linear projection of the flattened patches: if we have, say, nine patches, the sequence length is nine, so our input is nine by 256, and after going through the linear projection it becomes nine by 512, which becomes the embedding dimension of our transformer (in the previous example the embedding dimension was three). This gives us flexibility, because we can decide on whatever embedding dimension we want.
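The bookkeeping here, written out for our image size: for an image of height $H$, width $W$, and patch size $P$,

$$N = \frac{HW}{P^{2}} = \frac{256 \times 256}{16 \times 16} = 256 \text{ patches.}$$

With the channels ignored, as in the explanation above, each flattened patch has $P^{2} = 256$ values; keeping the $C = 3$ channels it has $P^{2}C = 768$ values (which is what we'll see in code later), and the linear projection then maps each patch to the chosen embedding dimension $D$.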
With that we have our nine by 512 output, and we're ready to pass it into the transformer encoder, but just before doing so we add the position embeddings: you can see them in the figure, numbered zero, one, two, three, up to nine. The reason we have to do this is that, unlike the convnets, the transformer has no built-in notion of locality. The way convolutional neural networks compute their feature maps takes locality into account: neighbouring pixels are processed together by a conv filter to produce an output, so pixels belonging to a small neighbourhood are used together. This clearly gives the CNNs an upper hand over the transformers, because when trying to understand an image, the positions of particular pixels actually matter; CNNs already have an inductive bias built into the way they work. So to give the transformer a helping hand, we need these position embeddings, which give the transformer encoder an idea of the location of each patch that's passed in; note, though, that they have to be learned automatically by the model. You'll also notice an extra input token. The reason for this extra input is that we don't want the situation where, after going through the transformer encoder, we have to pick one of the patch outputs to feed to the MLP head, the fully connected network in the classification unit. To avoid that sort of bias, the authors add this extra learnable class embedding, whose output is the one passed into the MLP head and used for classification. Another important point is that the vision transformer can also be used as a sort of hybrid architecture: instead of passing the image patches in directly, we could first pass the image through a convolutional neural network, take the output embeddings, and feed those in place of the raw patches. It should also be noted that the multi-layer perceptron contains two fully connected layers with a GELU non-linearity; compared to the ReLU, where all values below zero give an output of zero and all values above zero are passed through unchanged, the GELU is a smooth, curved function. The type of normalization used is the layer normalization, as we mentioned already, and we can visualize the difference in the paper by Shen et al. entitled "PowerNorm: Rethinking Batch Normalization in Transformers", where layer normalization and batch normalization are shown side by side. If we consider the inputs as having a sequence dimension, a feature or embedding dimension, and a batch dimension, then instead of carrying out the normalization across the batch, as batch norm does, layer normalization is carried out for each vector on its own, across its features. The reason we don't use batch norm with transformers is that the batch statistics for NLP data have a very large variance throughout training, and this variance shows up in the corresponding gradients as well, so it's preferable to carry out the normalization over the features instead.
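For reference, the two ingredients just mentioned, written with their standard definitions (these are the usual textbook forms, not something specific to this lesson):

$$\mathrm{GELU}(x) = x\,\Phi(x) \approx 0.5\,x\left(1 + \tanh\!\left(\sqrt{2/\pi}\,\bigl(x + 0.044715\,x^{3}\bigr)\right)\right)$$

$$\mathrm{LayerNorm}(x) = \gamma \odot \frac{x - \mu}{\sqrt{\sigma^{2} + \epsilon}} + \beta, \qquad \mu,\ \sigma^{2} \text{ computed over the features of a single token, not over the batch.}$$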
Before we move on to the experiments, let's look at how the ViTs are used in the real world. In practice, ViTs are pre-trained on very large datasets and then fine-tuned on smaller downstream tasks. When fine-tuning, we remove the pre-training head and replace it with a head that corresponds to our number of classes: we might start with a 1000-class head and replace it with, say, a 3-class head. To understand why the head is a D by K projection: after the inputs go through the encoder we have an output of sequence length (plus one, for the class token) by D, where D is the embedding dimension fixed by the linear projection; taking just the class-token output gives a 1 by D vector, which amounts to D neurons, and the fully connected head maps those D inputs to K outputs, in the pre-training case 1000. When we fine-tune, we take that head off and replace it with a freshly initialized fully connected layer with our K outputs. The authors also mention that during fine-tuning it's better to work at higher resolutions: the model could be pre-trained at 256 by 256 and later fine-tuned with 512 by 512 images, and since they keep the patch size the same, this results in a larger effective sequence length. Let's visualize that statement: suppose the input is 48 by 48 and we break it into 16 by 16 patches; that gives three patches per side, nine patches in total. If we now fine-tune on a higher-resolution image, say 96 by 96, while keeping 16 by 16 patches, then instead of three patches per side we have six, so 36 patches instead of nine, which is why the sequence length increases, for as long as it can still fit in memory. Due to this modification, the pre-trained position embeddings may no longer be meaningful, so the authors perform 2D interpolation of the pre-trained position embeddings according to their location in the original image. On to the experiments: they define three models, ViT-Base, ViT-Large, and ViT-Huge, going from 86 million to 632 million parameters. ViT-Base has 12 layers, meaning the encoder block we saw is repeated 12 times, ViT-Large has 24, and ViT-Huge has 32; the hidden size, that is the embedding dimension D, is 768 for Base, 1024 for Large, and 1280 for Huge; the MLP size, the width of the fully connected layers, is 3072, 4096, and 5120 respectively; and the number of attention heads is 12, 16, and 16. The experiments were carried out on the JFT-300M dataset, and we see that the 14 by 14 patch version of the ViT outperforms the ResNet152-based baseline; although the performance isn't dramatically better than the ResNets, it requires far fewer computational resources to train, about 2,500 TPU-core-days compared to 9,900. Also, from the plots you can see that as the number of pre-training samples increases, the model that performs best is the ViT, which overtakes the ResNets, whereas for a smaller number of samples the ResNets outperform the ViTs; and the smaller the patch size, like the 14 by 14 patches here, the better the results.
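Collected for reference as plain Python dictionaries (the numbers are the ones reported in Table 1 of the ViT paper by Dosovitskiy et al.; the dictionary itself is just an illustration, not code from the lesson):

# ViT variants from "An Image is Worth 16x16 Words", Table 1
VIT_CONFIGS = {
    "ViT-Base":  {"layers": 12, "hidden_size": 768,  "mlp_size": 3072, "heads": 12, "params": "86M"},
    "ViT-Large": {"layers": 24, "hidden_size": 1024, "mlp_size": 4096, "heads": 16, "params": "307M"},
    "ViT-Huge":  {"layers": 32, "hidden_size": 1280, "mlp_size": 5120, "heads": 16, "params": "632M"},
}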
Now, to understand why the ViTs start to outperform the convnets as the dataset size increases, we have to recall that with convnets like the ResNet there is some inductive bias: the fact that the ResNet takes the two-dimensional image in its natural form already gives the convnet a helping hand when extracting features, so even with relatively small datasets these convnets can make sense of the input image. The transformer, on the other hand, is a more generic neural network: it never gets to see the image in its natural form, only patches that have been converted into vectors, so at the very beginning, or with small data, the transformer finds it difficult to make much sense of those patches. But as soon as the dataset grows to considerable size, the transformer, now free of that inductive bias, can do even better than the convnets. Interestingly, you'll notice that after training a transformer model, the position embeddings, the ones added onto the patch embeddings before they're passed to the transformer, actually learn on their own to encode the position of the patches. You can see this in the plot of input patch row against input patch column: the embedding at position (1, 1) is learned automatically by the model during training; at (2, 1) it moves a step while keeping the same row; at (3, 1) it keeps the row and goes three steps; and so on, right up to patches several steps to the right and seven steps down. Next to that you can see the learned embedding filters, which look a lot like convnet filters, and to the right there's the plot that summarizes why the ViTs end up being more powerful than the convnets. To understand it, consider a convnet of a given depth: its initial layers extract low-level features while its final layers extract high-level features. If we have an image, say one containing a head, then because we slide small filters over the image, a given pixel in the early layers attends only to the pixels found around its locality, and only as we go deeper in the network does a pixel come to attend to pixels that are much farther away from it. To picture this better, remember the example where we compared two stacked 3 by 3 filters with a single 5 by 5 filter: although the 5 by 5 filter has a larger receptive field than a single 3 by 3 filter, stacking the 3 by 3 filters, that is making the network deeper, still allowed us to capture that wider part of the image. This shows that with convnets the earlier layers, when the network isn't yet deep, capture local information, and as we go deeper we start capturing more and more global information.
So if we plot the mean attention distance against network depth, for a convnet it keeps increasing with depth, up to the point where the network is deep enough that the features captured are truly global. But with the transformers, since each patch attends to each and every other patch right from the very first attention layer, as we've seen with self-attention, we don't get that gradual curve; instead, if we train our ViT on a very large dataset, then right from the first layers we're able to capture both local and global features, and this is what makes the ViTs more powerful than the convnets when we work with big data. We can also visualize what the model sees by looking at the attention maps: after training, you can see regions whose pixels pay much more attention to one another than to the other pixels in the image. In summary, if we train a CNN and a ViT side by side and look at accuracy against dataset size, the CNNs already reach reasonable accuracies with small datasets, but as we keep increasing the data size their accuracy starts to plateau, limited by their inductive biases, whereas the transformers, being more generic neural networks, are free to keep learning from these larger datasets and their accuracy keeps increasing. Hello everyone, and welcome to this new and exciting session in which we shall be building our own ViT model from scratch. In the previous section we saw how the transformer model, previously used for NLP tasks, can be used in computer vision with the proper preparations. In this section we'll see how to create patches from an image, carry out the linear projections, pass the output of those projections into the transformer encoder, and train an end-to-end model that learns to tell whether an input image is that of an angry, happy, or sad person. Given that we've been working with 256 by 256 images, if we split them into 16 by 16 patches we get 256 different patches, because a patch of 16 pixels by 16 pixels has to be repeated 16 times horizontally and 16 times vertically to cover a 256 by 256 image. What we'll use to create these patches is the extract_patches method (tf.image.extract_patches). It takes in the images, the sizes, the strides, and the rates, together with the padding.
The documentation includes a figure that will help us picture how this works, but before looking at it, let's go over the arguments. We have the images, the input tensor; the sizes, which specify the patch sizes, so in our case, with 16 by 16 patches, the patch size will be 16; the strides, which tell us by how much we shift while creating the patches; and the rates, which we'll understand better with a figure. In the documentation's example, the sizes are given as [1, 3, 3, 1]: the argument must be a list of the form [1, size_rows, size_cols, 1], and since for us size_rows equals size_cols equals 16, we would put 16 by 16, whereas in their example the patches are 3 by 3 pixels. Next come the strides, [1, 5, 5, 1]: you'll notice that to get the next patch, the window shifts five steps to the right, and once a row is done, five steps downward, and so on, which is how the four starred patches in the figure are obtained from the original image. The next argument is the rates, where two rate values are specified between the ones. We can understand these by looking at a dilated convolution operation; there's a visualization of this in the Medium post by Sik-Ho Tsang, where, unlike the usual convolution whose 3 by 3 filter is compact, with no spacing between the filter elements, the dilated convolution has spacing between them. It's similar here: if we decide to extract patches where there is some spacing between the sampled pixels, we specify how much spacing we want with the rates. Finally, the padding is set to "VALID", which means only patches that fit entirely within the image are extracted. That said, we can copy this call, paste it here, and call the result patches; we pass in our test image, and for the sizes we use 16 by 16, our patch size, so we might as well add the patch size to the configuration. The strides are also set to 16 because, if we look at this kind of image, we are
We are not interested in skipping any part of the image, and we don't want overlapping patches either: we want this patch, then the next one right beside it, and so on (with the borders accounted for), covering the whole image. So we make the stride equal to the patch size, such that after taking a 16 by 16 patch we move 16 pixels and take the next one. With that, we set configuration patch size to 16, run that cell along with the cell that loads the test image, and we get an error: the input must be four-dimensional. So instead of passing the test image directly, we do tf.expand_dims to create an extra dimension at axis 0, the batch axis. Run again and we have our patches. Now look at what we get when we print patches.shape: 1 by 16 by 16 by 768. Let's explain why. Recall that we have a 256 by 256 by 3 image, so three channels, each 256 by 256. When we take a 16 by 16 patch, we take it across all three channels at once: each channel contributes 16 times 16, that is 256 pixels, and summing over the three channels gives 256 plus 256 plus 256, which is 768. So each extracted patch is flattened into a 768-dimensional vector, and we have a 16 by 16 grid of such patches. To plot this out, we go through each patch vertically and horizontally, create 16 by 16 subplots, and for subplot (i, j) we pick the corresponding patch out of the 256 patches. When you pick a patch you're left with that flattened 768-dimensional vector, so we reshape it back into a 16 by 16 by 3 image (and 16 times 16 times 3 does give 768). Let's run this; you can always increase or reduce the figure size depending on how you want to view it. The output shows the image, which initially was one compact picture, now broken up into its patches. Before we go on to create the patch encoder, which is the next block in the architecture, we have to reshape these patches so that each one is treated as one element of a sequence: in the paper's figure there are nine patches.
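To recap the extraction step before we reshape, here is a minimal, self-contained sketch; the random test_image and the PATCH_SIZE name are stand-ins for the notebook's own variables.

```python
import tensorflow as tf

# Stand-in for the notebook's test image: a 256x256 RGB tensor.
test_image = tf.random.uniform((256, 256, 3))
PATCH_SIZE = 16  # stored in the configuration dictionary in the course

# extract_patches expects a 4-D (batch, height, width, channels) tensor,
# hence the expand_dims on the batch axis.
patches = tf.image.extract_patches(
    images=tf.expand_dims(test_image, axis=0),
    sizes=[1, PATCH_SIZE, PATCH_SIZE, 1],
    strides=[1, PATCH_SIZE, PATCH_SIZE, 1],  # stride == patch size: no gaps, no overlap
    rates=[1, 1, 1, 1],                      # no dilation within a patch
    padding="VALID",
)

print(patches.shape)  # (1, 16, 16, 768): a 16x16 grid of patches, each flattened to 16*16*3 = 768
```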
Each of those nine patches becomes one of nine vectors, nine different inputs, passed into the transformer. In our case we have 256 patches, so we need to reshape the tensor such that each patch is one element of the whole sequence. So we write patches = tf.reshape(patches, (patches.shape[0], -1, 768)), where the first entry is the batch dimension; for the middle dimension we can either put -1 or explicitly write 256, since either way we end up with a sequence length of 256. Printing the patches shape now gives 1 by 256 by 768 (rerun the previous cell first so you can see the difference from the 1 by 16 by 16 by 768 we had before). Each of those 256 vectors will now be passed into our transformer. We can also modify the plotting loop to match: the number of patches is now patches.shape[1], which is 256, the subplot index becomes i + 1, we drop the k variable we no longer need, and we reshape patch i as before. Running this again gives the exact same output, which is normal since we just restructured the inputs. With this we're now set to create our patch encoder layer, and it will be similar to the kinds of layers we've been creating, so we can copy one of those class definitions and paste it here, calling it PatchEncoder. This patch encoder will be responsible for, first of all, converting the image into patches, then carrying out the linear projection, and finally adding the positional encoding. For the linear projection we use a Dense layer whose number of units is the hidden size, or embedding dimension. Our input is already 256 by 768 (ignoring the batch dimension), so 768 is our hidden size for now; we could project to 512, or even 1024 to make the model bigger, but that would mean many more parameters and much more training time, so let's stay with 768. Back in the code we specify this embedding dimension as hidden_size, as it's called in the paper, and pass it to the Dense layer; we don't need the batch norm that came with the copied class. Then in the call method we take in the input, and we can reuse the patch-extraction computation we already wrote, so let's copy that in.
So in the call method we receive the input; we won't need the expand_dims here, because during training the inputs already come with a batch dimension, so we just call the incoming tensor x. Then we do the reshaping as before, and the output will be the patches projected to the new dimension: output = self.linear_projection(patches). To make the last value in the reshape dynamic, instead of hard-coding 768 we use patches.shape[-1], the last dimension of the patches tensor. Now that this is set, we're ready to add the positional embeddings onto our linearly projected patches. Note that we are not going to include the class embedding (the class token) mentioned in the paper, as in practice it isn't really important: Lucas Beyer from the Google Brain team has said that the main aim of including it was to reproduce the original transformer exactly, just on image patches. So in practice we stick to the linear projections. The positional embedding will be constructed using TensorFlow's Embedding layer, so here we have the positional embedding, as described in the paper, built from an Embedding layer. Let's check how this layer works: as described in the docs, tf.keras.layers.Embedding turns positive integers (indices) into dense vectors of fixed size. Getting back to the paper, the linear projections need to be added to the positional encodings. Including the batch dimension, the projection output is batch size by number of patches (np, since each patch becomes one vector) by the hidden dimension, which in our case is 768. If we take a batch of one, we have 1 for the batch dimension, and the number of patches is 256, because a 256 by 256 image broken into 16 by 16 patches gives 256 different patches. With the Embedding layer we'll be able to produce another tensor of that same shape as its output. As the docs show, it takes indices and turns them into the dense vectors we're interested in. Its arguments are the input dimension, the output dimension, initializers, regularizers, and so on, and there's an example where it converts, say, the index 4 into one 2-d vector and the index 20 into another. Let's copy that example and test it in our code so we can see better how it works.
You see the example model takes as input an integer matrix of shape batch size by input length and gives another tensor as output. Now, in our case, since we're working with image data we don't really have the vocabulary the docs talk about, so instead of the vocabulary size we pass the number of patches as the input dimension. If this were natural language processing, the input dimension would be the total number of words our model can treat, so we might have a vocabulary of, say, 300,000 words and pass that in. The output dimension, as the docs say, is the dimension of the dense embedding; in the example they put 64, which we'll shortly change to 768. That said, we run the example model: the input has size 32 by 10, where 32 is a batch dimension and the 10 entries are indices, and we print the output shape. Basically, ignoring the batch dimension, we have 10 indices; each index (a value between 0 and 1,000 in this random example, say 900) gets projected into a 64-dimensional vector, just like a patch being represented by a vector. So each index is no longer represented by a single number but by a 64-dimensional vector, whose size depends on the output dimension we set, and that's why we go from 10 to 10 by 64, which is what you see if you ignore the batch dimension. In our case the output dimension is 768, the input dimension is 256, and we take the input to be some random tensor of size 32 by 256, since we have 256 patches; we can drop the input_length argument (you can check the documentation for when it's needed, but we don't need it here). We then expect an output of 32 by 256 by 768. Modifying the batch size to 1 and running this, we get 1 by 256 by 768, which matches the shape of our linearly projected patches. Now that's settled, let's copy this Embedding layer into our PatchEncoder: we call it self.positional_embedding, pass in the number of patches as the input dimension and the hidden size as the output dimension, and the layer's output then gets added to the linear projection to form the positional encodings.
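Here is that shape check as a minimal sketch, with the numbers from our case (256 patches, hidden size 768) rather than the documentation's example values:

```python
import tensorflow as tf

# input_dim is the number of patches, output_dim the hidden size, mirroring the discussion above.
embedding = tf.keras.layers.Embedding(input_dim=256, output_dim=768)

# A batch of one "sequence" of position indices 0, 1, ..., 255.
positions = tf.expand_dims(tf.range(start=0, limit=256, delta=1), axis=0)  # shape (1, 256)

print(embedding(positions).shape)  # (1, 256, 768): one 768-d vector per position index
```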
What this positional embedding takes in will be an input of length number_of_patches. So we build an embedding_input using tf.range: the start is 0, since these are indices, and the limit is the number of patches, because we want the indices to go from 0 up to some n such that the length of the tensor equals the number of patches. We already have self.num_patches from the constructor, so we use that as the limit. Then we set delta to 1; obviously if we set delta to 2 we'd go from 0 to the number of patches in steps of two, and that's not what we want, since we want exactly number_of_patches elements in this embedding input. With that we can pass embedding_input into the embedding layer, add the result to the linear projection, and simply return the output. It should be noted that this Embedding layer is similar to a Dense layer, but with a Dense layer an input x is multiplied by the weights and a bias is added to get the output, whereas the Embedding layer is a simple lookup: the index just selects the corresponding row of the weight matrix, with no bias. With that, let's run this cell, delete the two test cells we no longer need, and move ahead. To test this out we define a patch encoder with PatchEncoder, which we've just created, passing the number of patches and the hidden size, so 256 and 768, and then call it on an image, here tf.zeros of shape 1 by 256 by 256 by 3. When we pass in an input image, we expect an output of shape batch size by number of patches by the hidden size (the number of hidden units). Now that we've built everything up to the point where we have our embedded patches, let's go ahead and build the transformer encoder. As you can see in the paper, it starts with a layer normalization, then multi-head attention, then we add the block's input to this output; then layer normalization again, then the multi-layer perceptron, and again an addition, and we get the output. To build the code for our transformer encoder, we paste the PatchEncoder class, strip it down, and rename it. This transformer encoder is made of a first norm layer and a second norm layer, so we define layer_norm_1 as a LayerNormalization layer and layer_norm_2 as the second one. With the layer normalization layers defined, we can go ahead and define our multi-head attention layer, and you can check out the documentation for MultiHeadAttention.
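Before wiring up the transformer encoder, here is a consolidated sketch of the PatchEncoder layer we just finished; the class and argument names follow the transcript, and tf.shape is used for the batch dimension so it stays dynamic (this matters later, when the last training batch is smaller than 32):

```python
import tensorflow as tf
from tensorflow.keras.layers import Layer, Dense, Embedding

class PatchEncoder(Layer):
    def __init__(self, num_patches, hidden_size, patch_size=16):
        super().__init__(name="patch_encoder")
        self.num_patches = num_patches
        self.patch_size = patch_size
        self.linear_projection = Dense(hidden_size)
        self.positional_embedding = Embedding(input_dim=num_patches, output_dim=hidden_size)

    def call(self, inputs):
        # inputs: (batch, height, width, channels) images
        patches = tf.image.extract_patches(
            images=inputs,
            sizes=[1, self.patch_size, self.patch_size, 1],
            strides=[1, self.patch_size, self.patch_size, 1],
            rates=[1, 1, 1, 1],
            padding="VALID",
        )
        # Flatten the patch grid into a sequence: (batch, num_patches, patch_dim).
        patches = tf.reshape(patches, (tf.shape(patches)[0], -1, patches.shape[-1]))
        # Linear projection of each patch plus a learned positional embedding.
        positions = tf.range(start=0, limit=self.num_patches, delta=1)
        return self.linear_projection(patches) + self.positional_embedding(positions)

# Quick shape check, as in the transcript:
patch_encoder = PatchEncoder(num_patches=256, hidden_size=768)
print(patch_encoder(tf.zeros((1, 256, 256, 3))).shape)  # (1, 256, 768)
```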
In the documentation you have the different arguments MultiHeadAttention takes. We'll specify the number of heads and the key dimension; this key dimension is the same as the hidden size, the number of hidden units, which in our case we fixed at 768. We could also set the value dimension, dropout, and the other arguments. When calling it, as you can see, it takes a query, a value, a key, and an attention mask (which we won't use); we won't ask for the attention scores and won't set training either. What we'll pass are the query and the value, and since we're not going to pass the key, the layer considers the key to be the same as the value. So in the constructor we have self.multi_head_attention, a MultiHeadAttention layer taking the number of heads and the hidden size (we just replace the copied num_patches argument by num_heads). Next come the dense layers: as you saw in the diagram, after the multi-head attention we have layer norm 2 and then the MLP, which is made of two dense layers. So we have self.dense_1, a Dense layer whose number of units we take to be the hidden size, with a GELU activation as specified in the paper, and we repeat the same for the second dense layer, self.dense_2. With those two dense layers specified, let's also fix the class name: we call this layer TransformerEncoder. Now we're set to pass the input through these different layers in the call method. The input first goes into layer norm 1, giving an output x; from there x gets into the multi-head attention layer. Then, remember from the paper that there's an addition to do: the output of the multi-head attention needs to be added back to the input coming from the patches, so we create that skip connection with an Add layer, which we also define in the constructor. We call the result of this addition x1, because we'll need x1 again (you'll understand shortly why we can't just keep working with x), and then we take x1 and pass it into layer norm 2.
Layer norm 2 takes in x1 and produces an output, which we pass through the dense layers: first dense 1, then dense 2. We'll call the result of the dense layers x2 rather than overwriting x1, because, as in the paper, there's a second addition to do: the MLP output has to be added to x1, the output of the first addition. If we had kept calling everything x1, the value going into this second addition would just be the final dense output, which is not what we want. So we compute the output as add of x2 and x1, return it, and with that we have our transformer encoder. Now we can go ahead and test it: we construct the TransformerEncoder with the number of heads and the hidden size, and pass in an input shaped like our patches. Running this first gives an error, which is logical, because the multi-head attention must be given at least two inputs: if you look back at its documentation, the query and the value are required while the key and all the rest are optional. So we modify the call and pass the layer-normalized tensor in twice, as both the query and the value; run again, and you see we get exactly what we expect: an input of the patch shape goes in, and an output of the same shape comes out after going through the transformer encoder. Although this runs, make sure you check your code carefully and confirm that you're passing the right inputs and getting exactly the outputs you ought to get, not something different. So that's it, we have our transformer encoder, and now we'll head on to building our ViT model. For this, let's scroll up to the model class we built earlier in the course and copy that same structure, then paste it here. There isn't much difference; the main change is that we subclass Model instead of Layer. We rename it ViT, our vision transformer model, and take off everything we don't need.
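Before assembling the full model, here's a consolidated sketch of the transformer encoder block we just built and tested:

```python
import tensorflow as tf
from tensorflow.keras.layers import Layer, LayerNormalization, MultiHeadAttention, Dense, Add

class TransformerEncoder(Layer):
    def __init__(self, num_heads, hidden_size):
        super().__init__(name="transformer_encoder")
        self.layer_norm_1 = LayerNormalization()
        self.layer_norm_2 = LayerNormalization()
        self.multi_head_attention = MultiHeadAttention(num_heads=num_heads, key_dim=hidden_size)
        self.dense_1 = Dense(hidden_size, activation="gelu")
        self.dense_2 = Dense(hidden_size, activation="gelu")
        self.add = Add()

    def call(self, inputs):
        x = self.layer_norm_1(inputs)
        # Self-attention: the same tensor is passed as query and value; the key defaults to the value.
        x = self.multi_head_attention(x, x)
        x1 = self.add([x, inputs])      # first skip connection

        x2 = self.layer_norm_2(x1)
        x2 = self.dense_1(x2)
        x2 = self.dense_2(x2)
        return self.add([x2, x1])       # second skip connection

# Shape check on a batch of embedded patches:
encoder = TransformerEncoder(num_heads=4, hidden_size=768)
print(encoder(tf.zeros((1, 256, 768))).shape)   # (1, 256, 768)
```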
We're building it in such a way that it takes the number of heads and the hidden size (for the transformer encoders) and, from the patch encoder, the number of patches; so the constructor needs num_heads, hidden_size, and num_patches, and we store each of them. Then we define our patch encoder, which is simply the layer we created already; looking back at its signature, it takes the number of patches and the hidden size, and this is why it's important to test as you go, since we've already shown it works when you pass in an input image. Next, the transformer encoders. Recall that in the paper there are L transformer encoder blocks stacked, so we define trans_encoders as a list of layers whose length depends on the number of layers: for each step in range(num_layers), we create a TransformerEncoder with the number of heads and the hidden size. In the call method, the input first goes through the patch encoder and produces an output x; then we loop over each transformer encoder layer, for i in range(self.num_layers) (defining self.num_layers from the constructor argument), and set x to the output of trans_encoders[i] applied to x, so this runs num_layers times, each time picking the specific transformer encoder. Once we get the output from that loop, we flatten it, and from there we have the MLP head from the paper. In the paper, because they kept the class token, they pick out only the first element of the output: if the output were 1 by 257 by 768 (the 256 patches plus the extra class token), they would take just that one token, leaving a single 768-dimensional vector.
That would basically be a 1 by 768 output, but since we're taking all the patch outputs into consideration, we'll just flatten everything and pass it through our MLP head. So, back in the code: we flatten, and then we specify the dense layers which make up the MLP head. We have dense 1, a Dense layer with a certain number of units, say 1,024; rather than hard-coding it, let's make num_dense_units an argument of the model, and give this layer a GELU activation. Then we write the same for dense 2, with the same number of dense units (you can always modify this). Next, in call, x goes through dense 1 and then dense 2. Our final output dense layer has to have as many units as there are classes, just as we've been doing so far in this course, where we always ensure the output dense layer has the number of classes as its number of units; so we copy that output layer, paste it here, and call it dense 3. Let's run this; it should be fine, and we can go ahead and test it. We define our ViT with the arguments specified just below: say 8 heads, hidden size 768, 256 patches, 4 layers, and 1,024 dense units. We pass in a 1 by 256 by 256 by 3 input and we should get a reasonable output, and we can print the summary and check the model: about 283 million parameters for this ViT. Let's reduce that: 4 heads, just 2 layers, and 128 dense units. Running again, the summary shows the patch encoder, the two transformer encoder layers, and the dense layers. Now, in the reshape we'll change the hard-coded value to the batch size from the configuration, because when we're training the batch size is known and we don't want a fixed 1 sitting there; with that change we also need to set the test input's batch dimension to 32, the same as our batch size, otherwise the summary cell doesn't work. Now we can go ahead, compile, and train our model on our training data. Note that we won't get the best results, because ViTs need very large, or even extra-large, datasets to perform as well as convnets; generally, when working with ViTs, we want to pretrain on a very large dataset and only later fine-tune on the smaller one. And indeed, towards the end of the epoch we still get an error.
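Before looking at that error, here is a consolidated sketch of the ViT model we just assembled, reusing the PatchEncoder and TransformerEncoder sketches above; the number of classes (3, matching the emotion-detection labels used later) is an assumption:

```python
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.layers import Flatten, Dense

class ViT(Model):
    def __init__(self, num_heads, hidden_size, num_patches, num_layers, num_dense_units,
                 num_classes=3):
        super().__init__(name="vision_transformer")
        self.num_layers = num_layers
        self.patch_encoder = PatchEncoder(num_patches, hidden_size)
        self.trans_encoders = [TransformerEncoder(num_heads, hidden_size)
                               for _ in range(num_layers)]
        self.flatten = Flatten()
        self.dense_1 = Dense(num_dense_units, activation="gelu")
        self.dense_2 = Dense(num_dense_units, activation="gelu")
        self.dense_3 = Dense(num_classes, activation="softmax")   # output layer

    def call(self, inputs):
        x = self.patch_encoder(inputs)
        for i in range(self.num_layers):
            x = self.trans_encoders[i](x)
        x = self.flatten(x)          # we flatten instead of picking out a class token
        x = self.dense_1(x)
        x = self.dense_2(x)
        return self.dense_3(x)

# Shape check with the smaller configuration from the transcript:
vit = ViT(num_heads=4, hidden_size=768, num_patches=256, num_layers=2, num_dense_units=128)
print(vit(tf.zeros((1, 256, 256, 3))).shape)   # (1, 3)
vit.summary()
```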
The source of this error is the fact that we fixed the batch size: if you look back at the reshape, we hard-coded it, so it isn't dynamic, while our dataset has been broken down into batches of 32 and the very last batch may contain, say, only 8 elements, because the number of elements in the dataset isn't necessarily divisible by 32 and there's a remainder. With that remainder, and the value fixed at 32, we get an error, since we've told the model to always expect 32 at that position. To avoid this, instead of patches.shape we use tf.shape: we call TensorFlow's shape function, pass in our patches, and select the batch dimension, so the value is worked out at run time. Running again, you'll see that even if we put 32 in the test cell it still works fine. With that, let's compile and restart the training. The model trains, and as you can see it starts to stagnate around 44.4 percent accuracy and 44.16 percent validation accuracy. In the next section we'll fine-tune a ViT which has been trained on a very large dataset. Hello everyone, and welcome to this new and exciting session, in which we're going to look at how to fine-tune an already-trained transformer, more precisely a vision transformer model, using the Hugging Face library and TensorFlow. Hugging Face today is at the forefront of practical AI: it permits practitioners around the world to build, train, and deploy state-of-the-art models very easily, and it's used by thousands of organizations and teams around the world. There's a host of different tasks we can solve with readily available Hugging Face models: audio classification, image classification, object detection, question answering, summarization, text classification, and translation. In our specific case we're dealing with image classification, and if you open that page you'll see the image-classification section, where you can already test an image on a ViT model: this is the ViT base model with patch size 16 and image size 224. We also have Facebook's DeiT base distilled model, again with patch size 16 and image size 224, and you can browse a host of other image classification models, sorted here by number of downloads; you can see this one was downloaded 219,000 times. We'll be fine-tuning this ViT by Google. Its page has the model description (we've seen this already in the paper), the intended uses and limitations, and then how to use the model without any fine-tuning: you pass in your image and run classification on it using this ViT-for-image-classification model. You'll notice the snippet supposes we're dealing with a PyTorch model, so let's check the documentation, where we find the ViT model on the left side together with many other models which are available for free.
You can also check out DeiT, the distilled transformer: scrolling to D we find the DeiT and its documentation. We also have Swin, which we looked at previously, so scrolling down we find the Swin Transformer page as well. Now let's get back to our ViT, the vision transformer, and go through its documentation: we have ViTConfig, ViTFeatureExtractor, ViTModel, ViTForMaskedImageModeling, and ViTForImageClassification, and you'll notice there's also TFViTModel. The difference is that the former are the PyTorch models while the TF ones are the TensorFlow models, which is what we'll be using: TFViTModel and TFViTForImageClassification. Recall the ViT pipeline: you start from the patches, pass them through the transformer encoder, and from there an MLP head produces the output. With TFViTForImageClassification we have that full pipeline, from the patches right up to the MLP head; with TFViTModel the head is not included, so what you get is only the encoder's output. Apart from TensorFlow there's also code for Flax, so you could check out FlaxViTModel and FlaxViTForImageClassification. That said, we're now going to focus on fine-tuning this vision transformer, and we start by installing the transformers library: pip install transformers. With that done, let's get back to the documentation and check out ViTConfig. You'll notice that the different configuration values are basically what we've seen already: hidden_size 768 as the default, num_hidden_layers 12 (so 12 transformer encoder blocks are stacked), num_attention_heads 12, and intermediate_size 3072. That 3072 is for the dense layers: the input coming out of the norm layer is, say, 1 by 256 by 768, and we expect the block's output to have the same shape, but inside the MLP there are two dense layers; the first converts the tensor to 1 by 256 by 3072 and the second converts it back to 1 by 256 by 768, which is why it's called the intermediate size ("dimensionality of the intermediate feed-forward layer in the transformer encoder"). Then we have the hidden activation, GELU, the hidden dropout probability (so there's some dropout), the attention-probabilities dropout probability, the initializer range, and the layer-norm epsilon. To better understand that epsilon, we can get back to the layer normalization documentation in TensorFlow, where epsilon is 1e-3 by default, and see where exactly it's used.
Recall that with normalization we compute x minus a given mean, divided by a standard deviation. We don't want a situation where that denominator is zero, giving an infinite output, so we generally add a small epsilon to it. By default in TensorFlow this epsilon is 0.001, and in Hugging Face here it's 1e-12. The model is encoder-only (there's no decoder), which is why is_encoder_decoder is set to False; then image_size is 224, patch_size 16, num_channels 3, qkv_bias True (whether to add a bias to the query, key, and value projections), and encoder_stride 16, which you can relate to the stride we used when extracting patches: with a 16 by 16 patch we moved 16 pixels to obtain the next patch so there was no gap between them. So we understand this config; you can also check out the usage example for it. Let's copy that code and paste it here: you see you could easily create a ViT model without necessarily going through all the manual work we did, which was just for educational purposes. If you want to build your own ViT it's simple: you specify the configuration, then initialize the model from it, and that's it. Let's run this and also print the configuration; we could change, say, the hidden size to some value like 144, so you see how you could change the hidden size to suit your needs; run it again and you see the new configuration. Next, the feature extractor: it's similar to what we've done already, that is, taking the input, resizing it, and carrying out some normalization. The ViTModel in the docs is a PyTorch model, so let's go to TFViTModel, the TensorFlow one; you can expand the parameters to see all the arguments. There's also an example of how we could use TFViTModel directly without going through any stressful process: first a dataset is loaded and an image extracted from it (you could take an image from our own dataset instead), then there's the feature extractor, and then the model, and note that the model is created with from_pretrained, meaning we use a model that has already been trained, with the checkpoint vit-base-patch16-224-in21k. The inputs pass through the feature extractor before being passed to the model. We'll see how to adapt this code so we can fine-tune our own model in TensorFlow. As we said before, TFViTModel differs from the image-classification model in that its outputs are not the final output classes but the hidden states from the transformer encoder: in the image-classification example the output is directly a class like "cat", one of 1,000 ImageNet classes, whereas here we get the hidden states.
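As a quick sketch of that config-based route (the hidden-size override and the pretrained checkpoint name follow what's shown above):

```python
from transformers import ViTConfig, TFViTModel

# Build a ViT from scratch via the config — the library equivalent of what we hand-coded earlier.
configuration = ViTConfig(hidden_size=144)   # override any default, e.g. hidden_size (default 768)
model = TFViTModel(configuration)
print(model.config)

# Fine-tuning instead starts from already-trained weights:
pretrained = TFViTModel.from_pretrained("google/vit-base-patch16-224-in21k")
```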
We paste that example into our notebook, take off the parts we don't need, and get started with building our own ViT model based on this Hugging Face TFViTModel. We won't need the datasets part, since we already have our own dataset, and we won't use the feature extractor either. We keep the model, which we'll call the Hugging Face model, or rather the base model, created with TFViTModel.from_pretrained. Then we define an input: an Input layer whose shape we set to 224 by 224 by 3, and using the TensorFlow functional API we get an output x by passing the inputs through this base model. When we run this, the roughly 330-megabyte pretrained model is downloaded, and then we get an error linked to the positioning of the input dimensions. If we change the input shape to 3 by 224 by 224 and run again, everything works fine, which means the inputs of this base model should be channels-first, 3 by 224 by 224, instead of 224 by 224 by 3. This means we need an extra layer that converts the usual channels-last input into channels-first before passing it into the base model. Let's build that extra layer, taking inspiration from the resize-rescale layer we built already; we'll make a version of it specifically for this Hugging Face model and call it resize_rescale_hf. It will resize so that every image passing through it is 224 by 224, rescale the values, and then permute the dimensions, using a Permute layer. The way we call this Permute layer is to move the channel axis from the last position to the first (after the batch): the input axes after the batch are indexed 1, 2, 3 for height, width, and channels, and we want axis 3 first, so we pass (3, 1, 2); the output then stays batch first, then 3, then 224 by 224. Be careful to write (3, 1, 2) and not (3, 2, 1): we have height by width by channels and we want channels by height by width. With that, after the input layer we compute x as resize_rescale_hf of the inputs, and then that x is what we pass into the base model.
Running this again, we should actually get an error now, because with the permute in place the model's Input layer has to go back to 224 by 224 by 3, channels last, as we're used to working with; once we set the input shape back, we run again and everything should be okay. We do get a "resize is not defined" error first, simply because we hadn't run the cell defining the resize layer; once that cell is run, everything is fine. Now let's pass in some input: we take the test image and call the model on it, remembering to add the batch dimension with expand_dims on axis 0. Running that, we're told the model expected one shape but found another; so let's go back up and change the Input shape to 256 by 256 by 3, knowing that the resize layer will convert it back to 224, since our model takes 224 by 224. Run again, and there's our output: the last hidden state, of shape 1 by 197 by 768, and, scrolling down, the pooler output, of shape 1 by 768, and nothing further. So these are the outputs we get: the last hidden state and the pooler output. From the documentation of TFViTModel's return values, the last hidden state and the pooler output are the two outputs we always get, while the hidden states are optional and the attentions are also optional. If you want the attentions, all you need to do is set output_attentions to True in the config (by default it's False), and for the hidden states you repeat the same with output_hidden_states. The documentation also explains the difference between the pooler output and the last hidden state; looking at the shapes, the pooler output is just a single 768-vector while the last hidden state is the full 1 by 197 by 768. This Hugging Face ViT model was built with the class embedding, the class token, taken into consideration, so if we want to carry out some classification we're better off taking the class token's final hidden state. If we want just the hidden states we can index the output with 0 to keep only the last hidden state, and since we're interested only in the output corresponding to the class embedding, which sits at position zero of the sequence, we slice: everything along the batch dimension, index 0 along the sequence dimension, and everything along the last dimension. Running this again, that's the output we get.
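Putting these pieces together, here's a hedged sketch of the wrapper model as built so far (the softmax classification head at the end is the step added just below; the number of classes and the plain 1/255 rescaling are assumptions, not the exact preprocessing the checkpoint was trained with):

```python
import tensorflow as tf
from tensorflow.keras.layers import Input, Resizing, Rescaling, Permute, Dense
from transformers import TFViTModel

num_classes = 3   # e.g. angry / happy / sad in the emotion-detection dataset (assumption)

# Pretrained backbone; it expects channels-first (3, 224, 224) pixel values.
base_model = TFViTModel.from_pretrained("google/vit-base-patch16-224-in21k")

resize_rescale_hf = tf.keras.Sequential([
    Resizing(224, 224),
    Rescaling(1.0 / 255),
    Permute((3, 1, 2)),      # (H, W, C) -> (C, H, W)
])

inputs = Input(shape=(256, 256, 3))
x = resize_rescale_hf(inputs)
x = base_model.vit(x)[0]     # last_hidden_state: (batch, 197, 768)
x = x[:, 0, :]               # keep only the class-token embedding
outputs = Dense(num_classes, activation="softmax")(x)   # classification head (next step)

hf_model = tf.keras.Model(inputs=inputs, outputs=outputs)
hf_model.summary()
```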
Now that we've converted this Hugging Face model into a TensorFlow model, we can print a summary: there are about 86 million parameters, and in the layer list we see the input, the sequential (which corresponds to our resize-rescale layer), and the slicing operator that picks out our specific output. Getting back to the code, we now add our final classifier on top: an output Dense layer with the number of classes and a softmax activation, as usual. Running this first gives an error because we forgot to pass x into that layer; fix that and run again. As the model trains, remember that the learning rate we're using here isn't appropriate: we can't use this kind of high learning rate when doing fine-tuning, so we change it to something lower, say 5e-5. We stop the training, reinitialize the model, modify the learning rate, and restart the process; you'll see that the loss now drops much lower than before with the higher learning rate, and our accuracy is already at 75 percent while we're still in the first epoch. So be careful when you're fine-tuning, or updating all the parameters of an already-trained model: make sure you use a very small learning rate. Hello everyone, and welcome to this new and exciting session, in which we're going to log our data to Weights & Biases (wandb). In previous sessions we already saw how to do wandb logging: logging our metrics, logging confusion matrices, and we also looked at hyperparameter tuning, dataset versioning, and model versioning. In this session we'll not only log metrics but also log tables. Just as we did before, we start with the Weights & Biases installation, pip install wandb, then import wandb, and from wandb.keras we import the WandbCallback. We run this and then go ahead and log in; we've gone through this already in previous sessions: you click authorize, you get a key to copy (if you don't see a key, you'll first have to sign up on the Weights & Biases platform), then paste it back in the notebook and press Enter. With that, we initialize the run: we call the project "emotion detection" with the entity "neuralearn"; you can change the project name to whatever you want and the entity to the one corresponding to your account. It confirms we're currently logged in as neuralearn, and tells us how to force a re-login if needed. Then we have the configs: the train and validation directories. Now we go ahead and train our model, including the WandbCallback, which is all that's needed for the basic logging; we'll run for just three epochs on a small part of the dataset to keep it fast.
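As a minimal sketch of that setup, assuming the wrapper model from above plus train_dataset and validation_dataset pipelines with one-hot labels (all stand-ins for the notebook's own objects):

```python
import tensorflow as tf
import wandb
from wandb.keras import WandbCallback

# Log in (paste the API key when prompted) and initialise the run.
wandb.login()
wandb.init(project="emotion-detection", entity="neuralearn")   # use your own project/entity

# Fine-tuning an already-trained model calls for a much smaller learning rate.
hf_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

history = hf_model.fit(
    train_dataset,
    validation_data=validation_dataset,
    epochs=3,
    callbacks=[WandbCallback()],
)
```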
As the model trains, you'll notice that, with just a very small part of our dataset, the Hugging Face model performs quite well: we already have 77 percent validation accuracy. The other point is that when we get back to Weights & Biases, from our profile we can select projects, pick emotion detection, and, once it loads, click on the current run. It already shows us the charts: top-k accuracy, validation accuracy, validation loss, and all of that, just from specifying that WandbCallback. This information has been logged to the Weights & Biases servers, so as we know, you can always get access to it anytime. From here we see that the validation accuracy goes right up to about 79.63 percent, so about 80 percent, while the training accuracy reaches 82.6 percent, and you can check both values in the charts. The next callback we'll be adding is one for logging confusion matrices, so we'll write a log_confusion_matrix callback. Note that for the WandbCallback we didn't need to pass any information, but this one is a custom callback: we're going to write code such that every time an epoch finishes we automatically generate a confusion matrix based on what the model predicts. Recall that we've already treated confusion matrices, so we can open that notebook and copy out the part that gets the predictions and the labels; we no longer need the plotting part, since wandb has a method which takes in the predictions and the labels and automatically produces the confusion matrix. So let's copy that and get back to our callback: at the end of each epoch, with the epoch and the logs passed in, we go through the validation dataset, each image is passed to the model to get the predictions, and we have the labels coming from the dataset; from there we produce simple lists of the predictions and the labels. Once we have those lists, which we've treated already in previous sessions, the remaining part is the plot itself, and for that you can check the wandb documentation: search for confusion matrix, and under experiment tracking, log data with wandb.log, and log plots, you'll find the ROC curves, PR curves, and confusion matrices we saw in previous sessions. We copy out that piece of code: the confusion matrix takes all the ground truth (the labels), what the model predicts, and obviously the class names; then you just call wandb.log, specify a key string, and pass in the confusion matrix. That's basically what we're doing here; we just copied that code.
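Here's a minimal sketch of such a callback; the validation_dataset and class_names names, and the assumption that labels are one-hot encoded, are stand-ins for the notebook's own objects:

```python
import numpy as np
import tensorflow as tf
import wandb

class LogConfusionMatrix(tf.keras.callbacks.Callback):
    """Logs a wandb confusion matrix built from the validation set at the end of every epoch."""

    def on_epoch_end(self, epoch, logs=None):
        y_true, y_pred = [], []
        for images, labels in validation_dataset:
            preds = self.model.predict(images, verbose=0)
            y_pred.extend(np.argmax(preds, axis=-1))
            y_true.extend(np.argmax(labels, axis=-1))   # drop the argmax if labels are integer ids
        wandb.log({
            "confusion_matrix": wandb.plot.confusion_matrix(
                y_true=y_true, preds=y_pred, class_names=class_names
            )
        })
```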
With this we have our y_true, our predictions, and the class names, which are specified already, so we just run it. The next thing we'll do is create tables, and we'll see how important working with tables is compared to simply logging values: tables permit us to visualize our data in a much more interactive manner. In the documentation, under data visualization, you'll find log tables, and we'll see how to create a simple table and log it. Looking at the data, you'll notice we have different rows (row 0, row 1, 2, 3), and for each row we have the image, the predicted label, and the true label. So basically what we're doing is taking an image and logging it along with its predicted value, say class 8, and its true label, say class 0; then some other image with predicted 1 and true 1; and so on. We create this table with its rows and columns, and we'll also see how to log these tables into our wandb runs. Getting back to the code, we have the different columns, image, predicted, and label, and we create our validation table with wandb.Table, passing in those columns; obviously you could add more columns and put in whatever values you want. Then we fill the table row by row: for each row, the model takes in the image and gives the predicted values, and we have the labels; so for a given row we add a wandb.Image of the image, the predicted score, and the label score, corresponding to the three columns, and we add that row to the validation table we created. We do this for each element of the validation dataset; here we decided to take a hundred. Also, going back up, we'll somewhat unbatch the validation data: we change its batch size to 1, because we want to work with a single validation element at a time rather than, say, 32 of them, and we rerun those data cells so the validation data is ready. Getting back here, as we said, we take just a hundred elements, or let's even reduce it to 25 to make this a bit faster, though you could take all of them. Once the loop has filled each and every row of the table, you log this information: as in the documentation, which explains how the table is built (though it's quite straightforward), we call wandb.log and pass in the validation table under the key "model results". So let's run these two cells.
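A standalone sketch of that table logic, assuming a single-image-per-batch validation_dataset, a trained model, and a class_names list (all stand-ins for the notebook's objects):

```python
import numpy as np
import wandb

columns = ["image", "predicted", "label"]
val_table = wandb.Table(columns=columns)

# Fill the table row by row from a subset of the validation set.
for image, label in validation_dataset.take(25):
    pred = int(np.argmax(model.predict(image, verbose=0), axis=-1)[0])
    true = int(np.argmax(label, axis=-1)[0])          # drop the argmax if labels are integer ids
    val_table.add_data(wandb.Image(image[0].numpy()), class_names[pred], class_names[true])

wandb.log({"model results": val_table})
```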
The training is now over, and we can check out the custom chart, which is the confusion matrix. Here we can see the number of times the model predicted angry when the label was angry, predicted happy when the label was happy, and predicted sad when the label was sad — this first one is 357 times, for happy it's 876, and for sad 682 — and also, for example, the number of times the model predicted sad when the label was angry. This confusion matrix is generated automatically using W&B's confusion matrix method.

Then we move on to the tables. Notice that the table contains 100 elements, and you can page through them ten at a time. With this table you can also hover on a column, click the three dots, and group: if you group by image, what you see for each row is the prediction and the label — happy/happy, sad/sad, angry/angry, and so on — so the model's predictions here are quite good. Scrolling further, we find an example where the true label is happy but the model predicts angry; when you look at the photo, the person doesn't really look happy, so we understand why the model might think this person is angry. With this W&B tool it becomes very easy to detect mislabeled images like this one.

Another, more systematic way to do this is to ungroup first (since we had grouped) and then group by the label column. Now we have the groups happy, sad, and angry, and for each group we can look at what the model predicts: for the happy group the model mostly predicts happy, for sad mostly sad, and for angry mostly angry. What's interesting to note is that for the angry group the model never predicts happy. You can scroll through all the happy images and spot cases where the dataset has been mislabeled — like this one, where the person isn't really happy — so you can already detect these kinds of mislabeled data points. There's also an image that is almost completely blurred out; you may want to take such images off, because if your model isn't going to see images like this in real life it's preferable to remove them, whereas if the model will meet these kinds of images in the real world you could keep them. You can do the same check for the sad images and make sure they really show sad people — here again you see a person who is actually smiling but is labeled sad — and for the angry group, this little boy is visibly happy but is labeled angry. So W&B tables let us catch these kinds of labeling errors in a very interactive manner.
Next, we ungroup again and group by the predicted labels — this is now from the model's point of view: these are the images the model predicts as happy, these as sad, and these as angry. And notice something: recall the image we just saw that had been labeled angry — because of that label, the model now predicts that this image is angry, which is not true. So when you deploy this model to production, you'll find that this kind of image gets predicted as angry when it should be happy, and the worst part is that your evaluation won't even flag it as an error, because the class for this example was already set to angry.

Another thing you can do is open the column settings: click on the column settings for the label column and add an expression that picks out all the rows where the label is different from what the model predicted. Press enter, and the column now shows true/false values: true is when the row label differs from the row prediction, i.e. the wrong predictions. We could also flip this to an equality check, so that true now means the prediction matches the label; the greens are where the two coincide and the reds are where what the model predicts differs from what it ought to have predicted. You can increase the size of the table to see this better, then go back to the column settings, remove the expression to reset the label column to what you had originally, and do the same for the predicted column if you changed it.

You can also filter the rows themselves: add a filter for rows where the row label is different from the row prediction. (If you instead compare the row label with itself, you obviously get no rows at all — you can test that and see that nothing comes back.) So we take that off, compare against the row prediction instead, and apply the filter (ignoring some connectivity issues). We now see only the rows where the prediction differs from the label — for example, label happy with the prediction either sad or angry. With this you can inspect the kinds of images the model finds difficult to predict correctly: for this one the label is happy but the man clearly isn't happy, this one doesn't look particularly happy either, this one is genuinely happy, and so on. So generally, for this problem, we see
clearly that many of these misclassifications come from the dataset itself, which means we have to look at the dataset and make sure it is cleaned before making use of it. You see this "happy" example where the person isn't happy, and several more like it; if we go to "sad", this person is smiling but the label says sad. With these kinds of interactive visualizations we can, for example, look at all the cases where the model was supposed to predict a certain value and didn't. We can also flip the filter to show rows where the label equals the prediction and check the kinds of images we have there: most of the time the labeling is correct — label happy, label sad, label angry — although we still find that blurred-out image, which shows we've even gone as far as teaching the model to look at that kind of image as an angry image.

Hi guys, and welcome to this other exciting session, in which we are going to look at the ONNX format for representing machine learning models. ONNX stands for Open Neural Network Exchange. This open standard for machine learning interoperability was co-developed by Microsoft, Facebook, and AWS, and in this session we'll learn how to convert our already trained TensorFlow model into the ONNX format and then carry out inference on the newly created ONNX model. We've gotten to the point where we've fine-tuned our Hugging Face-based vision transformer model. We now have this TensorFlow model, which we've created and evaluated, and it may happen that another developer, using a different framework such as PyTorch, wants to make use of this model which was trained in TensorFlow. Thanks to the ONNX format, we can convert this model, built in TensorFlow, into an ONNX model, and then later convert it from the ONNX format into PyTorch, so that the other practitioner can use it. The reverse is also possible: we could go from a PyTorch model to a TensorFlow model thanks to ONNX's interoperability. Another reason a developer might want to take a model from PyTorch and use it in, say, Caffe is that the model may run more efficiently in that other framework; the reason such differences exist between frameworks is that, for example, a convolutional layer built in PyTorch has a certain implementation, and the implementation in Caffe may be slightly different and possibly more efficient than the one in PyTorch. (This is just for demonstrative purposes — we are not saying that the implementations in Caffe are better than those in PyTorch.) In summary, the ONNX format allows models to be represented in a common format that can be executed across different hardware platforms using the ONNX Runtime.
So developers can now feel free to build their models with whatever framework they want — TensorFlow, PyTorch, Paddle, or anything else — knowing that they can deploy the model on whatever hardware they choose, since they can convert the model to the ONNX format and then run inference on it via the ONNX Runtime, which is in fact a lightweight, modular inference engine that lets us run ONNX models on practically any hardware we choose to work with. If we look at the list of supported execution providers, we see, for example, TensorRT, which is very popular with NVIDIA GPUs and lets us attain very high speeds when carrying out inference on neural network models. The other great advantage of working with ONNX models is the ease with which we can convert TensorFlow models into this format.

Right here we evaluate our model — the Hugging Face model we fine-tuned previously — and we get about 90% validation accuracy. Now we can go ahead and save this model; let's rename it from hf_model to, say, vit_finetuned. So we have this fine-tuned ViT model, which we save, and if we check the output directory we see vit_finetuned containing the saved model, the Keras metadata, the variables, and the assets folder. Hovering over it, you can see this is almost a one-gigabyte model — 984.39 megabytes — which means that if you were to deploy it in a real-world scenario, you would always need to allocate that amount of space in order to run the model.
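For reference, a small sketch of the save-and-check step (hf_model is the fine-tuned model assumed to be in scope; the directory name follows the video):

```python
import os

# Save the fine-tuned ViT as a TensorFlow SavedModel directory.
hf_model.save("vit_finetuned")

# Add up the file sizes on disk to see how heavy the saved model is.
total_bytes = sum(
    os.path.getsize(os.path.join(root, f))
    for root, _, files in os.walk("vit_finetuned")
    for f in files
)
print(f"{total_bytes / 1e6:.2f} MB")   # close to 1 GB for this model
```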
Now let's move on to converting this model into the ONNX format. We'll have these two installs: we start by installing the tf2onnx conversion tool and then the ONNX Runtime. We run that, and then we go ahead and convert our model — the saved vit_finetuned directory — into the ONNX format; we just specify vit_finetuned as the input, give the output ONNX file a name (let's call it vit.onnx), and call the tf2onnx conversion. While waiting for the run to complete, you can visit the tensorflow-onnx GitHub repo, where you'll find the documentation for converting from TensorFlow to ONNX. Back in the run, we get a series of warnings, then the TensorFlow and tf2onnx versions, the opset, the optimization step, and finally a successful conversion: the ONNX model is saved as vit.onnx. Opening the files, we've gone from 984.39 megabytes to this optimized ONNX version at about 327 megabytes.

Another option is to convert the model from the Keras format to ONNX. The first time, we saved it as a TensorFlow SavedModel; now we save the model as a Keras .h5 file — still 984.98 megabytes, close to one gigabyte — and convert that to ONNX. Here we define an input specification with the image size — batch by 256 by 256 by 3, float32, as the input — and we specify the output path (the first one was vit.onnx, so let's call this one vit_keras.onnx). Then we call tf2onnx's from_keras method, which takes our Hugging Face model, the specification we just defined as the input signature, an opset value, and the output path we've already specified, while the output names are obtained automatically. Let's run this — and note that you can also check out these conversions on the onnxruntime.ai platform, where the details of everything we're doing are documented. Once it completes, we check the resulting vit_keras.onnx: 327.59 megabytes.
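Here is a sketch of the two conversion routes just described (file names follow this section; the opset value is an assumption):

```python
import tensorflow as tf
import tf2onnx

# Route 1: convert the SavedModel directory from the command line:
#   python -m tf2onnx.convert --saved-model vit_finetuned --output vit.onnx

# Route 2: convert the Keras model directly in Python with an explicit input signature.
input_signature = [tf.TensorSpec([None, 256, 256, 3], tf.float32, name="input")]
onnx_model, _ = tf2onnx.convert.from_keras(
    hf_model,                          # the fine-tuned Keras model (assumed in scope)
    input_signature=input_signature,
    opset=13,
    output_path="vit_keras.onnx",
)
```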
From here we move on to inference, where we'll see whether what we get from the ONNX model coincides with the initial Keras/TensorFlow model. We specify the provider as the CPUExecutionProvider, since we'll be running on the CPU, and we have the ONNX Runtime imported as rt. Note that to run an ONNX model we no longer need TensorFlow at all: even if we restarted this whole runtime, all we would need to do is install the ONNX Runtime and import it. We then create an inference session by simply specifying the path to the ONNX model and the provider, and we're good to go. The next thing we want to do is run the inference: the ONNX prediction is sess.run, where we specify the output names. These output names come from the session itself — if we print them out, we see we have "dense", and if we go back to where we created the model, the output layer's name is indeed dense; if you had named it differently, you would get a different output name. (It's a list, because a model could have several outputs.) Then we pass in our input image, which is simply the test image we've been using already, so we copy that cell over and try it with the ONNX model. Running it, we get the error "input must be a list of dictionaries or a single numpy array", so instead of TensorFlow tensors we use NumPy — again showing that we don't really need TensorFlow here; we can take the test image directly. Running again, we get another error because the input isn't a float, so we cast it with test_image.astype(np.float32). With that set, we rerun and we now get our ONNX prediction: it tells us this is a sad image — the classes are angry, happy, sad, and sad has the highest probability — and indeed the image is sad. If we go back up and run the same image through the original model and print out its probabilities, we can compare what that model gives us as output with what we get from the ONNX model.
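A minimal sketch of the inference step (the file name and the "dense" output name follow this section; test_image is the (1, 256, 256, 3) array used earlier):

```python
import numpy as np
import onnxruntime as rt

sess = rt.InferenceSession("vit_keras.onnx", providers=["CPUExecutionProvider"])

output_names = [o.name for o in sess.get_outputs()]     # e.g. ["dense"]
input_name = sess.get_inputs()[0].name

# ONNX Runtime expects NumPy float32 inputs, not TensorFlow tensors.
onnx_pred = sess.run(output_names, {input_name: test_image.astype(np.float32)})
print(onnx_pred[0])                                     # scores for angry, happy, sad
```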
Before we continue, another thing we'd like to check is the speed — the latency — of the model. Let's import time, set t1 = time.time(), run hf_model on the input image, and take the current time minus t1 to get the time elapsed after running the model: about 0.14 seconds, and note that here we're running on a GPU (you can confirm under manage sessions). For the ONNX model we do the same — t1 = time.time(), run the session, and print the time difference (after fixing the same NumPy input error once more). Here we're making use of the CPU and we get about 0.28 seconds, while with the GPU we were getting about 0.14 seconds — but note that we're comparing the ONNX Runtime results on a CPU with the Hugging Face TensorFlow model on a GPU. You can confirm with rt.get_device() that the ONNX Runtime is using the CPU, so if we want to make use of the GPU we'll have to install the GPU version of the ONNX Runtime, which will let us compare the two models properly. For now, note that TensorFlow with the GPU is about 0.15 seconds; we still need TensorFlow on the CPU and ONNX on the GPU, and ONNX on the CPU is roughly 0.3 seconds (rerunning gives an error at first because I had moved the file into Drive, so with the vit_keras path fixed we measure somewhere around 0.35 to 0.4 seconds).

So what we'll do now is install the GPU build: pip install onnxruntime-gpu. If you then import onnxruntime as rt and call rt.get_device(), we're still getting the CPU, so we restart the runtime so that the GPU version of ONNX Runtime is taken into account — and after the restart, the device we're using is indeed the GPU. We get back to our cells; we already have the model in the ONNX format, so we just run the session again, and since we're still specifying the CPUExecutionProvider we shouldn't see much difference yet. Running it, output_names is no longer defined: since we restarted the runtime we've lost those variables (everything except the ONNX file, which is stored in Drive), so rather than regenerating them from the Keras model we simply define output_names = ["dense"] ourselves. With that set, we run again and check the time it takes: about 0.34, call it 0.35 seconds. So at this point: TensorFlow with the GPU is about 0.15 seconds, ONNX with the CPU about 0.35 seconds. To actually use the GPU, the documentation shows that in the providers argument we can specify the CUDAExecutionProvider instead of (or ahead of) the CPUExecutionProvider; a sketch of setting the providers and measuring the average latency follows.
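A rough sketch of the provider setup and the averaged latency measurement (it reuses vit_keras.onnx, test_image and output_names from earlier; the CUDAExecutionProvider requires the onnxruntime-gpu build):

```python
import time
import numpy as np
import onnxruntime as rt

# CUDA first, CPU as fallback: ONNX Runtime uses the first provider it can actually run on.
providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]
sess = rt.InferenceSession("vit_keras.onnx", providers=providers)

inputs = {sess.get_inputs()[0].name: test_image.astype(np.float32)}
num_predictions = 100

t1 = time.time()
for _ in range(num_predictions):
    sess.run(output_names, inputs)
print("seconds per prediction:", (time.time() - t1) / num_predictions)
```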
So we take the CPU provider off and paste in this providers list. Note that since the providers argument is a list, you can pass several providers: with the CUDAExecutionProvider listed before the CPUExecutionProvider, priority goes to CUDA, so on a machine with only a CPU the CUDA provider cannot be used and ONNX Runtime falls back to the CPU provider, but if we have a GPU it will use the CUDAExecutionProvider directly. And there we go: about 0.048 seconds — call it 0.05 seconds — which is roughly three times less than what we had when running the TensorFlow model, so you can already see that the ONNX framework lets us optimize our initial TensorFlow model. So ONNX with the GPU is about 0.05 seconds, meaning it now takes around 50 milliseconds to run this Hugging Face model.

Before we proceed, it's also important to note that, in general, when we measure the time it takes for the model to produce an output, we test with several input images or repeat the process several times and take the average. So let's loop, say for _ in range(10), run the prediction each time, and divide the elapsed time by 10: we get about 0.035 seconds, i.e. 35 milliseconds per prediction — which means 100 predictions should take roughly 3.5 seconds. We can test that: running 100 predictions takes about 2.3 seconds, and if we call this number of predictions num_predictions, set it to 100, and divide the total time by it to get the time for a single prediction, we get about 0.023 seconds, that is 23 milliseconds. Repeating the same for the Hugging Face model gives about 0.15 seconds, just as before, so looking at the difference in speed — 0.15 divided by roughly 0.025 — the ONNX model is about six times faster than the TensorFlow model. Now, what if we run the TensorFlow model on the CPU? We change the runtime to None, rerun everything, and running our Hugging Face model now takes about 0.8 seconds; so on the CPU, with 0.8 divided by 0.35, the ONNX model runs about twice as fast as the TensorFlow model.

Hi guys, and welcome to this new section, in which we are going to look at quantization for neural networks. In the previous section we looked at the Open Neural Network Exchange standard, which is an open standard for machine learning interoperability; we saw that the ONNX format not only lets us convert models from one framework to another, but also lets us optimize our models for different hardware. In line with these optimizations, we are going to look at quantization, which is a technique for performing computations and storing tensors at lower bit widths than the usual floating-point numbers we have been working with so far in this course. Model quantization is a popular deep learning optimization method in which model data — both the network parameters and the activations — are converted from a floating-point representation to a lower-precision representation, typically using 8-bit integers.
Now, defining quantization in this manner may not seem very clear, so let's first try to understand why quantizing a neural network model is important. Consider a very simplified model where we take some input, multiply it by a weight, and add a bias, and we stack several such layers. Suppose this model has already been trained, has a hundred million parameters, and occupies, say, one gigabyte of space — and if you're wondering what that space is for, note that it stores the weights and biases, so obviously the more parameters we have, the heavier the final model file will be. Now suppose you want to use this in some setup like a mobile phone: you would need to allocate at least one gigabyte of memory space just to run this model, and this is where techniques like quantization come in. Thanks to quantization, instead of storing the weights in a 32-bit space we store them in an 8-bit memory space — we go from float32 to int8.

If you're not familiar with floating-point arithmetic, you can check out the resource by Fabian Sanglard, where he explains the core concepts of floating-point binary representation in a very intuitive manner. Essentially, if a single model weight has a value of, for example, 3.14, the way it is represented in memory is by allocating 32 slots, each taking a 0 or a 1. The first position is the sign bit, which specifies whether we're dealing with a positive or negative number. The next eight positions, the exponent, tell us which power-of-two range the value lies in — 2^-1 to 2^0, 2^0 to 2^1, 2^1 to 2^2, and so on. In our case, 3.14 lies in the range 2^1 to 2^2, and given that the power here is 1, we apply the formula exponent minus 127 equals this power, so e = 128, and converting 128 to binary gives the bit pattern for the exponent field. After encoding this integer part of the position, the next step is to encode where the value falls within that range, and that is the role of the remaining 23 positions (remember: one sign bit, eight exponent bits, and 23 mantissa bits). Since 23 bits give 2^23 = 8,388,608 possibilities, each of the ranges we've listed — and in particular our range from 2 to 4 — is divided into about 8.4 million parts, so encoding 3.14 amounts to calculating how far 3.14 lies from 2 within that range.
Knowing that the distance from 2 to 4 corresponds to 8,388,608 steps, we compute the offset as 3.14 minus 2, that is 1.14, divided by the width of the range, which is 2, and multiplied by 8,388,608, giving about 4,781,506. We then convert this to binary and use it to fill the 23 mantissa slots — and that's essentially how a number like this is stored in memory. So, getting back to our model: if the weights that were previously stored in 32-bit slots are now stored in 8-bit slots, we go from 1 gigabyte down to 256 megabytes, and our mobile phone now needs only 256 megabytes of memory to run the model.

Now, it doesn't suffice to say we're going from float32 to int8; we need to describe exactly how this is done, and the way it's done is by a simple linear mapping, where we start by defining two ranges of values. The first range is for the floating-point values, from −a_max to a_max; one convenient property of deep learning models is that most of the time the weight values lie between −1 and 1, so we take a_max = 1 and the float range is −1 to 1. Then, if we want our outputs to be unsigned ints, instead of going from −128 to 127 we go from 0 to 255 — notice that there are as many values between 0 and 255 as between −128 and 127, but with unsigned ints all the values are positive, so we use uint8. At this point our aim is to take values ranging between −1 and 1 and map them into the range 0 to 255, and for that we use a simple linear function of the form y = ax + b. We write this as x_q = x_f / s + z, where x_f is the original floating-point value of the weight, x_q is the quantized output, s is a scale, and z is a zero-point value (so 1/s plays the role of a, z plays the role of b, y is x_q, and x is x_f). Our aim is to find the values of s and z such that any value in the float range maps to its corresponding value in the quantized range. The way we get s is s = (x_f_max − x_f_min) / (x_q_max − x_q_min); replacing with our values, that's (1 − (−1)) / (255 − 0) = 2/255, which is our scale. The zero point is z = x_q_max − x_f_max / s; replacing again, that's 255 − 1 / (2/255) = 255 − 127.5 = 127.5. One way to interpret the zero point is that it's the quantized value we get when the floating value is zero: with x_f = 0, we get x_q = 0/s + z = z, which is why we call it the zero point.
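Putting the mapping together in one place (with the ranges assumed above: weights in [−1, 1] mapped to unsigned 8-bit values in [0, 255]):

$$
x_q = \operatorname{round}\!\left(\frac{x_f}{s} + z\right),\qquad
s = \frac{x_f^{\max} - x_f^{\min}}{x_q^{\max} - x_q^{\min}} = \frac{1-(-1)}{255-0} = \frac{2}{255},\qquad
z = x_q^{\max} - \frac{x_f^{\max}}{s} = 255 - \frac{1}{2/255} = 127.5
$$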
As for s, the scale, it simply rescales our inputs as we go from one range of values to the other. Let's take a simple example going from x_f to x_q: if x_f = −1, then x_q = −1 / (2/255) + 255/2 = −127.5 + 127.5 = 0, which makes sense, since in going from [−1, 1] to [0, 255] the boundary values should correspond. Now another example in the middle, say 0.3: x_q = 0.3 / (2/255) + 127.5 = 165.75, which we round up to 166 — so essentially we're going from 0.3 to 166. Apart from rounding the output as we've just done, we also clip any outliers: if we've decided that x_f_max is 1 and it happens that some weight value is greater than 1, then the unsigned-int output is simply 255; any value greater than the maximum takes that value, and any value less than the minimum takes the value 0.

So we've seen how this simple technique reduces the memory used for storing the weights. It's logical that we'll also get a drop in accuracy: if you've trained a model so that its weights take certain floating-point values, and you then convert those floats into integers with extra transformations like rounding and clipping, you should expect a drop in the model's performance. Nonetheless, the huge gains in memory are usually enough for us to sacrifice a bit of the accuracy or, more generally, the model's performance. And apart from the reduced size of the model weights, arithmetic operations like multiplication and addition on quantized integers can be carried out much faster, so we end up with a model that not only occupies less space but is also faster.
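To make this concrete, here is a toy sketch of the mapping, with the rounding and clipping just described (it assumes the [−1, 1] to [0, 255] ranges from above):

```python
import numpy as np

s = 2 / 255        # scale
z = 127.5          # zero point

def quantize(x_float):
    x_q = np.round(x_float / s + z)
    return np.clip(x_q, 0, 255).astype(np.uint8)      # clip any outliers into range

def dequantize(x_q):
    return (x_q.astype(np.float32) - z) * s

print(quantize(np.float32(-1.0)))                     # -> 0
print(quantize(np.float32(0.3)))                      # -> 166 (165.75 rounded)
print(dequantize(quantize(np.float32(0.3))))          # ~0.302: small quantization error
```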
There are generally three ways of carrying out quantization: dynamic quantization, static quantization, and quantization-aware training. In all of them, the weights and activations are stored at lower bit widths. With dynamic quantization, the quantization parameters — the scale and the zero point we've just seen — for the activations are computed dynamically, on the fly; because they have to be computed at run time, there is an increase in the cost of inference, so producing an output takes a little more time than with other methods like static quantization, but we usually achieve higher accuracy compared to static quantization. With static quantization, we first compute the quantization parameters using a much smaller dataset which we call the calibration data: we take the model, pass inputs through it over several runs, and from the model's interaction with that data we obtain the most appropriate quantization parameters. Then, at inference time, we no longer need to compute these parameters — unlike dynamic quantization, where we always have to compute them at inference time — since they were computed beforehand via the calibration data, so when we run inference we just pass in inputs and the quantization parameters are already set. The problem with this method is that if the calibration is done poorly, we get low-quality values for the scale and zero point, and therefore a lower accuracy compared to the dynamic quantization method. That said, these two methods are post-training quantization methods: dynamic and static quantization both mean we train the model first in float32 and only after training convert it into a model whose weights and activations are unsigned ints. Sometimes post-training quantization (PTQ) is not able to achieve acceptable task accuracy, and that is when you might consider quantization-aware training (QAT). The idea behind quantization-aware training is simple: you can improve the accuracy of the quantized model if you include the quantization error in the training phase. So unlike post-training quantization, where we train the model first before quantizing, here the network adapts to the quantized weights and activations during training: we include a quantization error in the training loss by inserting fake quantization operations into the training graph to simulate the quantization of data and parameters. These operations are called fake quantization because they quantize the data but immediately de-quantize it again, so the computation itself remains in floating-point precision. That said, post-training quantization is more popular than quantization-aware training thanks to its simplicity, since it doesn't involve the training pipeline, but quantization-aware training almost always produces better accuracy, and sometimes it is the only acceptable method. And that's it for this section, in which we've looked at quantization of neural network weights and activations to help reduce model size and also speed up computations.

Hello everyone, and welcome to this new and exciting session, in which we are going to see how to move from a TensorFlow model occupying one gigabyte of space to an ONNX quantized model occupying just 83 megabytes. At this point we understand the concept of quantization, and we're going to see how to apply it — specifically dynamic quantization — to make use of our model even more efficiently. Before we move on, note the sizes so far: the TensorFlow size is approximately one gigabyte (about 1000 megabytes), while the ONNX size is 328 megabytes. We're now going to measure the ONNX quantized model on the CPU, the ONNX quantized model on the GPU, and the ONNX quantized size, which we'll get shortly. As you can see, we've imported quantize_dynamic (together with the quantization types) from onnxruntime.quantization — we had already imported the ONNX Runtime itself — and we have the two model paths: the floating-point model and the quantized model. The floating-point model path is the one we already have, so we go back and copy it.
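Here is a minimal sketch of the dynamic-quantization call we're about to walk through (file names follow this section):

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    model_input="vit_keras.onnx",      # the float ONNX model from the previous section
    model_output="vit_quantized.onnx", # where to write the quantized ONNX model
    weight_type=QuantType.QUInt8,      # store the weights as unsigned 8-bit integers
)
```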
We'll call the output vit_quantized. Now all we need to do is call quantize_dynamic, which takes the input model, the path to the quantized model (still an ONNX model), and the weight type — here an unsigned 8-bit int. We run this and get an error because a name isn't defined (we just needed to run the previous cell first); after that, everything is fine and we get the quantized ONNX file. (Again, you can check out the documentation on quantizing ONNX models, where there's an overview of all of this.) Let's check out the resulting model: there it is, vit_quantized, and it's 83 megabytes — so we've gone from one gigabyte, or about 1000 megabytes, down to just 83 megabytes.

Now we can go ahead and check the CPU speed. We copy the path of the quantized model, paste it into the session, keep the provider as the CPUExecutionProvider, and run: we get about 0.39, practically 0.4 seconds per prediction. Checking the original ONNX model again on the CPU, we get 0.49, practically 0.5 seconds (so the earlier figure really should be about 0.5). So the quantized model is faster and far lighter than the original ONNX and TensorFlow models — but we have to be careful, as quantization generally comes with a drop in accuracy. Switching to a GPU and testing the quantized model, its latency is about 0.27, say 0.3 seconds, which shows that the quantized model doesn't benefit from the GPU as much as the unquantized ONNX model does. If you check the documentation on quantizing an ONNX model on a GPU, you'll see it isn't as straightforward as on the CPU: you need a device that supports Tensor Core int8 computation, like the T4 or the A100, and other hardware will not benefit from GPU quantization; if you do want to proceed with quantization on a GPU, you can make use of the TensorRT execution provider, and the documentation gives the overall procedure for leveraging it.

With that, the next thing we'll do is ensure that the quantization process hasn't led to too much of a drop in accuracy: when we quantize a model we may well lose some accuracy, but our aim is to make sure this drop is minimal. To do this we evaluate the models: we define an accuracy function which takes a model and, for 100 elements of the validation dataset (with a batch size of one), compares the ONNX prediction with the label each time. Both counters start at zero; the total counter is increased for every element, while
the accuracy counter is increased only when the prediction and the label are the same. With this we've implemented our accuracy method, and we now evaluate the two models — the original ONNX model and the quantized ONNX model — with the providers set so that this runs faster. Running it, we get about 90% for the original and 89% for the quantized model.

The next thing we'll look at is how to visualize ONNX models using Lutz Roeder's Netron app. You open the Netron interface, open the model, and once it loads, here's what we get: we start with the transpose (recall we had that transpose), then the resizing, then a matrix multiplication, and then the rest of the ViT model — so here is the ViT model in its quantized ONNX format. Scrolling right to the end, you see the matrix multiplications for a linear layer and then the softmax. You can also export the graph as a PNG and open it up in that format. And that's it for this section, in which we've gone from a 1-gigabyte model to an 83-megabyte model with just a 0.01 drop in accuracy.

Hi there, and welcome to this new and exciting session, in which we are going to treat quantization-aware training with TensorFlow. In previous sections we started by explaining what quantization is all about and the advantages of quantizing models; we also looked at the different quantization methods and their relative advantages and disadvantages. That said, in this section we'll see how to quantize a full model, or just some of the layers which make up that model, in TensorFlow. The special module which lets us carry out quantization is tfmot, which stands for TensorFlow Model Optimization. We start by installing the tensorflow-model-optimization package and importing it as tfmot; then, since we want to do quantization, we go into its quantization.keras module, where we have different methods and classes. Let's look at quantize_model: as the documentation says, it quantizes a Keras model with the default quantization implementation, so we simply pass in the to_quantize argument, which is the model to be quantized, and we get back a quantization-aware model. In the documentation's example you see a sequential model and a functional model; to quantize one, you just call quantize_model and pass in the model. So, that said, let's go ahead and implement this. We'll start with our Hugging Face model, which we've declared already, and create a quantization-aware version of it by calling tfmot.quantization.keras.quantize_model on it. Running this, we get an error: quantizing a tf.keras Model inside another tf.keras Model is not supported — so, as of now, this isn't supported for nested models. Let's try the EfficientNet model, although we should expect the same error, because of how that model is defined.
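As a reference, here is a tiny, self-contained sketch of the quantize_model call on a flat toy model (our real models come from the earlier sections; as we just saw, the call fails when a Keras model is nested inside another):

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

toy_model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(256, 256, 3)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(3, activation="softmax"),
])

# Wraps every supported layer so that the whole model becomes quantization aware.
quant_aware_toy = tfmot.quantization.keras.quantize_model(toy_model)
quant_aware_toy.summary()
```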
Looking at the definitions: with the Hugging Face model you can clearly see a model nested inside a model, which is why quantize_model doesn't work, and our EfficientNet transfer-learning model has the same issue — the backbone is itself a Keras model sitting inside our model. So if we want to use quantize_model, we have to find a way to break this backbone up into its different layers; until now, we've simply been using the backbone as-is, without breaking it up. Let's copy the EfficientNet model definition and modify it: we no longer use a separate input, since we'll use the backbone's input directly; the backbone's output goes into the global average pooling layer, then through the dense layer, the batch normalization layer, and another dense layer, giving us the output. From there we create our model: the pretrained model is a Keras Model whose inputs are the backbone's input and whose outputs are the output we just built. Running this, you'll notice that the Keras model we previously had nested inside has now been broken up into its layers — let's call this version pretrained_functional_model (and go back and rerun the earlier cell accordingly). If we print the summaries, this is exactly the same model we've been dealing with so far: the same total number of parameters and the same number of non-trainable parameters. The only difference is that we no longer have a Keras model inside the model, and because of that it should now be possible to use quantize_model to quantize the full model; a sketch of the flattened definition is shown below.
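A sketch of the flattened definition, assuming backbone is the pretrained EfficientNet feature extractor from the earlier transfer-learning section; the dense-layer size and the three output classes are illustrative:

```python
import tensorflow as tf

# Build directly on the backbone's own input/output so no Keras model is nested inside.
x = tf.keras.layers.GlobalAveragePooling2D()(backbone.output)
x = tf.keras.layers.Dense(128, activation="relu")(x)
x = tf.keras.layers.BatchNormalization()(x)
output = tf.keras.layers.Dense(3, activation="softmax")(x)

pretrained_functional_model = tf.keras.Model(inputs=backbone.input, outputs=output)
pretrained_functional_model.summary()   # same parameter counts, but no nested model
```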
So we run quantize_model on the flattened model (making sure we pass pretrained_functional_model and not the old nested version); it takes a bit more time, and then we get another error: the rescaling layer is not supported. This is normal, since the rescaling layer has no weights, so we won't be carrying out quantization for such layers. What we'll do instead of quantizing the whole model is select the layers we want to quantize. To illustrate, let's go back to the top and define, for example, a simple LeNet-style model without the resize/rescale block. Running that cell gives an error at first, because removing resize_rescale also removed the explicit input size, so we fix the input and run again, and now we have our LeNet model. Calling quantize_model on it, we get yet another error: the batch normalization layer is not supported, so we cannot quantize that batch norm layer as-is. For now, let's simply remove the batch norm layers (keeping the dropout), and run again — and this time it works: we've made the LeNet model quantization aware, and we've done it for the whole, full model. But in cases like the EfficientNet model, where the backbone is a pretrained backbone, we cannot just start taking off the normalization layer or the rescaling that comes with the backbone, and so on. So what we'll instead do is move layer by layer and select the layers we actually want to make quantization aware. With that, we comment out the full-model section and take that model off.

In order to quantize only some layers of the model, we make use of the quantize_annotate_layer method — tfmot.quantization.keras.quantize_annotate_layer — which takes the layer to annotate, with some quantization configuration. As the documentation explains, this function does not actually quantize the layer; it is mainly used to specify that the layer should be quantized, and the layer then gets quantized accordingly when we call quantize_apply (that's the quantize_apply method, which you can also open in the documentation). Look at the example there: we have a model in which we want to quantize only one layer, so we wrap that layer with quantize_annotate_layer and then call quantize_apply to obtain the quantization-aware model, which in the example is called quantized_model. So let's go ahead and implement this with our pretrained EfficientNet model. We'll define a method apply_quantization_to_conv_layers which takes a layer: if 'conv' is in the layer's name, we annotate it for quantization, and we're
going to apply the quantization to those conv layers, while in any other case we simply return the layer itself, leaving it unchanged — so the conv layers become quantization aware. We run the cell that defines this apply method. Once the method is defined, we make use of the clone_model method to create a new model that takes a clone function into account: we paste that in from the documentation, drop the input_tensors argument, and pass our apply_quantization_to_conv_layers as the clone_function. If you inspect the layer names, you'll see that wherever we have conv layers — including the depthwise convolutions — they will be annotated; you could also include the expand and reduce layers, but let's work with only the conv ones, and with that you now understand how to pick out certain layers, or leave others out of, the quantization-awareness process. We call the cloned model quant_aware_efficientnet and run — making sure the model we clone is our pretrained functional model — and this time it's fine. Printing the summary of quant_aware_efficientnet, we get something slightly different from what we used to get: scrolling from the top, wherever we had conv layers we now instead have quantize_annotate wrappers — as you scroll you won't see a bare conv layer, but the quantize_annotate wrappers instead (the expand and reduce layers are untouched, since 'conv' isn't in their names, though as we said you could include them too). With the annotation done, we're ready to make the model actually quantization aware: we call quantize_apply on the annotated model, run it again, and look at the summary — the model is now quantization aware. We no longer see the annotations, but wrappers instead: the layer names now carry a quant prefix, and you can scroll down and see these throughout. We're now ready to compile this model and train it like every regular model; a sketch of the whole annotate-and-apply flow is below.
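Here is a sketch of the annotate-and-apply flow on the flattened model (names follow this section; this is the same pattern as the tfmot documentation example for quantizing only some layers):

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

def apply_quantization_to_conv_layers(layer):
    # Annotate only the layers whose name contains "conv"; leave the rest unchanged.
    if "conv" in layer.name:
        return tfmot.quantization.keras.quantize_annotate_layer(layer)
    return layer

annotated = tf.keras.models.clone_model(
    pretrained_functional_model,
    clone_function=apply_quantization_to_conv_layers,
)

# Turn the annotations into actual quantization-aware wrappers (quant_* layers).
quant_aware_efficientnet = tfmot.quantization.keras.quantize_apply(annotated)
quant_aware_efficientnet.summary()
```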
Running the training, we first hit a resource exhausted error, so I'll restart the session, and after restarting everything works and we're able to train. In any case, we've now seen how to implement quantization aware training with TensorFlow, and in the next section we'll dive into post-training quantization. Hello everyone, and welcome to this new and exciting session in which we are going to look at post-training quantization with TensorFlow. In the previous session we looked at quantization aware training, still with TensorFlow, and now we'll look at how to quantize a model which has already been trained. As you can see here, we have our pretrained model, which obtains an accuracy of 84 percent and a top-2 accuracy of 95.6 percent. In this section we'll quantize this model, check whether the quantized model occupies less space than the original, and also verify that not much performance is lost. Before we start with the quantization process, note that we'll be using the TensorFlow Lite library. TensorFlow Lite is a library for deploying models on mobile devices, microcontrollers, and other edge devices, so in these environments where compute resources are limited it's important to quantize our models, since we then get smaller, lighter, and faster models. Here is a general overview of how this works: you pick a model, for example our EfficientNet-based model, convert it to TensorFlow Lite using the TFLite converter (which we'll see shortly), and then deploy the compressed .tflite file. We were already working with Keras files, which are HDF5 files; the .tflite file is a compressed format which is loaded in whatever environment you deploy to, and we'll also quantize the weights from 32-bit floats to 8-bit integers, which can run on devices with low compute resources. In the documentation we have tf.lite and the tf.lite.TFLiteConverter. As the name says, this converter turns your models into the TFLite format, and as you can see in the examples it can be created from a SavedModel, from a Keras model, from concrete functions, or from a Jax model; so if, for instance, we're working with a Keras model, we just pass the model in and generate the TFLite model using the converter. Apart from these arguments, we also have attributes: we can specify the optimizations and the representative dataset, which is very important when we're doing static quantization. Remember that with static quantization we have to obtain the scale and zero-point values using unlabeled data; as we saw previously, all we need to do is pass in inputs, in this case images, and these values are inferred from the model's interaction with those inputs. Then we have the target specifications, the inference input type, the inference output type, whether to allow custom operations, and whether to exclude the conversion metadata. That's our converter. Now let's paste in the code from the documentation and start setting some of these attributes.
First, converter.optimizations: going back to the documentation, we set this with tf.lite.Optimize. Opening that up, you see it takes different values: DEFAULT, OPTIMIZE_FOR_SIZE (deprecated, does the same as DEFAULT), OPTIMIZE_FOR_LATENCY (also deprecated, same as DEFAULT), and an experimental one which is subject to change. So we simply take tf.lite.Optimize.DEFAULT, just as in the documentation. We also specify the inference input type and the inference output type, both of which we set to unsigned 8-bit integers. Next, the representative dataset: this is in fact a generator which yields the input values, because recall that all we need for static quantization is the inputs. So we write a generator which yields the inputs, and then set converter.representative_dataset to this representative data generator built from our training dataset. We don't need the whole training set; we can take, say, 20 batches and use those to obtain the values for the scale and the zero point. If you're new to the notion of scale and zero point, it's important to check our previous sessions where we treat it. We run these cells (making sure the model passed to the converter is our pretrained model) and carry out the conversion. Once the conversion is done, we save the result in the .tflite format at a given path, and when we check the file size we see it's about 21 megabytes: we're going from the original model, which was 90.7 megabytes, to a 21.12-megabyte model. Before moving on, note that if we wanted dynamic quantization instead, we simply would not specify the representative data generator.
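As a minimal sketch of the conversion just described (the output file name and the number of calibration batches are assumptions, and train_dataset is the training pipeline from the earlier sections):

```python
import tensorflow as tf

# Generator that yields a few input batches so the converter can calibrate
# the scale and zero-point values (static quantization).
def representative_data_gen():
    for images, _ in train_dataset.take(20):
        yield [tf.cast(images, tf.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(pretrained_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
# Depending on the TensorFlow version, uint8 input/output may also require
# restricting the supported ops to the int8 builtins.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

tflite_model = converter.convert()

# Omitting converter.representative_dataset above would give dynamic-range
# quantization instead of static quantization.
with open("emotion_model.tflite", "wb") as f:
    f.write(tflite_model)
```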
Next, let's install the TensorFlow Lite runtime. Once we have the .tflite file, if we want to run it on some other system, say a Raspberry Pi, all we need to do is install this runtime; we no longer need to install TensorFlow itself. We pip install the runtime, import it, and prepare our test image as before: with the pretrained model we take the argmax of the output and look up the corresponding class, and we get "angry", which matches what we expect; another example works as well. Now let's use the runtime to run our TensorFlow Lite model. After restarting the session, you can check that TensorFlow really isn't available: calling tf.zeros gives a "not defined" error, so there is no TensorFlow import for now. We install and import the runtime, import numpy as np, and import cv2 for reading the test image, trying to work without depending on TensorFlow. Previously TensorFlow was used to convert the test image, which is an unsigned 8-bit integer array, into a float; we don't need that cast here, and anything we do need can be done with numpy. Next we create the interpreter, which loads the .tflite file we saved to Drive, and allocate tensors; note that the class lives in the tflite_runtime.interpreter module, so it's Interpreter from there, not an attribute of the package itself. Once that's done, we get the input and output details, and if you print the input details you'll see the expected data type is uint8, which makes sense because during conversion we specified uint8 for both the input and the output; the test image is already a numpy uint8 array, so we don't need to convert its type or wrap it in a tensor. We then set the input tensor at the index given in the input details (index 0 here). At first we get an error saying the interpreter got 3 dimensions but expected 4, so we use expand_dims to add a batch dimension to the test image; with that fixed, we set the tensor and run the inference.
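Putting these steps together, a minimal sketch of inference with the standalone runtime might look like the following; the file paths, the 256×256 input size, and the class_names list are assumptions carried over from the earlier sections of the course.

```python
import numpy as np
import cv2
from tflite_runtime.interpreter import Interpreter

# Load the converted model and allocate its tensors.
interpreter = Interpreter(model_path="emotion_model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Read and resize a test image; it stays uint8, matching the converted model's
# expected input type, and gets a leading batch dimension.
test_image = cv2.resize(cv2.imread("test.jpg"), (256, 256))
test_image = np.expand_dims(test_image, axis=0).astype(input_details[0]["dtype"])

interpreter.set_tensor(input_details[0]["index"], test_image)
interpreter.invoke()

output = interpreter.get_tensor(output_details[0]["index"])
print(class_names[np.argmax(output)])
```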
Once the inference has run, we can read the tensor at the output. It takes a while: TensorFlow Lite has been built for mobile and embedded CPUs, so general-purpose CPUs like the ones in Colab aren't the best match for TFLite models in terms of speed. Printing the output, we see the scores, take np.argmax of the output, and map that index to the class name: we get "happy", which is what we expected, and again we've done this without having to import TensorFlow; once we had the .tflite model we didn't need it (you can confirm tf is still not defined). Our next step is to measure the accuracy of our quantized TensorFlow Lite model. We follow exactly the same process as before: we pass in the model path, get the input and output details, then go through our validation dataset, taking a hundred elements. This does mean we have to import TensorFlow for this step, since we need the validation dataset pipeline, so we go back and rerun those cells; note that here we use tf.lite from TensorFlow, and not the standalone runtime package we installed earlier. For each test image we run the inference, compare the prediction with the label, increment the count of correct predictions if they match, and in every case increment the total; the accuracy is then correct predictions divided by total. We point this at the path of our .tflite model and run it, and we get 0.82, that is 82 percent, compared to the 84 percent of the original model. To get a more accurate value it's advisable to evaluate on the whole dataset rather than a hundred samples. So we now have a model which performs at 82 percent accuracy and which can be deployed on some mobile device.
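Here is a rough sketch of that evaluation loop, assuming val_dataset yields (image, label) pairs shaped like the model's input, as in the earlier data-preparation sections:

```python
import numpy as np
import tensorflow as tf

def tflite_accuracy(model_path, dataset, num_samples=100):
    # tf.lite.Interpreter is used here since TensorFlow is already imported
    # for the validation pipeline; the standalone runtime works the same way.
    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    correct_predictions, total = 0, 0
    for image, label in dataset.unbatch().take(num_samples):
        test_image = np.expand_dims(image.numpy(), axis=0).astype(input_details[0]["dtype"])
        interpreter.set_tensor(input_details[0]["index"], test_image)
        interpreter.invoke()
        output = interpreter.get_tensor(output_details[0]["index"])
        if int(np.argmax(output)) == int(label):
            correct_predictions += 1
        total += 1
    return correct_predictions / total

print(tflite_accuracy("emotion_model.tflite", val_dataset))
```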
Hello everyone, and welcome to this new and exciting session in which we are going to look at APIs. API stands for Application Programming Interface, and in this section we'll look at why we even need APIs and how they work. Suppose you've just built a model, let's call it M1, which takes an input and produces an output. The question is: how do I make my web app, my mobile app, or even my desktop app access this model, such that a user of, say, the mobile app can press a button and get my model's predictions? The way we go about this is by making use of APIs, application programming interfaces. As defined on the g2.com website, an API permits software development and innovation to be easier by allowing programs, for example your web apps, to communicate data and functions safely and quickly; APIs accelerate innovation because more developers can build products based on existing data and functionality. Getting back to our example, this means that thousands of users on the web can now access our model and make predictions, without the developers of those web or mobile apps necessarily mastering the art of model creation. To make this even clearer, take this example: you walk into a restaurant and give an order to the waiter, who takes it and tells the cooks to prepare the food you ordered; once the food is ready they hand it back to the waiter (that's the API), and the waiter passes the food on to you, who eat it and are happy. In the case of computer software, where we have, say, a mobile app which has to communicate with an API, this communication has to follow certain rules, a protocol, known as HTTP, which guides the way information is transmitted over the web. HTTP stands for HyperText Transfer Protocol. The type of data exchanged can be text, images, video, or any kind of data understandable by both the client and the server. Communication over HTTP is connectionless: each time the client needs to ask the server, for example, what objects are found in a particular image, a connection is created between the two; once the client receives the output, say the locations of the objects in the image, that connection is closed, and when we want to talk to the API on the server again we create another connection. Also, as we said, the data exchanged isn't of one fixed type; the protocol doesn't force you to send or receive a particular kind of data, as long as both the sending and receiving entities understand it, everything works just fine. The last property of HTTP is that it is stateless: once you make a request and receive a response, no client information is stored in that request-response cycle, so once data has been passed to the API and the response received, you should not expect to retrieve that data from the server later. From this point we are going to go in depth into how HTTP works. Head to www.postman.com and sign in, or sign up if you don't have an account (I have one, so I'll sign in). You land on your workspace, and there's a default request containing a URL, which is the piece of information you pass in each time you want to reach a particular site; in this case it's postman-echo.com.
URL stands for Uniform Resource Locator, and this address permits us to access resources located somewhere on the web. That brings us to the HTTP methods: if you click the dropdown here, you'll see different options such as GET, POST, PUT, PATCH, and so on. These are the HTTP methods, and each time you want to access a resource located somewhere on the web, you have to specify which method you're using. With GET, for example, we're saying we want to retrieve a resource. With POST, unlike GET where the main interest is retrieving data, we submit an entity to the specified resource, which often causes a change in state or side effects on the server. This means we'd use GET when we want to retrieve a user's information, and POST when we want to add or modify data; for example, to register on a platform we'd use a POST request, since we're adding a row to the database. Suppose we have a small database with the user's id, name, and password: id 0 with name fred and some password, id 1 with sally, and id 2 with rita. With a GET request we could retrieve that the user with id 0 is named fred and has a given password, whereas with a POST request we could add a new user, say id 3 with name mac; that's why we speak of a change in state or side effects on the server, because we actually add this new row by passing information via the POST request. With a PUT request we could update existing data: perhaps we're not interested in modifying the name, but we want to update the password to some new value, so we update that row in the database. And with a DELETE request we could simply remove the row. There are other methods, like HEAD, CONNECT, OPTIONS, TRACE, and PATCH, which you can check out in the documentation, but most of the time we use GET, POST, PUT, and DELETE. The responses also come with status codes, some of which we've seen already; you've probably seen 200 for OK, so when you make a request and everything goes well, that's what you receive. Let's make a request here and highlight the status: you see it's 200 OK. The MDN page on HTTP response status codes documents every code in detail, and you need to understand how to work with them, as they're very important when dealing with APIs. There are informational responses and successful responses lying between 200 and 299, which is why our request here shows the 200 status code.
Then we have redirections, client errors, and server errors. Client errors, meaning the error comes from the person sending the API request, lie between 400 and 499, while errors from the server lie between 500 and 599; when the server is down you'll sometimes see a 500 internal server error, meaning the server encountered a situation it doesn't know how to handle. You can check all of this in the documentation. Now let's trigger an error: put some random value in the URL path, send the request, and you'll see a 404 Not Found, a client error, so this error comes from the client. Next, let's paste in this URL, select a POST request, and go to the body, using form data; this is the information we'll be passing in. We set the email to teams@neurallearn.ai, the password to "not a password", and the job to neurallearn, then click Send. What do we get? We have this output: our email, the username, the job, and a token, with status 200. You'll notice that, unlike with a GET request, we've now introduced body data, and if we don't pass this body data we don't get the right output; that's because to log in to just about any platform you obviously need to pass your credentials, which in this case are the email, the password, and the job. So when making an API call like this one, that is, when making the client communicate with the server, we need to specify the HTTP method, the URL, the body (all the information we just filled in), and the header information. The header and body information can be broken down into request and response: there is a request body and a response body, and a request header and a response header, which you can view by clicking here. You can also check out a list of HTTP headers on developer.mozilla.org. In the response headers, take note of the Content-Type, which is JSON. JSON stands for JavaScript Object Notation, a very easy and lightweight format for storing and transporting data from a client to a server and vice versa. The JSON format is programming-language independent: a client written in JavaScript can send JSON data to an API written in, say, Python, and after the data is processed, the response in JSON format will still be understood by the JavaScript client.
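To make the same kind of calls outside of Postman, here is a small sketch using Python's requests library against the postman-echo service; the form field values are just the illustrative ones used above.

```python
import requests

# GET request: simply retrieve data; the echo service replies with what it received.
get_response = requests.get("https://postman-echo.com/get", params={"user_id": 0})
print(get_response.status_code)   # 200 on success, 404 if the path is wrong
print(get_response.json())

# POST request: submit a body (here form data) that the server can act on.
post_response = requests.post(
    "https://postman-echo.com/post",
    data={"email": "teams@neurallearn.ai",
          "password": "not a password",
          "job": "neurallearn"},
)
print(post_response.status_code)
print(post_response.headers.get("Content-Type"))   # typically application/json
print(post_response.json())
```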
To better understand the JSON formatting, let's look at the output generated after we made the GET request on the postman-echo API. You'll first notice that it starts and ends with curly braces, and that information is stored in key-value pairs: we have a key, a colon, and then the value, and each key-value pair is separated from the next by a comma, so we have key one and value one, a comma, key two and value two, and so on. Here, for instance, the second key is "headers" and its value is everything inside the nested braces, followed by a comma and then the third key and its value. For a particular value we can also have a nested object: in the first case we had an empty object, which is quite simple, but here we have an object filled with its own key-value pairs, again separated by commas, while this other value is just a string. So basically what we have is a key, which is a string, and a value, which can be an object (a dictionary of its own key-value pairs, as in the first two examples), a string, an integer, or even a boolean. If you check the POST request you'll see the same formatting: curly braces which open and close, and each key with its value. That was the response body; now let's look at the request body. We had used form data, but we can also pass raw JSON data instead: rather than using the graphical form, we select "raw" and type the JSON ourselves, which is actually very easy to write. Looking back at the form data we had email, password, and job, so in the raw body we specify the email teams@neurallearn.ai, the password "not a password", and the job neurallearn. Let's send this and see the output: we get "unsupported media type text/plain in request", which means the request we sent was of text type and not JSON; "Text" was selected in the editor, and we simply have to change it to JSON. Notice that when we switch to JSON the editor changes color, because it now recognizes the keys (in red) and the values (in blue); switching back to text turns it all black again. After fixing a small JSON parse error in the body, we can send the request again.
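For reference, here is a sketch of what that raw JSON request body amounts to, written out with Python's json module; the field values are the same illustrative ones used before.

```python
import json

# The same request body we filled into the Postman form, as a Python dict.
request_body = {
    "email": "teams@neurallearn.ai",   # value that is a string
    "password": "not a password",
    "job": "neurallearn",
}

# json.dumps produces the raw JSON text: keys and string values in double quotes,
# key-value pairs separated by commas, the whole object wrapped in curly braces.
print(json.dumps(request_body, indent=2))
```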
Once we correct that and send the JSON data, we get the expected output, so using the form data (which we can now remove) is the same as passing this raw JSON data. The role of Postman is to make it easier for people to test their APIs using this kind of graphical interface, but you could always do it by passing raw JSON, as you can see here; the status is now OK, whereas when the body was sent as text we got a 415 Unsupported Media Type. At the level of the headers you can also confirm the JSON format, so the output body we get here is JSON data. If instead we add some random path and click Send, we get a 404 Not Found, and you'll notice that the content type is no longer JSON but HTML; the body preview shows a "page not found" page, which shows that different content types can be exchanged between the client and the server. Hello guys, and welcome back. In this session we'll see how to build our own APIs using the FastAPI framework in the Python language. Previously we saw that we could wrap our deep learning model, more precisely our object detection model, into an API and let clients consume that API, that is, allow them to send requests and get responses, all over the HTTP protocol. Now that we understand the concept of an API and how it can interact with millions of clients anywhere in the world, we're going to move straight on to building our own API, and to do that we'll use the Python framework known as FastAPI. Other Python-based frameworks which can be used to build APIs are Django and Flask, but in recent years FastAPI has gained much popularity among Python developers thanks to its key features: it is fast (a high-performance framework), it reduces the time taken to code so developers can build features faster, it leads to fewer bugs, and it's intuitive, easy, short, robust, and standards-based. Because we need all these features while building our object detection API, we have little choice but to turn to FastAPI, and you're going to see how easy it is to build a fast, highly performant, and robust API with it. First things first: feel free to check out the documentation at fastapi.tiangolo.com. Working with the FastAPI documentation doesn't feel like a usual documentation, as most concepts are well explained and broken down so that anybody with minimal Python knowledge can understand and use FastAPI efficiently. There you'll find the features, the tutorial and user guide, the advanced user guide for when you're done with the basics, and special topics like concurrency, deployment, and project generation. Before we move on, don't forget to star the FastAPI GitHub repository. One of the course prerequisites is having used Python at a basic level, so
we're supposing that you've already installed Python; if not, head over to python.org, then download and install Python for your operating system. Here you can see Python is installed: checking the versions, python gives 2.7 and python3 gives 3.6. The next step is to install FastAPI, so head back to the installation section of the documentation: all you need is pip install fastapi, or pip install "fastapi[all]" to also get the optional dependencies and features. We start with pip3 install fastapi; once it's installed we can run python3, import fastapi, and check the version, which here is 0.78.0. The next problem we may face is this: what if we have one project, this object detection project, which needs FastAPI version 0.73.0 installed, and another project on optical character recognition which needs a different version, say 0.75.0? There would be a conflict, since what we've installed globally is version 0.78.0. The way we solve this is by using virtual environments. The way virtual environments work is that this object detection YOLOX project can have its own isolated Python environment, with its own interpreter, where you install libraries without affecting the other project, which is built in another virtual environment with its own set of libraries and library versions. So in one environment we could install some version of numpy, and in the other we could install another version (or even the same one); the difference is that one virtual environment has FastAPI 0.78.0 while the other has 0.75.0, and the dependency conflict is resolved. To create our Python virtual environments we use the venv module, so first we install it with apt install python3-venv. With venv installed, we move into our neurallearn projects directory and create the virtual environment: python3 -m venv venv_emotion_detection. Listing the directory, we see the venv_emotion_detection folder has been created; inside it we have bin, include, lib, lib64, pyvenv.cfg, and share, and inside bin there is the activate script. To activate the environment we run source bin/activate from inside the venv_emotion_detection directory (running it from the wrong directory gives an error). One thing you'll notice is that the shell prompt, which previously showed just the user, now shows the environment name
before it, so everything we do from now on is in the context of our virtual environment. Now if we run python3 and import fastapi, we get a module not found error: FastAPI is not installed here. Let's exit and deactivate (we deactivate by simply typing deactivate), and you'll see the environment prefix disappears from the prompt, so we're no longer inside the virtual environment. Running python3 again and importing fastapi now works, and we can check the version: FastAPI is installed globally but not in the virtual environment. Let's get back into the virtual environment with source bin/activate and run pip install fastapi; notice that outside the scope of the environment we have one version of FastAPI, and inside it we can now install another, even different, version. With FastAPI installed in the environment, we run python3, import fastapi, and check the version, and you can see clearly that it differs from the other version. Another thing we can do is create a virtual environment with a specific version of Python: python3.8 -m venv venv_emo_detection creates a new environment whose default Python is 3.8. Activating it and running python shows version 3.8.13, and importing fastapi again gives module not found, so we exit the interpreter and run pip install "fastapi[all]". If you run pip freeze you'll see many more dependencies installed, because we chose the "all" extra: typing-extensions, starlette, anyio, pydantic (which is very useful), and others. The next step is to install uvicorn. Uvicorn is an ASGI server, where ASGI stands for Asynchronous Server Gateway Interface. Being an HTTP server, uvicorn is responsible for taking requests from different users: supposing we have three users who each want, for example, some predictions from the model bundled in our FastAPI code, a user makes a request to the server, the server receives that request, interacts with FastAPI, obtains a response, and sends it back to the user. If we want to connect to this web server locally, we pass in the IP address of the local host and specify a port, and the client sends a request containing the method (for example a POST), the request body, and the headers; once the web server receives this, it processes the information and produces the required response. We are now going to dive straight into some practice.
In the FastAPI documentation there is a simple tutorial we're going to follow to see how easily we can work with FastAPI. We copy the example code and open Visual Studio Code in our neurallearn projects folder; if you're working with VS Code it's recommended to install the Python extension. We create a new file, main.py, and paste in that code; to keep things simple we'll comment out the item-related parts and keep only the first portion. Here we have app, which is an instance of FastAPI (imported from fastapi), then a decorator, and then a read_root method, which we can rename, say to read. This method takes nothing and returns a dictionary, or rather a JSON object, with "Hello" as the key and "World" as the value. In the decorator we specify the HTTP method; recall we have methods like GET, POST, PUT, DELETE, and so on, and here it's the GET method, along with a path. The path here is the root path, which means that if our web server were hosted at an address like https://whatever.com, this root path would correspond to that address itself; if we wanted a login endpoint we would add a "/login" path, most likely with a POST. That's basically how it works; you can see how easy it is to create these kinds of paths with FastAPI and specify their methods. We could also add path variables, which we'll look at shortly with the other example below. To run this, we use the command given in the documentation: open a terminal, make sure you're in the same folder as main.py, and run uvicorn. The first time we get an error saying the attribute app was not found in module main,
because the file hadn't been saved; after saving and running again, it works: the server process starts, waits for application startup, startup completes, and uvicorn is running on the local host address. Ctrl-clicking that address opens the page, and the first thing we notice is a "Method Not Allowed" message. Why? Because in the code we used the wrong method: it should be a GET, so we change the decorator to get, save, stop the server, and run it again, and now we get the JSON output we expected, with key "Hello" and value "World". Next, if we change the path to "entry_point" without a leading slash, save, and rerun, visiting the path gives "Not Found"; we have to put the slash before entry_point, so be very careful with how you write paths. Now, aren't you tired of always having to stop and restart the server? I certainly am, so instead we add the --reload flag, so that whenever we modify and save the code, the server reloads automatically. With that running, visiting /entry_point works, and if we update the path to /entry you'll see the server shut down and start again on its own: unlike before, where we had to stop and restart manually, modifications are now picked up automatically, and /entry works without us restarting the server ourselves.
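Pulling the pieces together, a minimal sketch of the main.py we have at this point might look like this (the second path and its return value are just the illustrative ones used above):

```python
# main.py
from fastapi import FastAPI

app = FastAPI()

@app.get("/")
def read():
    # Returned dictionaries are serialized to JSON in the response body.
    return {"Hello": "World"}

@app.get("/entry_point")
def read_entry_point():
    return {"Hello": "World"}

# Run from the same folder with:  uvicorn main:app --reload
```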
The next thing we'll look at is another interesting feature of FastAPI, the Swagger UI. With the server running, we navigate to /docs instead, and a documentation page pops up; notice that we didn't need to write anything extra in our code for that. Opening the GET operation, we see the path, the parameters (none here), and the response: a 200 code for a successful response with media type JSON. We can click "Try it out" and then "Execute", and we get the response body and the response headers; you can do the same for a POST operation, and you also get the request URL and the curl command, which you could use to make the API calls yourself. Now let's uncomment the two parts we had commented out earlier and save; when we refresh the docs, the items endpoint appears automatically. This means that no matter how many different paths we create, we can test them all without any extra effort, thanks to FastAPI's Swagger UI. You can not only test but also read the documentation: it tells us that item_id is an integer and is required, while q is a string query parameter and is optional (it isn't marked required). Trying it out, you can't execute without providing an id, so we put in 20, leave the string empty, and execute: the result is item_id 20 and q null. If we modify the code and re-execute, the changes show up automatically, and if we pass "hello" for q we see our optional string in the JSON response, along with the response headers (we've already seen the JSON format: keys and their values). Getting back to the documentation, it explains the command: main is the file, app is the object created inside main.py, and --reload makes the server restart after code changes, which you should only do in development; in a production deployment (as we'll see in the coming sections) you don't want the server reloading automatically. The docs then explain what we've already seen: the app receives HTTP requests on the "/" path and on the "/items/{item_id}" path, both GET requests; the second path has a parameter item_id that should be an int, and an optional string query parameter q. They also describe the Swagger UI, and there is an alternative documentation UI, ReDoc, which you get by replacing /docs with /redoc, where you can again explore and test the API.
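For reference, the uncommented endpoint behind that items operation is essentially the standard FastAPI tutorial example, along these lines:

```python
from typing import Optional

# Path parameter item_id is a required int; q is an optional query parameter.
@app.get("/items/{item_id}")
def read_item(item_id: int, q: Optional[str] = None):
    return {"item_id": item_id, "q": q}
```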
In ReDoc you also have response samples: the successful response and the unsuccessful one, the 422. You can check on the Mozilla developer platform what this 422 means exactly; searching for it brings you to "Unprocessable Entity", and it's worth getting used to looking these codes up so you never have trouble understanding a response you receive. The next feature we'll look at is schemas. To keep things simple we'll remove the extra endpoint and change the remaining one into a POST request, one which takes in a body and returns the elements of that body. So we now have a POST request whose handler is no longer read_item but add_item, and it receives an item of a special type, Item, which we're about to create. We import BaseModel from pydantic and create a class Item which is a subclass of BaseModel; in it we declare that an item should have a name, which is a string, and a price, which we'll make an integer. So what the endpoint receives is an item of type Item, no longer one of the usual types but this special type we just created, and it returns the item's name (item.name) and the item's price (item.price). We save this and check the API: refreshing the docs we see our POST operation, try it out, and execute; the default gives a placeholder string and a price of zero, but if we fill in a name, set the price to 20, and execute, we get that name and 20 back. Notice that the request body in the docs automatically contains name and price, and if we modify the schema, say to add a discount field of type int, then refresh and try it out, discount automatically appears as well. The role of these schemas is to provide a model which the request body always has to follow. This means that if we go back, remove one of the fields from the request body, and execute, we see an error about an expected property name, and we get a 422 Unprocessable Entity error.
This is because at the level of the request body we didn't specify the discount, which our schema says must always be present. So anyone who makes a call on this endpoint has to specify the name, the price, and the discount in the request body. We can also enforce this kind of constraint on the output by specifying a response model. Let's rename our schema to InputItem and create another schema, OutputItem, for the response: we want the output to expose a selling price, which is the price minus the discount, so OutputItem has a selling_price, which is an integer. Both schemas are subclasses of BaseModel, and in the endpoint we now compute the selling price as item.price minus item.discount and return it along with the name. The first attempt fails with "post() got an unexpected keyword argument response_body": the argument is actually response_model, not response_body, so we pass InputItem for the request and OutputItem as the response_model. Saving and trying it out with a price of 20 and no discount, we get an internal server error; looking at the logs, a field is required and missing, and the reason is that the output schema expects to see selling_price, but our returned dictionary doesn't use that key. After fixing that we still get an internal server error, again a required field, and it turns out we also hadn't set the name key correctly: the output must use "name", not "item name". Once we correct that and execute, it works, and trying a price of 200 with a discount of 20 shows the computed selling price in the output. So you see how, thanks to these schemas, you can build APIs which force the developer, or whoever is consuming your API, to pass exactly the fields you expect, without which nothing will really work.
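As a rough sketch of where the code stands at this point (the endpoint path and exact field types are reasonable assumptions following the walkthrough above):

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class InputItem(BaseModel):
    name: str
    price: int
    discount: int

class OutputItem(BaseModel):
    name: str
    selling_price: int

@app.post("/items/", response_model=OutputItem)
def add_item(item: InputItem):
    # Business logic: the selling price is the price minus the discount.
    return {"name": item.name, "selling_price": item.price - item.discount}
```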
For example, if we take one of the fields off the request body again and execute, the output is a 422, a client error, so the client knows the error is coming from their side. Before we move on, we're going to define the code structure for the API projects we'll be working with. What we want is to separate the API, that is the endpoints, from the core. By the core we mean the business logic: the part that carries out the computation, which in the case of a deep learning model would be a separate file in charge of taking the inputs, loading them, passing them to the model, doing some processing, and handing the output on to the endpoints. Here "read" is one endpoint and "add item" is another, and we might have others such as "update item": get items, post items, and so on. At the level of the core we want the business logic, which in this simple example was just taking the price and subtracting the discount, in its own file, and we want the schemas in a separate file as well. So the structure is: the core, containing the business logic and the schemas, and the API, in charge of the different endpoints. To set that up, we create a service folder and move main.py into it (stopping the server first). Inside service we create an api folder and a core folder; inside core we create a logic folder and a schemas folder, deleting the folder we had created in the wrong place along the way. So core holds logic and schemas, and api will hold the different endpoints. Given that we need to carry out our inference using the quantized ONNX model, we start by pip installing onnxruntime; we get a permissions error at first, so we run it as a super user, and once it's installed we can start python and import onnxruntime just fine. We also pip install opencv-python and check that import cv2 works, so we now have ONNX Runtime and OpenCV installed. We'll then move on to
In the api folder we create a new folder called endpoints, and in it a file called detect.py and another called test.py. detect.py will be the endpoint through which we carry out the detections, and test.py will be the endpoint through which we check whether our API is alive or not. In detect.py we import the APIRouter: from fastapi import APIRouter. Previously we had imported FastAPI, but this time around we work with the APIRouter, and we create an object, emo_router, of the APIRouter class, whose role is to route information from detect.py to main.py. We then use the @emo_router decorator, specify for now a GET method, and set the path to detect. Next we define the detect method. Given that we want to carry out detections on input images, its input this time around will be an image, and since the user has to upload it, its type will be UploadFile, which we import from FastAPI. Once the signature is defined we can fill in the body: to start with we simply return "Hello world", so we have a method which takes in the image uploaded by the user and returns "Hello world". To dive into what this detector will actually do, we need to import Image from PIL: from PIL import Image. We then open the uploaded file: image = Image.open(...) on the contents of the uploaded file, that is im.file.read(); but Image.open expects a file-like object rather than raw bytes, so we wrap the bytes with BytesIO, which we import with from io import BytesIO. We now have our image, which we convert into a NumPy array with np.array(image), and instead of returning "Hello world" we call a method, emotions_detector, which takes in this NumPy array and is going to tell us what class the image belongs to. For that we import NumPy: import numpy as np. We're also going to check that whatever file is uploaded is actually an image: we take the filename, split it to get the extension, and if it is jpg, jpeg, or png we know it's an image and simply pass; otherwise we raise an HTTPException (which we also make sure to import), specifying the status code as
415 and the detail as "Not an image". So that's what happens before we ever call the emotions_detector method. Before we move on, let's pip install pillow, because you can see that the Pillow library we're using for Image is not yet installed; while that installs, also take note of the fact that this is meant to be a POST request and not a GET request, so we replace get with post. Now that Pillow is installed, the warning we had before is gone. Next we fill in test.py: this time around we create a test_router, the route is a GET request on test, and it simply returns "Testing, API working"; that's what we'll see when we hit it through the browser. Now that these two endpoints, detect and test, are defined, we create an api.py file in the api folder; notice that detect.py and test.py live in the endpoints directory, while api.py sits one level up in the api directory. In api.py we create a main router whose role is to include the detect router and the test router, and which will itself be included in the main app. As usual we import APIRouter from FastAPI, then we import the two routers: from service.api.endpoints.detect we import the detect router, and likewise the test router from test. We then create main_router as an APIRouter object and call include_router on it for both the detect and the test routers. So the app can be reached via detect, test, or some other router we might add later, all of which get included in the main router, which in turn is included in the app. Back in main.py, we import FastAPI this time around, import main_router from service.api.api, create the app with FastAPI, give the project the name "Emotions Detection", and include the main router with the app's include_router method. Finally, we turn all of these folders into Python packages by adding an empty __init__.py file to each of them.
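A sketch of how the four files fit together once everything is in place (the router and file names follow the walkthrough; the exact bodies and strings are assumptions):

```python
# service/api/endpoints/detect.py
from io import BytesIO

import numpy as np
from fastapi import APIRouter, HTTPException, UploadFile
from PIL import Image

from service.core.logic.onnx_inference import emotions_detector

emo_router = APIRouter()

@emo_router.post("/detect")
async def detect(im: UploadFile):
    # Only accept JPEG or PNG uploads.
    if im.filename.split(".")[-1].lower() not in ("jpg", "jpeg", "png"):
        raise HTTPException(status_code=415, detail="Not an image")
    image = Image.open(BytesIO(im.file.read()))   # uploaded bytes -> PIL image
    return emotions_detector(np.array(image))     # PIL image -> NumPy array -> model


# service/api/endpoints/test.py
from fastapi import APIRouter

test_router = APIRouter()

@test_router.get("/test")
async def test():
    return "Testing, API working"


# service/api/api.py
from fastapi import APIRouter
from service.api.endpoints.detect import emo_router
from service.api.endpoints.test import test_router

main_router = APIRouter()
main_router.include_router(emo_router)
main_router.include_router(test_router)


# service/main.py
from fastapi import FastAPI
from service.api.api import main_router

app = FastAPI(title="Emotions Detection")
app.include_router(main_router)

@app.get("/")
async def root():
    return "Hello world"
```

From the project root this is served with uvicorn service.main:app --reload, which is what we do a bit further on.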
So we add an __init__.py under endpoints, under api, under core's logic and schemas folders, under core itself, and finally under service, so that we have a service package, an api package, and a core package. Now, remember the emotions_detector method we called in detect.py but never defined: we define it in the logic folder, in a new file which we'll call onnx_inference.py. In it we create the emotions_detector method, which takes in the image array, and we follow the simple test we had already done in the Colab notebook: we start with the import, specify the providers, create the session for the quantized model, prepare the test image, and finally run the session to obtain the predictions. So we copy all of that from the notebook into this file, and make sure we import ONNX Runtime (as rt), OpenCV, and NumPy (import numpy as np). Then we adapt it to this method: what comes in here is already an image array, so we no longer need to specify an image path; we keep the quantized model with the providers we've already specified, take the incoming image array, resize it to 256 by 256, and return the prediction. One thing left is the model file itself: we copy the ONNX model we trained earlier into the right position in the project. Then we get back to detect.py and make sure we import the method: from service.core.logic.onnx_inference import emotions_detector. Back in onnx_inference.py, we turn the raw prediction into a label: if np.argmax of the ONNX prediction is 0, the emotion is "angry"; elif it is 1, the emotion is "happy"; else the emotion is "sad". And what we return is a dictionary whose key is "emotion" and whose value is simply the emotion.
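Here is a sketch of onnx_inference.py at this stage, before the optimizations we make shortly (the model filename, the 256 by 256 input size, and the exact pre-processing are assumptions based on the Colab test described above):

```python
# service/core/logic/onnx_inference.py
import cv2
import numpy as np
import onnxruntime as rt

def emotions_detector(image_array: np.ndarray) -> dict:
    # Build the ONNX Runtime session for the quantized model (reloaded on every call for now).
    providers = ["CPUExecutionProvider"]
    session = rt.InferenceSession("service/efficientnet_quantized.onnx", providers=providers)

    # Resize to the model's input size and add the batch dimension.
    test_image = cv2.resize(image_array, (256, 256))
    img_array = np.expand_dims(np.float32(test_image), axis=0)

    # Run inference and map the argmax of the scores to one of the three emotions.
    input_name = session.get_inputs()[0].name
    onnx_pred = session.run(None, {input_name: img_array})
    emotions = ["angry", "happy", "sad"]
    return {"emotion": emotions[int(np.argmax(onnx_pred[0][0]))]}
```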
With that set, when we get back to detect.py everything looks fine: we pass in the image and we should get the corresponding emotion back. Before launching the server, let's add the root endpoint, and then launch it with uvicorn service.main:app --reload. We immediately get an error: cannot import name 'detect_router' from service.api.endpoints.detect. First, the project layout: this shouldn't live here, so in our NeuralLearn projects directory we create a new folder called emotions_detection and move the service folder into it. We relaunch the server and still get the error; looking inside the detect file, the router is actually called emo_router and not detect_router, so we fix the import to emo_router. We reload and hit another import error for test_router; checking tests, test_router is defined there, and the remaining error is because we hadn't actually created the main_router object of the APIRouter class, so we fix that and save, and this time the application startup completes. Going to localhost we get what we expect, "Hello world", which is what we see when we hit the root in the browser. We then open up the Swagger UI and, as expected, we have detect, testing, and the root. Let's open detect, try it out, pick a file to test with, and execute. We get an error, and one thing you can see is that the request URL isn't what we expect: there should be a slash before detect. So we get back to the code, fix the path to /detect, do the same with /test, and save. We refresh, try detect again, pick the file, and execute, and now we get an internal server error; we're told the model file doesn't exist, so we make sure we put the right path: the quantized EfficientNet ONNX model sits in our service directory, so the path starts with service/. We save, reset the console, try it out, and execute once more, and we still have an error: the rank of the input is invalid — we're giving the model a lower rank than it expects — and we're asked to fix either the
inputs or the model. So the model isn't getting what it expects. To see exactly what goes in, let's print the shape of the image array, save, reset the console, try the request once more, and execute; we still get the internal server error, but scrolling through the output we see the shape: the image is 90 by 90, whereas what we would have expected is 90 by 90 by 3. In other words it's a grayscale image while the model expects an RGB image. So what we'll do is check whether the image is grayscale and, if so, convert it: if the length of image_array.shape equals 2, we convert from gray to RGB, and we print the shape again after the conversion so we can see the effect. We save, get back to the Swagger UI, reset the console, try it out, pick the image, and execute, and there we go: we get the emotion "happy", the expected response. Looking at the output, the shape was 90 by 90 and is now 90 by 90 by 3 after the conversion; after resizing it goes into the model and we get the expected result. So we've just tested our API and it works fine; we can now take off whatever prints we added. We then move on to our schemas, where we create an input.py and an output.py file. In output.py we start by importing BaseModel, then we create an ApiOutput class which inherits from BaseModel, and we specify what we want our output to look like. We've already seen that our output is the string emotion; but just to see the schema in action, let's first declare the field as an int and save. Before testing we have to make sure we integrate this with detect.py: in the endpoint decorator we add the response_model argument, which wasn't there before, and set it to ApiOutput (making sure we import ApiOutput), so we are now controlling what this output should look like. We save and test again: we execute and check the response, and we get an internal server error which makes sense, "value is not a valid integer". So we really are able to tell our API what we want the output to look like. Now let's get back to the schema, modify the output field to a string, save, execute once more, and check the
response: this time we have exactly what we expect, and that's the role of the schemas. We're not going to do the same for the inputs; we'll do it only for the output, and from here we move on to the next part: measuring the time taken by the model to produce the output. We get into onnx_inference.py and, right at the start, once we have our image, we grab the initial time: time_init = time.time(), after importing time right there. Then, once we've obtained the prediction, we compute the time elapsed as the current time minus that initial time, and in the output we return both the emotion and the time elapsed, converted to a string. We save and retest: we execute, and we get the usual output but without the new field; that's because we also have to add time_elapsed to the output schema, so we get back to output.py, include time_elapsed as a string, save, execute again, and now we get the full response. You can see it takes about one second to produce the output. Now that we've seen it takes about a second, what if we looked for a means to reduce that time? Getting back into the code: we specify the providers, load the model, preprocess the image, pass the input into the model, and obtain the output; but we don't know exactly how much of that time goes into the model-loading step. So we add a second measurement, time_elapsed_loading, for just loading the model (time_elapsed being the total for loading and running), include it in the output dictionary and in output.py, which we've already set up, save, execute, and check the output: the total time is about 1.1 seconds while the loading time is 0.72 seconds, which tells us clearly that a huge portion of the time taken to produce the output, roughly 66 percent, is wasted on model loading. So let's see how to load the model once, when we launch our API, instead of loading it on each and every API call.
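One way to do that, sketched below, is to build the InferenceSession at module level in main.py, so it is created exactly once at startup, and to reach it from the logic module at request time (the model filename and app title are assumptions):

```python
# service/main.py -- load the quantized ONNX session once, at startup
import onnxruntime as rt
from fastapi import FastAPI

from service.api.api import main_router

model_quantized = rt.InferenceSession(
    "service/efficientnet_quantized.onnx",        # model filename assumed
    providers=["CPUExecutionProvider"],
)

app = FastAPI(title="Emotions Detection")
app.include_router(main_router)

# In onnx_inference.py the per-request code then becomes roughly:
#   import service.main as s
#   onnx_pred = s.model_quantized.run(None, {input_name: img_array})
```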
So in main.py we import onnxruntime as rt and load our model there, so that it is loaded once when the server starts. Then, back in onnx_inference.py, every time we want to use it we do import service.main as s and call s.model_quantized, the session we've just created in main.py. We save, and since we no longer load the model inside the request we take off the time_elapsed_loading measurement; instead we cut that out and measure the time it takes to preprocess the image, time_elapsed_preprocess, updating the output schema accordingly. We run the request again: the total time elapsed is now about 0.45 seconds, less than half of what we had previously, and the preprocessing time itself is very small, so almost all of the remaining time is the model computing its output. Running it again gives about 0.49 seconds, and if we stop the server, restart it, and time the very first request, we get about 0.41 seconds, roughly 400 to 450 milliseconds. So we went from about one second down to about 0.45 seconds. One other great advantage of building your APIs with FastAPI is that you can build asynchronous APIs very easily, simply by specifying the async keyword: we make the detect endpoint async, the test endpoint async, and the root async, and that's all you need to do to make this code asynchronous. What this means is that, when working with FastAPI, if we have a task which takes a long time to complete, other tasks can run concurrently: while waiting for this task we can already start working on another one, and while waiting for that one we can start on yet another. Overall, the time taken to complete the three tasks is much shorter, whereas with synchronous code we would have to wait for each and every task to be completed before taking on the next one, so you can see the time difference between the two methods: running code asynchronously with FastAPI saves time. Nonetheless, it should be noted that some tasks are compute-bound, that is, CPU-bound, meaning that even if you run the code asynchronously, the first task will still take up just as much time, so there is no real gain even though the other tasks run asynchronously; this is very common in computer vision and deep learning.
Since that's exactly what we're dealing with, we have to understand that even though our code is running asynchronously, if at the level of detect.py we have the call to emotions_detector and we're still waiting for it to complete, then no matter that we've already finished the other two tasks, we still have to wait for it. So for such CPU-bound tasks, like computer vision, machine learning, and deep learning, we instead take advantage of parallelism: instead of having one CPU worker run all the different tasks, we can allocate, say, three workers, where each worker focuses on a given task, and we then benefit from the fact that our code runs asynchronously and also makes use of parallelism. Now, so far we've been using the Uvicorn HTTP server, which is an ASGI web server implementation for Python; ASGI stands for Asynchronous Server Gateway Interface, and it serves as a minimal, low-level server interface for async frameworks like FastAPI. Although Uvicorn comes with speed, it isn't mature enough on its own for a production setting, so we tend to use Gunicorn, which is a mature, fully featured server and process manager for production. That's why Uvicorn includes a Gunicorn worker class, allowing you to run ASGI applications with all of Uvicorn's performance benefits while also getting Gunicorn's fully featured process management. So we go ahead and pip install gunicorn, and once it's installed we run the command we saw already, gunicorn with service.main:app, this time specifying the number of workers, say three, and the Uvicorn worker class. Running this, you can see Gunicorn start several server processes, each with its own process id, corresponding to our three workers. We can then refresh the page, try the endpoint out, choose a file, execute, and we get our output exactly as expected. You could obviously reduce the number of workers: if you reduce it to two, you'll find only two worker processes.
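The install and launch commands dictated above look roughly like this (the module path follows our project layout):

```bash
pip install gunicorn
# three worker processes, each running the ASGI app through Uvicorn's worker class
gunicorn service.main:app --workers 3 --worker-class uvicorn.workers.UvicornWorker
```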
So that's it: in this section we've seen how to build our own API and test it out locally; in the next section we'll deploy it to the cloud. Hello everyone, and welcome to this new and exciting session in which we are going to deploy our API to the cloud. In the previous session we looked at how to build this API with FastAPI, where we can simply pass in an image and obtain the output: the emotion, happy, together with the time it takes to produce that output. The platform we shall be using to deploy our API is Heroku, a platform as a service (PaaS) that enables developers to build, run, and operate applications entirely in the cloud. First things first, you can go ahead and sign up, and if you've already signed up you can log in. Once that's done, we get back to the code, stop the server, clear the terminal, and run pip freeze; the output is basically all the different packages we have in our virtual environment. We now write all these packages into a requirements file with pip freeze > requirements.txt; note that we are in the emotions_detection directory, so after deleting the requirements file we created previously, we're left with the service folder and this new requirements.txt. Opening it, we see all the packages we've installed; let's search for OpenCV. We find opencv-python, and we're going to modify this, because that build of OpenCV is one you would use even for a desktop application, whereas what we need for the cloud is the headless build, the one without the GUI and visualization functions, given that all we're really doing with OpenCV here is resizing. So we replace opencv-python with opencv-python-headless, pinned at version 4.2.0.32, and save. Next we create the Procfile: a new file called Procfile, which the editor recognizes right away, in which we paste essentially what we run each time we want to launch the server: web: followed by the Gunicorn command, with the number of workers set to two and the worker class set. We save that too, and then create another file, runtime.txt, which simply holds the Python version; make sure you include the hyphen between "python" and the version number. With the Procfile, the requirements.txt, and the runtime.txt ready, we can move on to the Heroku platform.
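At this point the three deployment files look roughly like this (the worker count follows the video; the Python version shown is the one that ends up working a little later, and the exact pins should be treated as assumptions):

```
# Procfile -- the command Heroku runs to start the web process
web: gunicorn service.main:app --workers 2 --worker-class uvicorn.workers.UvicornWorker

# runtime.txt -- the Python version to build against
python-3.9.15

# requirements.txt -- generated with `pip freeze > requirements.txt`,
# then opencv-python swapped for opencv-python-headless
```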
On the Heroku dashboard we click on "Create new app", give the app a name, which we'll call emotions-detection, and choose how to deploy it: we could add it to a pipeline or deploy directly, and for direct deployment there are three options — Heroku Git, connecting to GitHub, or the Heroku CLI. We'll use Heroku Git and follow the short instructions shown; they're not very long. First things first, heroku login. Note that if you don't have the Heroku CLI installed this won't work, so the first thing to do is sudo snap install heroku before running heroku login; given that I've done that already, I just run heroku login, press a key to open the browser, click the link shown, and log in very easily. Next, we get into our project directory (if you're not familiar with Git, a crash course on Git will help with these terms) and do a git init, which initializes an empty Git repository in the emotions_detection directory. Then we add the Heroku remote with the command shown on the dashboard, heroku git:remote with the project name, which is generated automatically from the app name we just entered. Running it, the command fails: Git reports a fatal "unsafe repository" error and tells us to add an exception for this directory with git config, so we copy the suggested command, run it, rerun heroku git:remote, and this time the remote is set. The next step is to deploy the application: git add, git commit, and git push. So we do git add . to stage all the files, then git commit with a message like "First commit of emotions detection API" (a string you can obviously modify), and then git push heroku master. The push is rejected — "Push rejected to emotions-detection-neurallearn" — and scrolling up we're told that the requested runtime is not available for this stack. Opening the list of Python versions available for Heroku stacks, we see 3.9.15 is available on all supported stacks, so we update runtime.txt, do git add and git commit ("updated runtime.txt with Python version 3.9.15"), and push again. We get another error: no matching distribution found for opencv-python-headless at the version we pinned, so we take off the version pin and push one more time. Once the push goes through, you can see we now have a release, and here is our app's link.
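Summarizing the deployment commands dictated above (the app name is approximate, and the safe.directory path is whatever Git suggests on your machine):

```bash
sudo snap install heroku            # one-time CLI install
heroku login
cd emotions_detection
git init
# if Git complains about an unsafe repository, add the exception it suggests:
# git config --global --add safe.directory /path/to/emotions_detection
heroku git:remote -a emotions-detection-neurallearn   # remote name taken from the Heroku app

git add .
git commit -m "First commit of emotions detection API"
git push heroku master
```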
Let's click on that link and open it up: there we go, "Hello world", so this is what we have online now. We can go to /docs and check out the Swagger UI, try the detect endpoint out online now that it's deployed on the cloud, and execute; it even takes less time than on the PC, about 0.23 seconds, roughly 230 milliseconds. So our model is working and we're able to obtain our outputs. What we can do next is copy the URL and try it out on Postman: we enter the URL, select a POST request, and in the body choose form-data, where our key is im, the name of the upload parameter, and, given that its value is a file, we select the file; we click Send and we get the expected output, happy, with a time elapsed of about 230 milliseconds. You can also see the different timings Postman reports, the status, and the time taken by each of the different events. And that's it for this section, in which we've deployed our API to the cloud; in the next section we are going to carry out load testing on our deployed API. Hello everyone, and welcome to this new and exciting session in which we'll be looking at load testing. The load testing tool we'll be using is Locust, which will permit us not only to define user behaviour with Python code but also to swarm our system, the API we've already built, with millions of simultaneous users. In this section we'll see how to load test the API we built previously, both locally and on the cloud. At this point you've already tested the model deployed on Heroku and everything looks fine, but what if we told you that you still have to carry out some load testing? What's the point? The point of load testing is to see how the API you've built reacts when it has many users: you generally don't build an API to be used by yourself only, but by tens of thousands or even millions of simultaneous users. We obviously can't get access to millions of people and start calling each and every one of them to test the API, so instead we use an automated system where those millions of users are simulated, and the tool we'll use to simulate them is Locust, an open-source load testing tool. That said, we're going to create a locust.py file, and in it we'll write all the code we need to control, or simulate, all the users who will be using the platform. We start with from locust import SequentialTaskSet. One thing to note is that with Locust we define different tasks, and here we're going to define a task whose role is simply to open a specific image and send it via a POST request to our API to obtain the required output. So let's define a DetectorTask class, which is a SequentialTaskSet; inside it, under the @task decorator, we define our detection method, which is simply where we describe what actually happens when we load test the application: we open a file, image.jpg, which we'll create locally, open it as image, and then carry out our POST request, specifying the path, /detect, since that's the endpoint we want to hit, meaning that you could replace
this with /test or some other path you wanted, but for now it's /detect. Then we specify that we're passing a file: in files we use the name im, because that's what we called the upload parameter in the API we've designed, and its value is the image we've just opened. That's our DetectorTask; as we said, the way we build things with Locust is to define different tasks, and once the tasks are defined we define the load tester class. So we create a LoadTester class which inherits from HttpUser, and we specify the host, which in this case is simply the URL we've been using already, the one we deployed on Heroku, copied over with the /detect part taken off. Then we specify the task; you could obviously create many more different tasks, but for now we only want to work with the DetectorTask. We save this and launch Locust, specifying the file with locust -f and our file name. Oh, we forgot to install Locust, so we do pip install locust; once it's installed we run the command again and click on the link it prints. There we see Locust running: we can start a new load test, where initially the number of users is one and the spawn rate is one (we'll look at these later), and notice that the host we specified is shown. We start the swarming; since this is hosted online it takes a moment, and it looks like nothing is working. Checking the charts and statistics we see errors: "No tasks defined on LoadTester". We save and run again but still get the same error, so we stop and read it properly: "you have set a task attribute on your class; maybe you meant to set tasks". Indeed, it should be tasks, which is actually a list, because as we said we can define several tasks: if we had a second one, say tasks_two, we would put both in this list, but given that we only have a single task, the list contains just the DetectorTask. So we rename the attribute to tasks, save, and run this again.
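The finished locustfile might look roughly like this (the file name test_image.jpg, the exact host URL, and the class names are assumptions based on the walkthrough):

```python
# locust.py -- simulate users that each upload an image to /detect
from locust import HttpUser, SequentialTaskSet, task

class DetectorTask(SequentialTaskSet):
    @task
    def detection(self):
        # every simulated user posts the same local image to the API
        with open("test_image.jpg", "rb") as image:
            self.client.post("/detect", files={"im": image})

class LoadTester(HttpUser):
    host = "https://emotions-detection-neurallearn.herokuapp.com"  # deployed URL, assumed
    tasks = [DetectorTask]

# run with:  locust -f locust.py   and open the web UI it prints
```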
Now everything should work fine. We open the UI and start the swarming, and it still looks like it isn't working; we stop and check the error: no such file. Of course, we don't have the image file in this directory, so we copy it in, rerun, and still get an error because the name doesn't match, so we rename the file to test_image. We run this again, sign in, start the swarming, and this time you can see it working as expected: a certain number of requests have already been sent, with zero failures, and we can read off the median latency, the 90th percentile, the 99th percentile, the average, and the max; it's taking on the order of two seconds for a user to receive the output. We can also get into the charts and see the request count increasing slowly, with zero failures, which is very important: with just a single user we have zero failures, and that single user receives the output from the API in about 1.7 seconds on average. Another thing we can do is stop this and carry out a new test with, say, 10 users. The spawn rate, fixed here at a value of two, simply means that every second two more users are added: we start with two users, after one second we're at four, after two seconds at six, and so on until we reach the maximum number of users we set. We start the swarming and see the count go from two to six and then ten, and after some time the first requests come in: with 10 simultaneous users, the average time for each user to receive the response is about 10 seconds, meaning our system takes about 10 seconds to reply to any given user whenever there are 10 simultaneous users. Back in Postman, remember the time_elapsed we return is the model's own time; what we're measuring now is the actual latency a user located somewhere in the world would experience. With 10 users we also start seeing failures. To check them we can run heroku logs, specifying the app, emotions-detection-neurallearn, and asking for the latest log lines: we see successful calls, and also calls which failed, with statuses like "connection closed without response". Remember that when we had a single user we didn't have any of those failures. So let's stop this, run a new test with just three users, and see what we get: as you can see, even with just three users we have a few failures, about seven percent,
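The log check can be done with the Heroku CLI, roughly as follows (the app name and the exact flag used in the video are assumptions):

```bash
# stream the deployed app's logs while the load test runs
heroku logs --app emotions-detection-neurallearn --tail
```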
and it takes on average about 4.5 seconds to receive an output when we send in images. So with this we've just built and load tested our API, and the test shows clearly that if we want it to go faster, to reduce the latency and reduce the failure percentage, which at this point sits around 2 percent, we would need to increase our compute capacity. That's it for this section; see you in the next one. Welcome to the section on object detection. Object detection is one of the most popular computer vision tasks and also a very important one: it entails correctly classifying the objects in an image and also saying exactly where those objects are located in the image. If we look at this image, we see clearly that we have an aeroplane, another aeroplane, a person, another person, and something we could call a car. An object detector not only classifies these objects but also localizes their positions in the image: this aeroplane, for example, has this bounding box. A bounding box is basically a rectangular box which surrounds the object; so for this person we have this bounding box, for this one that bounding box, and for the other aeroplane this bounding box, which sits inside this other one. Unlike a classification problem, where, given this kind of input, the output is for example a one-hot vector over the classes we're dealing with — supposing we have five classes, we only have to say which of the five classes the input belongs to — with object detection we have that and also the positions of the objects in the image. There are several conventions for expressing those positions. One of the most popular is the center convention, with x_center, y_center, the width, and the height. What does this mean? It's defined with respect to a referential. We're used to working with a referential whose origin is at the bottom left, but the convention for image data is that the origin is at the top left corner, x increases to the right, and y increases downward, and it's from this origin that positions are defined. If we consider the bigger aeroplane, we can define its bounding box by its center together with its width and height. x_center is the horizontal distance from the origin to the center of the box, and y_center is the vertical distance from the origin to the center; that's how we obtain the center. Then, given the width and the height, we can recover the four corner points which make up the bounding box, starting from any one of them, by working out its x and y
coordinates. Take a corner on the left edge of the box: its x coordinate is x_center minus width/2, because the horizontal distance from the center to the left edge is half the width (on the diagram the center isn't drawn exactly in the middle, but normally it should be, so imagine the box re-centred). A corner on the right edge has x coordinate x_center plus width/2; the two right-hand corners share this x coordinate but have different y coordinates. For y, remember that downward is the positive direction in image coordinates: a corner on the top edge has y coordinate y_center minus height/2, and a corner on the bottom edge has y coordinate y_center plus height/2. Either way, once we have two opposite corner points we can always recover the other two automatically. Another convention is the x_min, y_min, x_max, y_max convention, where we are just given those coordinates: given the (x_min, y_min) point and the (x_max, y_max) point, we can obviously reconstruct the whole box. So these are the two main conventions used to locate an object in an image. The YOLO paper, published several years ago, was one of the first to come up with a single neural network which predicts bounding boxes and class probabilities directly from full images in one evaluation. Back then, object detection models followed a pipeline made up of a region proposal generator, a feature extractor, and a classification unit. You can see this with a simple R-CNN model: we have the input image, we extract regions, that is, locations where the model proposes objects might be, each proposed region is warped and passed into the feature extractor, which extracts features from it, and then a classifier tells us whether the region contains a person, an aeroplane, or, say, a TV monitor. With YOLO, on the other hand, we have a single network: we don't have all those different stages in the pipeline, we just have the input image, one neural network, and the outputs (there is also an additional non-max suppression operation, but we'll look at that shortly).
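To make the two bounding-box conventions concrete, here is a small sketch (plain Python, following the formulas just derived) for converting between them:

```python
def center_to_corners(x_c, y_c, w, h):
    """(x_center, y_center, width, height) -> (x_min, y_min, x_max, y_max)."""
    return x_c - w / 2, y_c - h / 2, x_c + w / 2, y_c + h / 2

def corners_to_center(x_min, y_min, x_max, y_max):
    """(x_min, y_min, x_max, y_max) -> (x_center, y_center, width, height)."""
    w, h = x_max - x_min, y_max - y_min
    return x_min + w / 2, y_min + h / 2, w, h
```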
Now, the performance of YOLO was quite impressive, especially in terms of speed: it reaches up to 45 frames per second, and Fast YOLO, the smaller version, reaches up to 155 frames per second, while achieving double the mean average precision of other real-time detectors. We'll look at mean average precision subsequently; it's the metric generally used in object detection, and a higher mean average precision means the object detection model is performing better. Another advantage of YOLO to take note of is that it reasons globally about the image, unlike the sliding-window and region-proposal based techniques we've looked at, the region-proposal based techniques being models like the R-CNN. With sliding windows, the way it works is that we take a window, slide it across the whole image, and pass each window into the neural network, doing that while sliding over the full image; here, instead, we simply pass in the whole image as a single input. So, compared with sliding-window and R-CNN style models, YOLO performs better in the sense that it sees the entire image during training and test time, and therefore implicitly encodes contextual information about the classes as well as their appearance; you'll find that models like the R-CNN can mistake background patches in an image for objects, because they can't see the larger context. Another point is that YOLO learns generalizable representations of objects: a YOLO model which wasn't trained on paintings still performs quite well on them — this particular version of YOLO was trained on the Pascal VOC dataset, and yet when we test it on a painting it does well — so compared to other models YOLO learns more generalizable representations of objects. Let's now go in depth and see how the YOLO algorithm works. First things first, remember the structure of the YOLO model: it has several convolutional layers and finishes with fully connected layers, that is, a feature extractor followed by a classification unit. We won't get into that right now; let's just consider that we have this model, inputs like this one, and some output. That output is meant to be the bounding boxes: for this image we could draw a bounding box for this woman, showing that this box belongs to this person, and, changing the colour, another bounding box for this other person, so we now have two detections, one for the woman and one for the other person. We're going to build a model which takes in these kinds of input-output pairs and learns to map the inputs to the outputs correctly, such that when given a new input it can tell us where every object is located and what type of object it is. That said, the actual output isn't this image with the bounding boxes drawn on it; the actual output is encoded quite differently from
what we see here. The way these outputs are created is with a kind of encoding system in which the input image is broken up into grid cells. Suppose we have a 224 by 224 input image; we break it up into several grid cells, and each grid cell here is 32 by 32 pixels, so we end up with a 7 by 7 grid: seven cells across and seven cells down, like 32 by 32 patches which, put together, rebuild the 224 by 224 image, or, the other way round, like taking the 224 by 224 image and cutting it into 32 by 32 cells. Once the image has been broken up into grid cells this way, we encode the outputs based on where the centers of the bounding boxes fall. Let's redraw the bounding boxes so this is clear: this first bounding box has its center at about this position, and the other person's bounding box has its center around the child's nose, about here. Each and every grid cell is then going to carry certain values. For a cell like this one, which doesn't contain the center of any object, the first value is a 0; for this other cell, where an object's center does fall, the first value is a 1. So every cell gets a value depending on whether there is an object center in it or not: as you can see, all the cells carry a 0 except the two marked in red, which carry a 1, simply because it happens that the centers of the bounding boxes fall in those cells. That's the first step. Once we have it, the next thing we want to encode is the exact position of the objects, and that exact position obviously depends on how we choose to represent our bounding boxes. We could represent them by specifying x_min, y_min and x_max, y_max; with that kind of representation, an object like this person would be located using this point, which is (x_min, y_min), and this other point, which is (x_max, y_max), both measured with respect to
the origin which is at the top left corner so this our origin right here you see that so we go X steps and Y steps downward then here X steps to the right Y steps downward to locate this person now once we once we've done this we could we could just put this out here so we could say okay we are creating our outputs remember our aim here is to create our outputs so we are creating our outputs we know for every cell or for every grid cell where objects are located that's it then now to get the bounding boxes we could make use of this but what's important to note here is the notation used by the authors of the yellow v1 paper was instead X center Y center then the width and the height see that of the bounding box obviously none of the image so here instead of having making use of X mean Y mean X max Y max we make use of X center which is the center Y center so we'll go X Y and then we look for the width of this box the width and also the height of the box so let's get back so that's that basically how we do that and then there's another special encoding which is done and what they actually do is for the width and the height of the bounding box they're gonna divide this by big W where big W is the over is a width of the whole image so if our image is 224 by 224 will take this width let's suppose that this width is say 160 so we'll take 160 divided by 224 and get our width and then for the height we'll do the same thing so you know if we have the height of say 200 will take 200 by 224 and we get that so it's h divided by the height of the whole image see that now once we have that we here for this XC YC we're gonna do something similar but not divided by the whole width and also not divided by the height of the image what we are gonna do here is we are gonna have this XC with respect to its specific width cell now let's explain let's pull this way so you could see that clearer so we're supposing that we have this here and we've already seen how to get for the width and the height we take that divided by the total width and the height divided by the total height total width and total height now example is 224 now for the XC and YC here for example we have this our XC YC is for this other objects our XC YC let's consider the example of let's consider this example here so what we're saying is we are not gonna take this with respect to the full image instead we're gonna say oh we're gonna take this grid cell and suppose that this distance is 1 and we take this and suppose that this distance towards 1 now our origin here is this see this points our origin now if we're for this cell this will be our origin and this will be our distance 1 and here our distance 1 now if this are distance 1 so distance 1 then this point here let's say this point here will be a fraction of 1 basically so be a value between 0 and 1 now if we take this distance we could approximate this to be about 0.5 6 and then this distance is about 0.7 so in this case XC will be this distance and YC will be this distance in that case we'll have 0.5 6 and 0.7 so that's it now once we have that the next thing we'll do is we'll just simply put that out here so we would have that XC with respect to the grid cell right at G and then we'll have YC G that's it so here we now know how to obtain those values let's change that color back so here we would have say 0.7 0.5 6 0.7 so that's how we have this now once we once we get this remember we already have one year so the next one we have is XC G for the next part if we have a data set where we 
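To make the cell-relative encoding concrete, here is a tiny sketch (the input numbers are just illustrative values, assuming a 7-cell grid) of how a center given relative to the whole image is turned into a grid cell index plus an offset inside that cell:

```python
# Minimal sketch (assuming S = 7 grid cells per side) of how a box center
# given relative to the whole image is re-expressed relative to the grid
# cell that contains it, as described for YOLO v1.
S = 7

def to_cell_relative(xc_norm, yc_norm):
    """xc_norm, yc_norm in [0, 1] with respect to the whole image."""
    gx, gy = xc_norm * S, yc_norm * S        # position measured in cell units
    cell_x, cell_y = int(gx), int(gy)        # which grid cell holds the center
    xc_cell, yc_cell = gx % 1, gy % 1        # offset inside that cell, in [0, 1)
    return cell_x, cell_y, xc_cell, yc_cell

print(to_cell_relative(0.07, 0.75))  # -> (0, 5, ~0.49, 0.25)
```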
Next come the classes. If we have a dataset with, say, 20 classes like Pascal VOC, or the COCO dataset with 80 classes, then after finding whether an object is present and getting its location, we want to know exactly what object is found there. Suppose we have 20 classes: we line up 20 zeros, one per class. If, in our list of classes, the person class occupies the 15th position — looking at the list of Pascal VOC classes and counting, person is indeed the 15th — then when encoding the output we put a 1 at that 15th position while all the other classes keep a zero. If you had another dataset with fewer classes, say eight, this vector would be of length eight instead.

So we've now seen how each of these grid cells — and there are 49 of them, since we have a 7 by 7 grid — gets its encoding. What the model receives as a target isn't the drawn image, but this kind of encoded output. We'll therefore prepare our dataset so that each example has an input image and an output label built this way, and the model will then learn to correctly produce these kinds of outputs. As we've just discussed, the label will be of shape 7 by 7 by (20 classes + 5), that is 7 by 7 by 25. So whenever we get an input image together with its bounding boxes and their classes, we'll create this kind of output from the labels, whatever dataset we're working with.

Nonetheless, when we zoom into the model itself, we notice that it takes the input and produces an output of shape 7 by 7 by 30 instead of 7 by 7 by 25. The reason is simple. We have the first position, which tells us whether there is an object or not (1 value), the next positions give us the location with x_c, y_c, w and h (4 values), and then the remaining ones tell us which class the object belongs to: a series of zeros with a one somewhere, of length 20. Now 20 plus 5 is 25, which is less than 30. To understand why the output is actually 30, suppose that for every cell we have two boxes responsible for locating the object instead of just one. Each such box gives us the objectness score plus the exact coordinates of the object, while the class portion stays separate; so we take those 5 values and multiply them by two. If we used three boxes per cell, we would multiply them by three. In the YOLO v1 paper they call this number of boxes B and set B = 2 — you can check that in the paper — which gives 2 × 5 + 20 = 30.

So we now have two boxes responsible for locating the object. When designing the labels we can stick with just one, because we know the correct answer, but the model will predict two sets of values. The reason for doing this is that we want each of these boxes to become specialized depending on the size of the object. Here we've been dealing with relatively large objects with respect to the image, but what if we had an image where the person is very small, with a small person-to-image ratio and therefore a small bounding box compared to the whole image? In that case we want these box predictors to specialize, such that maybe the first box learns to detect the smaller objects while the second learns to detect the larger ones. So that's how the output is constructed; we now understand how to construct both the labels and the model outputs, and remember that training consists of updating the weights such that the difference between the labels and the model outputs — the loss — is minimized.

Getting back to the paper, we can see the exact structure of the YOLO model: we start with a conv layer and a max-pool layer, another conv layer and max-pool, then several conv layers followed by a max-pool, another group of conv layers repeated four times followed by a max-pool, more conv layers, and finally the fully connected layers. So the convolutional part does the feature extraction and the fully connected part does the classification. For the training, they first pre-train this network on ImageNet with its 1000 classes — a standard classification problem — for about a week, reaching a top-5 accuracy of 88%. From this pre-trained model they then add four convolutional layers and two fully connected layers with randomly initialized weights, so when training for object detection we start from the pre-trained ImageNet weights plus these newly added, randomly initialized layers. And since detection often requires fine-grained visual information, they increase the input resolution of the network from 224 by 224 to 448 by 448: the pre-training is done on 224 by 224 images, while detection uses 448 by 448 images.
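Before we get into the activations and the loss, here is a compact sketch of the per-cell encoding we've just described (assuming S = 7, B = 2, C = 20 and the ordering objectness, x_c, y_c, w, h, then the class scores; the exact ordering is a convention choice):

```python
import numpy as np

# A sketch of the per-cell encoding described above (assumptions: S = 7,
# B = 2, C = 20 and the ordering [objectness, xc, yc, w, h, class scores...]).
S, B, C = 7, 2, 20

# Label for one grid cell: 1 objectness + 4 box values + C class values = 25.
cell_label = np.zeros(1 + 4 + C, dtype=np.float32)
cell_label[0] = 1.0                              # an object's center falls in this cell
cell_label[1:5] = [0.56, 0.7, 160/224, 200/224]  # xc, yc (cell-relative), w, h (image-relative)
cell_label[5 + 14] = 1.0                         # one-hot class: "person" is class index 14

# The model instead predicts B = 2 box slots per cell plus the class scores:
# [obj1, x1, y1, w1, h1, obj2, x2, y2, w2, h2, 20 class scores] -> 30 values.
print(cell_label.shape[0], B * 5 + C)            # 25 30
# Full label tensor: (7, 7, 25); full prediction tensor: (7, 7, 30).
```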
They also use a linear activation function for the final layer, while all the other layers use the leaky ReLU as their activation function. As a quick reminder, the ordinary ReLU sends every value less than 0 to 0 and keeps every value greater than 0 unchanged: pass in 3 and you get back 3, pass in -0.5 and you get 0, since all negative values are sent to zero. In other words, the output remains x for x greater than or equal to 0 and goes to 0 for x less than 0. The leaky ReLU, which is used for all layers except the final one, doesn't have that flat horizontal line for negative inputs; instead it's slightly slanted. Positive values are still kept as they are, so the output remains x for x greater than or equal to 0, but for negative values the output is 0.1x, which means the gradient on that side is 0.1 instead of 0 — don't worry too much if you're not yet comfortable with the notion of a gradient. So if we pass in a value like -0.5, we get back -0.5 times 0.1, which is -0.05, rather than 0. That's the difference, and this leaky ReLU is the activation used everywhere in the model except for the last layer, which is linear, exactly as defined in the paper.
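As a quick sanity check of these activations in TensorFlow (the alpha = 0.1 slope is the one stated in the paper):

```python
import tensorflow as tf

# Leaky ReLU used for the hidden layers (alpha = 0.1): positive inputs pass
# through unchanged, negative inputs are scaled by 0.1.
x = tf.constant([3.0, -0.5])
print(tf.nn.leaky_relu(x, alpha=0.1).numpy())  # [ 3.   -0.05]
print(tf.nn.relu(x).numpy())                   # [3. 0.]
```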
Next, the loss function. The loss they use is the sum-squared error — a simple loss — which is why at the beginning of the paper they say "we frame object detection as a regression problem". It's essentially a regression loss: the sum of squared differences between the model output and the expected output, which is the label. Now that we understand globally how the model is built and trained, let's look in depth at this loss function.

We're supposing we have the labels on one side and the model's predictions on the other: remember the predictions are 7 by 7 cells by 30 values, whereas the labels are 7 by 7 by 25. In each we have the first 5 values for the objectness and the location, then the 20 class values, and in the predictions an extra 5 values for the second box. The loss is computed by breaking it up into several parts, which we'll go through one by one, starting with the part that punishes the model when it makes errors about whether there is an object in a particular grid cell or not. If the label for a grid cell is a one, we expect the model to predict a one there as well. Notice the sum from i = 0 to S²: in our case S is 7, so S² is 49, which is logical — we go through each of the 49 grid cells and compute the difference C_i minus Ĉ_i, squared, and add all of those up. Notice also that there is a double sum: the inner sum is over the boxes, so we take the difference for the first box plus the difference for the second box; if we had three boxes per cell, we'd have three such terms instead of two.

One more thing to notice is the notation 1_i^obj and 1_ij^obj. As the paper says, 1_i^obj denotes whether an object appears in cell i, and 1_ij^obj denotes that the j-th bounding box predictor in cell i is "responsible" for that prediction — here we have two bounding boxes per cell, so j indexes which of the two. So i effectively runs over the 49 cells and j over the 2 boxes; strictly speaking the limits written in the sums are a slight abuse of notation, but the meaning is that we go through every cell and through each of its boxes. And what "responsible" means is that if a particular box is not responsible for the prediction in its cell, we simply do not include its term when computing this error — it gets multiplied by zero.
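Since the next paragraphs keep pointing at individual pieces of this loss, here is the full expression from the YOLO v1 paper, reconstructed here for reference (hats denote the model's predictions, C is the confidence, p_i(c) the class probabilities, and λ_coord, λ_noobj are the weighting factors discussed just below; the discussion walks through the terms in a slightly different order than they are written):

$$
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
&+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
&+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2
+ \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}} \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
$$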
Now, how do we know whether the first box or the second box in a cell is responsible for a prediction? The way we get this is simple. Suppose we have an image with an object in it, and the grid cell that contains the object's center is responsible for predicting it. That cell's first box predictor will output one bounding box and its second box predictor will output another, and these two will generally be different because they have different values for x_c, y_c, w and h — different quadruplets give different boxes. Since they're different, we can compare each of them with the actual, ground-truth bounding box: the two predicted boxes are essentially competing for which of them is closest to the actual one. Let's call them B1 and B2, and call the actual box B. We compare B1 with B and B2 with B, and whichever resembles B the most — whichever has the smaller difference — is the one "responsible", as the paper puts it, for that prediction. In our example it's clear that B1 is responsible, but for a different grid cell it could just as well be B2; it simply depends on how close each box is to the actual bounding box for that cell.

That raises another question: how exactly do we compare bounding boxes? We compare them using the IOU score, where IOU stands for intersection over union. Given a pair of boxes, we take the area of their intersection and divide it by the area of their union. If one pair of boxes overlaps a lot and another pair barely overlaps, the first pair clearly has a larger intersection relative to its union, and so a higher IOU. So for each of the two predicted boxes we compute its IOU with the ground-truth box, and the one with the higher IOU is the box responsible for the prediction.
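Here is a minimal sketch of that IOU computation for two boxes given as (x_min, y_min, x_max, y_max); the example boxes at the bottom are made up just to show the behavior:

```python
# Minimal IOU (intersection over union) sketch for two boxes given as
# (x_min, y_min, x_max, y_max), which is how we compare a predicted box
# with the ground-truth box to decide which predictor is "responsible".
def iou(box_a, box_b):
    # Corners of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143 (barely overlapping)
print(iou((0, 0, 10, 10), (1, 1, 11, 11)))  # 81 / 119 ≈ 0.68  (strongly overlapping)
```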
Getting back to the loss function: if a box is responsible for the prediction, its difference is multiplied by one and counted; if it isn't responsible, it's multiplied by zero and simply not considered. So although we sum over the two boxes, only the responsible one actually contributes for a given cell.

We can now move on to the next term, which handles the grid cells where there is no object. For cells where the label says there is an object we use the first confidence term; for cells where there is no object we use this other term instead, and it's again just the squared difference between the predicted confidence and the label's confidence for those cells. Notice also the factor lambda_noobj in front of it. The paper explains why it's there: they use the sum-squared error because it is easy to optimize, but it does not perfectly align with the goal of maximizing average precision — it weighs localization error equally with classification error, which may not be ideal. Also, in every image many grid cells do not contain any object, which pushes the confidence scores of those cells towards zero, often overpowering the gradients from the cells that do contain objects. This can lead to model instability, causing training to diverge early on. To remedy this, they increase the loss from the bounding box coordinate predictions and decrease the loss from the confidence predictions for boxes that don't contain objects, using two parameters: lambda_coord for the positioning and lambda_noobj for the no-object confidence, set to lambda_coord = 5 and lambda_noobj = 0.5.

What we can deduce from this is that the model is punished more severely when a grid cell was meant to predict an object and it didn't, compared to when a cell has no object and the model wrongly predicts one there, because lambda_noobj is only 0.5. The coordinates receive the heaviest punishment, since lambda_coord is 5, while the classes keep a weight of one.

Next, the class term. Notice its condition is 1_i^obj — with just an i, not ij — meaning we compute this difference only when that grid cell contains an object. So if the label has a one for a cell, like our two red cells, we go ahead and compute the squared differences over the 20 class values; for all the other cells, which have no object, we skip this term entirely.

Something similar happens for the coordinates. If a cell has no object we don't compute the coordinate differences at all; if it does, we compute the difference between the predicted and label coordinates. The first coordinate part is (x − x̂)² + (y − ŷ)², summed up and multiplied by lambda_coord. The second part takes the square root of the width minus the square root of the predicted width, squared, plus the square root of the height minus the square root of the predicted height, squared, again summed and multiplied by lambda_coord. And as before, we only compute this for the box that is responsible for the prediction: if the first box is responsible, we don't compute the coordinate term for the second one — we've already seen what it means for a box to be responsible.

Getting back to the paper: why the square roots? The sum-squared error weighs errors in large boxes and small boxes equally, but the error metric should reflect that small deviations in large boxes matter less than small deviations in small boxes. To partially address this, they predict the square root of the bounding box width and height instead of the width and height directly. To see why, suppose we have a pair of large boxes with some deviation between them, and a pair of small boxes with the same deviation. With the direct method the cost is (w − ŵ)² + (h − ĥ)², where w and h come from the label and ŵ and ĥ are what the model predicts. If the width and height differences are both 5, then for the big boxes we get 5² + 5² = 50, and for the small boxes we also get 50 — but that's not what we want, because that same difference of 5 is much more important for a small box than for a big one. It's like cutting the same slice off two loaves of bread: off the small loaf it's practically a third of the loaf, off the big loaf it's maybe a tenth, so the loss is felt much less in the second case. To remedy this, the paper adds the square root.

Let's check with numbers. Say the big boxes have widths and heights of 30 for the label and 25 for the prediction, so the difference is still 5. The square root of 30 is about 5.47 and the square root of 25 is 5, so the difference is about 0.47, which squared is about 0.22; adding the width and height parts gives roughly 0.45. For the small boxes, to keep a difference of 5, say the label is 10 by 10 and the prediction is 5 by 5: the square root of 10 minus the square root of 5, squared, is about 0.86, and adding the width and height parts gives roughly 1.7 — already much greater than 0.45. So the model is now penalized more for making this error on the small box than on the big one, which solves the problem of the model being penalized in the same way for the same absolute difference even when the size difference between the boxes is quite considerable.
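A quick numeric check of that argument (the helper name is just for illustration):

```python
import math

# The same absolute width/height error (a difference of 5) should cost more
# for a small box than for a large one once we take square roots.
def coord_term(w, w_hat, h, h_hat, use_sqrt):
    f = math.sqrt if use_sqrt else (lambda v: v)
    return (f(w) - f(w_hat)) ** 2 + (f(h) - f(h_hat)) ** 2

print(coord_term(30, 25, 30, 25, use_sqrt=False))  # 50.0   (large box)
print(coord_term(10, 5, 10, 5, use_sqrt=False))    # 50.0   (small box, same cost)
print(coord_term(30, 25, 30, 25, use_sqrt=True))   # ≈ 0.46 (large box)
print(coord_term(10, 5, 10, 5, use_sqrt=True))     # ≈ 1.72 (small box, penalized more)
```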
From here, the paper gives the training details. The network is trained for about 135 epochs on the training and validation sets of Pascal VOC 2007 and 2012; when testing on 2012, the VOC 2007 test data is also included for training. They use a batch size of 64, a momentum of 0.9 and a weight decay of 0.0005. The learning rate schedule is as follows: for the first epochs they slowly raise the learning rate from 10⁻³ to 10⁻², because starting at a high learning rate often makes the model diverge due to unstable gradients; they then continue training at 10⁻² for 75 epochs, drop to 10⁻³ for 30 epochs, and finally drop to 10⁻⁴ for another 30 epochs, which gives 75 + 30 + 30 = 135 epochs. To avoid overfitting they use dropout and extensive data augmentation: a dropout layer with rate 0.5 after the first connected layer prevents co-adaptation between layers, and for augmentation they introduce random scaling and translations of up to 20% of the original image size, and randomly adjust the exposure and saturation of the image by up to a factor of 1.5 in the HSV color space.

Now, once the model has been trained and detections are made, we may get many more detections than expected — several boxes drawn around the same region. So we apply the non-max suppression algorithm to remove the bounding boxes which are repeated around a certain region and keep only the boxes with the highest probability scores (in the figure, the thickness of a box signifies the probability of an object being at that location); that's why only these boxes are left after non-max suppression.

The way non-max suppression works is this. After training, we pass in an input image and get a set of predictions; suppose the two thick boxes have the highest probabilities, and that we also have a few other overlapping predictions around them. For a particular box, we look at its probability and compare it with the probabilities of the boxes around it, and to decide what counts as "around it" we use the IOU. If we fix an IOU threshold of 0.5, then for the box we're considering, any other box whose IOU with it is greater than 0.5 — meaning the two are very close and almost certainly cover the same object — gets removed if its probability is lower. You can play with this threshold, using say 0.2 or 0.7, depending on the dataset you're working with. If another box has an IOU below 0.5 with the box we're considering, it isn't taken off, since it most likely belongs to a different object. So for each region, any box that isn't the maximum gets suppressed — hence the name non-max suppression — and we repeat this for the remaining boxes until we're left, in this example, with three predictions.
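Here is a minimal sketch of the non-max suppression procedure just described, reusing the iou() helper from earlier (the boxes and scores are made-up examples):

```python
# Minimal non-max suppression sketch: boxes are (x_min, y_min, x_max, y_max),
# scores are their confidence/probability values, and iou() is the helper
# defined earlier.
def non_max_suppression(boxes, scores, iou_threshold=0.5):
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)          # highest-scoring remaining box wins
        keep.append(best)
        # Suppress every remaining box that overlaps the winner too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

boxes = [(10, 10, 60, 60), (12, 12, 62, 62), (200, 200, 260, 260)]
scores = [0.9, 0.75, 0.8]
print(non_max_suppression(boxes, scores))  # [0, 2] -> the duplicate box 1 is suppressed
```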
Generally, when training a YOLO model we aim to be able to avoid non-max suppression as much as possible, and other YOLO variants have been developed to try to reduce that dependence. Variants like YOLO9000/YOLOv2, YOLOv3, YOLOv4, YOLOv5, YOLOX and YOLOR perform even better than the YOLOv1 we're discussing here. The paper also has tables comparing YOLO with other methods: you can see that Fast R-CNN achieves a higher mean average precision than YOLO, but YOLO is much faster than Fast R-CNN. There's also a per-class comparison table showing the precision for the different object classes across the different methods, and further down you can see the recall — we've already had a tutorial on precision and recall, so you should be able to read these; if you're new to that, check out our previous videos. Finally, there are quantitative results on VOC 2007 and on artwork datasets, where YOLO performs best on the Picasso dataset and on the People-Art dataset.
On those artwork benchmarks, the Picasso and the People-Art datasets, YOLO outperforms methods like R-CNN and Fast R-CNN, which again shows its stronger generalization capabilities compared to these other techniques.

The paper also lists some limitations of YOLO. YOLO imposes strong spatial constraints on bounding box predictions, since each grid cell only predicts two boxes and can only have one class. This means that if we had, say, one person standing right behind another so that both centers fall in the same cell, it would be difficult to predict both of them; and in images where the objects are quite small and packed together, the YOLO model will struggle to detect each and every one of them — as the paper says, the model struggles with small objects that appear in groups, such as flocks of birds. Also, since the model learns to predict bounding boxes from data, it struggles to generalize to objects with new or unusual aspect ratios or configurations. It was trained on the Pascal VOC dataset, where objects have certain aspect ratios — by aspect ratio we simply mean the width-to-height ratio of the box, which could be, say, 2:5, or 3:2 if you take it the other way around — so when it's applied to images whose aspect ratios aren't similar to those of Pascal VOC, YOLO finds it difficult to generalize. Finally, while they train on a loss function that approximates detection performance, the loss treats errors the same way in small bounding boxes as in large bounding boxes, and they note that the main source of error is incorrect localizations. That said, we're done with this review of the YOLO paper; in the next section we are going to build this YOLO model from scratch.

Hello everyone, and welcome to this new and exciting session, in which we shall focus on preparing our Pascal VOC dataset using the TensorFlow dataset pipeline. Here on Kaggle we have the Pascal VOC dataset, made available by a Kaggle user, and it's made up of five different directories: Annotations, ImageSets, JPEGImages, SegmentationClass and SegmentationObject. We shall be making use of JPEGImages and Annotations for our object detection problem. Getting into the code, we start by installing Kaggle and copying the kaggle.json file into the directory we've just created — note that this kaggle.json file, as we've seen already, is obtained from your Kaggle account. After copying it we change the access mode of the file and then start the dataset download. To get the download command, you simply scroll on the dataset page, click "Copy API command", and paste it in the notebook; that's exactly what we have here. So we run that to download the dataset, and once it's downloaded we unzip its contents into our dataset directory. Opening it up, we indeed have the Annotations, ImageSets, JPEGImages, SegmentationClass and SegmentationObject folders.

Next we define some variables. We have train_images, which is simply the path to the JPEGImages folder, and train_maps, which is the path to the Annotations folder. Then we have our classes: the Pascal VOC dataset has 20 classes, from aeroplane right up to tvmonitor. We set B to 2 — to understand the significance of B, remember from the paper that the image is divided into an S by S grid and each grid cell predicts B bounding boxes; the authors define S to be 7 and B to be 2, so B here is exactly that number of boxes per cell. Then we have the number of classes, which is 20; the image height and width, which we take to be 224; and the split size S, which is 224 divided by 32, that is 7 — this is the S from the paper. We also have the number of epochs, 100, and a learning rate defined — although since we have some sort of learning rate scheduling we could drop that and set the epochs to 135 instead — and then a batch size of 32.

With all that defined, we move on to preprocessing the annotations. Given that the annotations are XML files, we use ElementTree from Python's xml package to parse the XML data. Diving into the code: the file name is passed into the parse method, from which we obtain a tree; from the tree we get its root; and once we have the root we can get the size tag with root.find("size"). From the size we can read the height and the width of the image, and we can also obtain the depth the same way: copy the same line, specify "depth" instead, and read its text. Let's run preprocess_xml on a file: the file name is actually a file path, train_maps plus "2007_000033.xml", which is exactly the file we were just looking at. Running this we get 366 for the height, 500 for the width and 3 for the depth, all converted into floats.

So we're done with obtaining the image's width and height, which live in the size tag; let's now move on to obtaining the bounding boxes of the different objects. Notice that we used root.find("size") because there is a single size tag per image, but here we use root.findall("object"), because a single image can contain more than one object; whenever there's a possibility of having several tags, you use findall. So we go into each and every object tag in the file — in this image, for example, there are three different objects. For each object we take its bounding box tag and read its x_min, y_min, x_max and y_max: we do bndbox.find("xmin") and take its text, and likewise for ymin, xmax and ymax, converting each to a float at the end. You'll also notice a break statement: for a particular object we just need a single bounding box, so if there were any others we wouldn't take them into consideration. Printing x_min, y_min, x_max, y_max for each object and running this, we get the three boxes we expect: 9, 107, 499, 263 for the first, 421, 200, 482, 226 for the second, and then the third one.

Now, what if we try a different image? Changing the file to the one ending in 32 instead of 33 and running again, we now have four different objects, so four bounding boxes, matching the four object tags in the XML. Looking at the image, if we put the cursor on the top-left corner of this small box we read roughly 24, 188, which matches the annotation's 26, 189, and at the bottom-right we read roughly 46, 240, which matches 44, 238. To make things easier when working with the YOLO encoding, what we'll do is get the center of this bounding box: the center is around the point 35, 215; the width is about 18, and the height is about 238 minus 189, which is 49. So we now have the center, the width and the height, and the next step is to divide all of these by the total width and total height of the image. For the 35 we take 35 divided by the total width, which is 500; the 215 is divided by the height, which is 281, because the first value is the x-coordinate (with respect to the width) and the second is the y-coordinate (with respect to the height). Then the width becomes 18 divided by 500 and the height 49 divided by 281. So instead of x_min, y_min, x_max, y_max we have a normalized center plus a normalized width and height.

Putting this into code: once we're done with a specific bounding box, we also obtain the class name. As usual we use the object tag and find its name — here we have aeroplane, aeroplane, person, person; we're not interested in the pose, or whether it's truncated or difficult, just in the name and the bounding box, exactly as we've seen. From the class names we create a class dictionary which converts each class name into an integer: aeroplane becomes 0, bicycle 1, bird 2, boat 3, and so on, with person at index 14. Getting back to the code, we use this dictionary to convert the class into an integer, and then build the bounding box as follows: the x-center is (x_min + x_max) / 2 divided by the width; the y-center is (y_min + y_max) / 2 divided by the height; the width is (x_max − x_min) divided by the image width; and the height is (y_max − y_min) divided by the image height — to obtain the width you simply take x_max minus x_min, and likewise y_max minus y_min for the height. Finally we append the class integer, store each bounding box in a bounding_boxes list, and return bounding_boxes.

Running this, the first thing to notice is that we get our four bounding boxes, with classes 0, 0 and 14, 14 — aeroplane and person — which matches exactly what we expect. Getting back to our manual computation: 35 divided by 500 and 215 divided by 281 give 0.07 and 0.76, which is what we see; and for the width and height, 18 divided by 500 gives 0.036 and 49 divided by 281 gives about 0.17, which also makes sense. So that's it: we have encoded our bounding boxes, and we're now ready to produce our output labels based on what was described in the paper.
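Putting the parsing steps above together, a condensed sketch of this preprocessing might look like the following (assuming class_dict is the class-name-to-index dictionary and train_maps the annotations path defined above; the notebook's actual code differs slightly in structure):

```python
import xml.etree.ElementTree as ET

# Condensed sketch of the annotation parsing described above. Assumptions:
# `class_dict` maps each Pascal VOC class name to its index,
# e.g. {"aeroplane": 0, ..., "person": 14, ...}.
def preprocess_xml(file_path):
    root = ET.parse(file_path).getroot()

    size = root.find("size")
    width = float(size.find("width").text)
    height = float(size.find("height").text)

    bounding_boxes = []
    for obj in root.findall("object"):
        class_id = class_dict[obj.find("name").text]
        bndbox = obj.find("bndbox")
        x_min = float(bndbox.find("xmin").text)
        y_min = float(bndbox.find("ymin").text)
        x_max = float(bndbox.find("xmax").text)
        y_max = float(bndbox.find("ymax").text)

        # Normalized center, width and height (relative to the whole image).
        bounding_boxes.append([
            (x_min + x_max) / 2.0 / width,
            (y_min + y_max) / 2.0 / height,
            (x_max - x_min) / width,
            (y_max - y_min) / height,
            class_id,
        ])
    return bounding_boxes

# e.g. preprocess_xml(train_maps + "2007_000032.xml")
```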
In the paper we saw that the label for an image is a 7 by 7 by 25 tensor (the model itself predicts 7 by 7 by 30), where each of the 49 cells takes values depending on whether it contains an object or not. For a cell with no object, the objectness value is 0, the four position values are zeros, and the 20 class values are all zeros as well, since there's no object to classify. Every grid cell without an object gets exactly the same all-zero values. And note that a grid cell like this one, which contains the wing of the plane, still counts as having no object: a cell is considered to contain an object only if the center of that object falls inside it, and the center of this plane isn't in this cell. So that cell, and all the other empty cells, get all zeros. What we're left with are the four cells which contain the centers of our four objects, and we'll focus on one of them so you understand how these outputs are generated from the bounding boxes.

The number of objects can be obtained directly from the length of the bounding_boxes list, and for this object the bounding box is what we computed before: the normalized x-center, the normalized y-center, the normalized width, the normalized height, and the class. But as we saw in the paper, this isn't exactly what we want for the center: we want a value which gives the position of the object with respect to its specific grid cell, not with respect to the whole image. Remember that the image has its origin at its top-left corner, and the grid cell containing the center has its own origin at its own top-left corner; what we need is the distance from the cell's origin to the object's center, both horizontally and vertically. Right now our center is normalized with respect to the whole image, so to re-express it with respect to a grid cell we take that value and multiply it by the number of grid cells per side. Given that the normalized x-center here is 0.07, we take 0.07 times 7, which gives 0.49; and the normalized y-center is about 0.75, so 0.75 times 7 gives 5.25. Horizontally this makes sense directly: the distance from the cell's origin to the center is 0.49, approximately half a cell, which matches what we see in the image. Vertically, the 5.25 tells us we're in the sixth row of cells, and only the fractional part is the distance from that cell's origin down to the center, which is why we take the value modulo 1: 0.49 modulo 1 is 0.49, and 5.25 modulo 1 is 0.25. Now we have the center expressed with respect to the grid cell's origin, exactly as in the paper.

Diving into the code, we create the output label, which is a 7 by 7 by (number of classes + 5) tensor, that is 7 by 7 by 25, and we go through each bounding box and fill in the values for the corresponding cell — again, all the empty cells keep their zeros, and only the cells containing object centers get non-zero values. In the loop over the bounding boxes, the bounding box's first value times the split size is just like taking 0.07 times 7 to get 0.49, and its second value times the split size gives the 5.25 for the vertical direction. One other thing we need is to pick the specific grid cell out of the 49: to do that we convert these values to integers, which simply rounds them down, so 0.49 becomes 0 and 5.25 becomes 5. That means that in the x direction we're in the cell at position 0, and in the y direction we're in the cell at position 5.

So we index the output label at [0, 5] — remember it's 7 by 7 by 25 — and fill in its first five values. The first value is a 1, signifying that there is an object. For the position we have 0.49 modulo 1, which gives 0.49, and 5.25 modulo 1, which gives 0.25, so we're storing the center with respect to this cell. Then we have the width, which is the bounding box's value at index 2; remember from the paper that the width and height stay expressed with respect to the whole image, so these values remain unmodified: 0.036 for the width and about 0.17 for the height. That's how we assign the first five values. For the classes we have the positions from 5 up to 25, and we assign a value of 1 at position 5 plus the class index. To understand this, remember that this bounding box's value at index 4 is the class, 14. So we're saying that at position 5 + 14 we want a 1: the first five values give the objectness and the position, and the remaining 20 values give the class, so counting 14 places into the class portion lands us on the person class, where we replace the 0 with a 1. The cell now says: there is an object here, here is its bounding box, and that object is of class person. Finally we return the output label. We've just done this for one object, but the loop does it for all four bounding boxes.

At this point we can carry out some testing: we call generate_output on the bounding boxes returned by preprocess_xml and look at what we get. Checking the shape, we have 7 by 7 by 25. If we index cell [0, 0], all its values are zeros, which makes sense since there's no object there. If we index cell [0, 5], we get exactly what we expect: a 1 for the objectness, then 0.49 and 0.31 for the cell-relative center — meaning the vertical distance from the cell's origin to the center is about 0.31; our hand-measured 0.25 was only an approximation from reading the cursor.
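Putting the label-generation steps together, here is a condensed, unbatched sketch (assuming split_size = 7, num_classes = 20, and bounding boxes in the [x_c, y_c, w, h, class] format returned by preprocess_xml; the notebook's version differs slightly):

```python
import numpy as np
import tensorflow as tf

# Condensed sketch of the label generation described above.
split_size, num_classes = 7, 20

def generate_output(bounding_boxes):
    output_label = np.zeros((split_size, split_size, num_classes + 5), dtype=np.float32)

    for box in bounding_boxes:
        grid_x = box[0] * split_size          # e.g. 0.07 * 7 = 0.49
        grid_y = box[1] * split_size          # e.g. 0.75 * 7 = 5.25
        i, j = int(grid_x), int(grid_y)       # grid cell holding the center -> (0, 5)

        output_label[i, j, 0] = 1.0                  # objectness
        output_label[i, j, 1] = grid_x % 1           # xc relative to the cell
        output_label[i, j, 2] = grid_y % 1           # yc relative to the cell
        output_label[i, j, 3] = box[2]               # w relative to the image
        output_label[i, j, 4] = box[3]               # h relative to the image
        output_label[i, j, 5 + int(box[4])] = 1.0    # one-hot class

    return tf.convert_to_tensor(output_label)

# e.g. generate_output(preprocess_xml(train_maps + "2007_000032.xml")).shape -> (7, 7, 25)
```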
At this point we could carry out some testing. We have this preprocess_xml method, which outputs the bounding boxes, so we can copy that here, take its output, and pass those bounding boxes into generate_output, scrolling so you can see it clearly, together with the length of that output. Let's run this and see what we get; this isn't defined yet, so let's run that first, and there we go, we get something reasonable. You see we have this output, and if you check it, it's 7 by 7 by 25. Now we can say we want index 0, 0, so let's run that: that's this cell right here, and all its values are zeros, which makes sense. Now let's do 0, 5. We run that, and you see we have exactly what we expect: 0.49, then 0.31, meaning that the distance from here down to the center is about 0.31, then 0.036 and 0.174, and then a 1 at the position indicating a person. Now if we go 0, 1, 2 in the horizontal direction and 0, 1, 2 in the vertical direction, that's cell 2, 2, so let's do 2, 2 and see what we have. We should have an object there, and indeed we do: we have a 1, we have its bounding box, and then notice its class. Remember, the aeroplane was the very first class, so it makes sense that we have a 1 at the very first class position. So that's it: we see how we can generate our output labels from this dataset, given all the XML files we have for each and every image.
Before we move on, we'll make some slight modifications to the code. Here we no longer need this length; we have the bounding boxes, and we're going to create a NumPy array instead of the TensorFlow variable we had before, with the exact same shape as before, still our output label. We'll get the length directly from the bounding boxes, so we have "for b in range(len(bounding_boxes))". Then, to account for the fact that these computations are going to be batched, we add the three dots (the ellipsis) right here, so we have our grid x computed as before, our grid y computed with the same i, j indexing, and here again the three dots to account for the batch dimension. Apart from that it's the exact same code, and at the end we convert the NumPy array into a tensor using TensorFlow's convert_to_tensor method. We run this and still get the same output.
Now, to ensure that our validation set doesn't get mixed up with the training set, we define this set of 64 images which will make up our validation set. There we go, we have this val list, and the next thing we'll do is move these files into two directories: if you open this up you'll see we've created a val JPEG images directory, which is this one, and a val annotations directory, which is this other one, and they contain the validation images and annotations. From there we create these different lists: the image paths and the XML paths, and the validation image paths and validation XML paths. Let's run this; you see we have 17,061 files for the training and 64 for the validation. Note that the training images directory has already been defined right here, and we've also defined the val images and val annotations directories, so that's where all these different paths come from. Now from here we create the TensorFlow datasets, that is, the train dataset and the validation dataset, which are essentially made of the different paths we've just created. We make use of the from_tensor_slices method, passing in the image paths and XML paths, and the validation image paths and validation XML paths. Let's run this, and then we can visualize our validation data: you see we have this path, which is our image path, and this other path, which is our XML file, so we have the image and its corresponding XML file. Now that our image paths and XML paths make up our dataset, from these two we can obtain the image and the bounding boxes.
For the image, all we need to do is pass the image path into this read_file method, then decode what we've read, and then resize and cast the image. Let's take this off; we actually do the casting here, so there's no need to do it before resizing. So, as we said, we resize and then cast, and we obtain our image from the image path. Now for the bounding boxes, remember we looked at this preprocess_xml method, which was already explained here: it takes in our XML path, that is our file name, and outputs the bounding boxes, and that's exactly what we're doing here. Given that this preprocess_xml method isn't made up of only TensorFlow operations, we need to use the numpy_function method, where we pass in our function, specify the input, which is the XML path, and also specify the data type of our output tensor, which in this case is float32. So let's rename this to xml_path here and here. Now that we have the path and the method, we can obtain the boxes, and our train dataset is going to be mapped such that the image path and the XML path go in and the images and the boxes come out. You see: train dataset, we map, and we specify the method, get_image_and_bounding_boxes, which gets the image and the bounding boxes from the image path and the XML path. So our train dataset is no longer going to give us the image path and the XML path; it's going to give us the image itself and the bounding boxes. Let's run this and see what we get. There we go, we have this image and its corresponding bounding box; in this case we have just a single bounding box. Let's go ahead and plot this and check it out. There we go, we have this output, and it's showing that we have an airplane; if you check this out, you see the bounding box, and the class is zero. If you scroll to the top, you'll find that aeroplane is the zeroth class, which makes sense. Now let's try some others. Most of these have only one object, but there's this one at the 18th position, so let's keep that, put a break there, run this again, and see what we get. This particular image has several objects, so we can check that out: you see now we have many more objects, many people in fact, and class 14 happens to be person. If we do classes[14], since the list is called classes, we get the exact class, and you should see person. So we have 1, 2, 3, 4, 5, 6, 7, 8 persons, and if you count here, we have exactly 8, which is what we expect. Then we have a 10 and two 8s: 8 is chair, appearing twice, so we have a chair here and another chair there, and 10 should be dining table. Let's check that out; indeed, we have the dining table. So we have a dining table, two chairs, and 8 persons. So now we have the image and its bounding boxes.
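Below is a rough sketch of this mapping step, assuming preprocess_xml is the plain-Python XML parser defined earlier and that the path lists already exist; the exact names are illustrative, not necessarily the notebook's.

```python
import tensorflow as tf

IMG_SIZE = 224

def get_image_and_bboxes(im_path, xml_path):
    # Read, decode, resize and then cast the image
    img = tf.io.read_file(im_path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, (IMG_SIZE, IMG_SIZE))
    img = tf.cast(img, tf.float32)

    # preprocess_xml is plain Python (not TensorFlow ops), so it is
    # wrapped with tf.numpy_function to be callable inside the pipeline
    boxes = tf.numpy_function(preprocess_xml, [xml_path], tf.float32)
    return img, boxes

# Paths-only dataset built earlier with from_tensor_slices,
# now mapped to (image, bounding boxes) pairs
train_dataset = tf.data.Dataset.from_tensor_slices((train_im_paths, train_xml_paths))
train_dataset = train_dataset.map(get_image_and_bboxes)
```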
The next thing we want to do is pair the image with its output labels. Remember our generate_output method, which we'd seen already: it takes in the bounding boxes and outputs the labels. So right here we have this preprocess method, which takes in the image and the bounding boxes; it outputs the image as is, but the bounding boxes need to be converted into output labels, so we make use of the generate_output method, and we specify the data type of the output tensor. Again we're using the numpy_function method right here, because generate_output isn't made up of only TensorFlow operations. We run this, and our final steps are to batch our dataset and implement prefetching. So let's run that, and our training and validation datasets are now prepared.
Hello everyone, and welcome to this new and exciting session in which we are going to build our own YOLO-like model. From the paper we had already seen that we have these initial conv layers, which are pretrained on the ImageNet dataset so that they can extract very useful features from our input images, and these conv layers are followed by two fully connected layers designed to adapt the network to our problem of object detection. Now, given that we do not want to train this backbone from scratch on ImageNet, we'll use an already pretrained backbone, the ResNet-50. Again, the output dimension is defined as the number of classes plus 5 times B; from the paper B is 2, as we've seen already, and the 5 comes from the probability of there being an object plus the 4 bounding box values. Since we have two bounding box predictions, we have 2 times 5, that's 10, plus the number of classes, which in our case is 20. We also define the number of filters to be 512. From here we have our full model: our pretrained ResNet, which is our backbone or base model, followed by several conv layers, similar to what we have in the paper, and then a global average pooling layer. One thing you should note about global average pooling is that if we have an input of, say, 7 by 7 by 5, that is 5 channels, then after global average pooling each channel is reduced to a single representative value, the average of all the values in that channel. The problem with this is that information about object position is lost, so instead of using average pooling it's preferable to use flattening; we take the pooling off and replace it with a Flatten layer. Once we have that, just as in the paper, we have the fully connected layer, which is this Dense layer right here, then this other fully connected layer, and then we reshape. This should actually be split_size by split_size by output dimension, so 7 by 7 by 30; let's copy and paste that here.
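As a rough sketch of the architecture being described, assuming a frozen ResNet-50 backbone, leaky-ReLU activations as in the paper, and a sigmoid on the final dense layer; the exact layer sizes and activations in the notebook may differ.

```python
import tensorflow as tf
from tensorflow.keras import layers

SPLIT_SIZE = 7
N_CLASSES = 20
B = 2
OUTPUT_DIM = N_CLASSES + 5 * B   # 30
N_FILTERS = 512

# Pretrained, frozen backbone used purely as a feature extractor
base_model = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base_model.trainable = False

model = tf.keras.Sequential([
    base_model,                                   # (7, 7, 2048) feature map
    layers.Conv2D(N_FILTERS, 3, padding="same"),
    layers.LeakyReLU(0.1),
    layers.Conv2D(N_FILTERS, 3, padding="same"),
    layers.LeakyReLU(0.1),
    layers.Flatten(),                             # keeps positional information
    layers.Dense(1024),
    layers.LeakyReLU(0.1),
    layers.Dense(SPLIT_SIZE * SPLIT_SIZE * OUTPUT_DIM, activation="sigmoid"),
    layers.Reshape((SPLIT_SIZE, SPLIT_SIZE, OUTPUT_DIM)),  # 7 x 7 x 30
])
model.summary()
```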
So this is now our model. You can see we have a total of 53 million parameters, about 30 million trainable and 23 million non-trainable; the non-trainable ones come from our ResNet. We are now going to get into the YOLO loss. Here we have this yolo_loss method, which takes in y_true and y_pred, where y_true is our target and y_pred is our prediction. The YOLO loss we have here is an implementation of what we already discussed from the paper, and it's made up of four different parts: the first part is for the coordinates, the next is for the object, the next for no object, and finally we have the classification. We're going to start with the object part. In this case we penalize the model for not detecting an object at a particular cell, and that's why you see this indicator, 1 obj sub i j, which denotes that the jth bounding box predictor in cell i is responsible for that prediction; notice that in the other term the indicator has no j, and simply denotes whether an object appears in cell i. Now if we take a look at this figure, we've split this image, which is actually 224 by 224, into seven parts along each side, so it's now a 7 by 7 grid, and every grid cell, as we have seen, has its own predictions. If we pick this specific grid cell right here, we have two outputs: the y_true and the y predicted by the model. You'll notice that if you count the values, y_true has a total of 13 and y_pred has a total of 18. As we've seen already, this first value represents whether we have an object or not, the next four, in green, give the position of the object in the image, and the other eight specify the class of that object. Before we move on, you should note that when we first recorded this section on the YOLO loss, we were working with a dataset of just eight classes. Since we're now dealing with the Pascal VOC dataset with 20 classes, our y_true will look instead like this: we have 12 additional values here and 12 additional values there, so instead of 13 and 18 as with eight classes, we now have 25 and 30, that is 13 plus 12 and 18 plus 12. Nonetheless, this doesn't change much in our YOLO loss method, so wherever you see eight classes, keep in mind that in our specific case of the Pascal VOC dataset we're dealing with 20. We had also seen previously that although y_true has 13 values per cell, y_pred actually has 18, because we have two possible bounding box predictions: this first possible box, which we'll call B1, and this other one, which we'll call B2. We set B to 2 because we have two possible bounding boxes. And we also saw that even though our data was designed such that the targets were 7 by 7 by 13, the model outputs 7 by 7 by 18, as you can see here. So we have a loss function which is given y_true, which is in fact 7 by 7 by 13, and y_pred, which is 7 by 7 by 18.
Since we want to start with the object part of our loss function, which is this part right here, our focus should only be on these two grid cells, this one and this other one, because these are the only two grid cells where an object is located. The way we select them programmatically is by gathering all the grid cells where the target is equal to one. Our target here is simply y_true where we've selected only the object score: if you look at y_true and pick this object score, it coincides with this 1 we have here, since in the cells where there is an object, y_true obviously has a 1. So when we say we're going to pick all the cells of y_pred where y_true has a 1 at this first value, it means we now have y_pred_extract, which contains only the cells where there are actual objects. We do the same for y_true, which gives us y_target_extract, again gathering all the positions where there are objects, but this time from y_true. So now we have the predictions where there are objects and the targets where there are objects, and with this we can focus on only these two cells.
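A minimal sketch of this masking step, assuming y_true has shape (batch, 7, 7, 25) and y_pred has shape (batch, 7, 7, 30); the helper name is made up for illustration.

```python
import tensorflow as tf

def extract_object_cells(y_true, y_pred):
    """Keep only the cells where the target says an object is present."""
    # Objectness channel of the target: shape (batch, 7, 7)
    target = y_true[..., 0]

    # Boolean mask of the cells that contain an object
    obj_mask = tf.equal(target, 1.0)

    # Gather the corresponding cells from predictions and targets,
    # e.g. for two objects this gives shapes (2, 30) and (2, 25)
    y_pred_extract = tf.boolean_mask(y_pred, obj_mask)
    y_target_extract = tf.boolean_mask(y_true, obj_mask)
    return y_pred_extract, y_target_extract
```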
Now let's run this so you can see what it produces. Let's print out y_pred_extract and y_target_extract, and we could also print out the target itself, so let's add a tf.print for that too. We'll now test with some inputs. The inputs we'll use are simply the exact same coordinates we got from the dataset: here you have this one, which corresponds to this vehicle right here, and this one, which corresponds to this other vehicle. We had already seen how to use the generate_output method to produce our y_true, that is, the dataset's value for the output, and once we have y_true we add an extra batch dimension, so y_true is ready. For y_pred, we just generate random values, and then for the specific cells, that's (1, 4) and (3, 2), we put in our own values. You'll notice that we have 18 different values here: the probability of having an object, the position (1, 2, 3, 4), and then for the rest the different class probabilities; and then the same again for the second box, its objectness, its position 1, 2, 3, 4, and so on. So this is our y_pred: we're supposing the model predicts this, and this is our y_true, which, remember, is what the dataset would produce. Now let's run all of this and see what we get. There we go: you can see all the positions where the target value equals one, which is what we printed out, and clearly we have (1, 4) and (3, 2). The zero in front is simply the batch index, since this is the first element of the batch. Next we have y_pred_extract, where we've extracted only the values falling in those grid cells. You'll notice we have 0.9, 0.47, 0.31, which coincides exactly with this, and then the values for the other cell, which coincide exactly with that. So clearly we are only focusing on the model's outputs where there are actual objects in our dataset: we've picked this one and this one, and now we're ready to compare what the model produces at these positions with what was expected, which is y_true. We're comparing only at these two cells; you can see the shape starts with 2. Now, what if we suppose we have only one object? If we take this one off and run again, you see that only one cell is picked. Let's put it back and run that again: we've picked this cell and this other cell, and now, as described in the paper, we compare the probability scores with one another; we simply subtract, and that takes care of this part of our loss. If we only had a single bounding box prediction, that is, if we didn't have this second one and y_true had just this, then we would take 1 minus whatever value the model predicts here. If y_pred is 1 and y_true is 1, we'd have 1 minus 1; we take that difference and square it, exactly as in the paper: the predicted object score minus the target object score, squared, and then summed over all the objects at the different grid cells, which is why we have the summation. Anyway, we had seen this already. Getting back here, now that we actually have two bounding box predictions, B1 and B2, what we'll do is take the two predicted object scores, let's call them lambda 1 and lambda 2, and compute either 1 minus lambda 1 or 1 minus lambda 2. The way we pick between lambda 1 and lambda 2 is by looking at the predicted coordinates: if B1's bounding box is closer to the true bounding box, we pick lambda 1, but if B2's bounding box is closer to the true one, we pick lambda 2. And the way we compare each predicted box with the true box is by making use of the IoU scores. So let's suppose we have this input image, and this is the true bounding box; then we have bounding box B1, which is something like this; let's change the color and draw it like this.
See, so we have something like this; this is what we're getting from this first prediction, and then we have this other bounding box, which is maybe something like this. Let's redraw it so we can differentiate between the two bounding boxes. Now, because this green bounding box is closer to the red true bounding box, we are going to pick lambda 1; that is, we take B1 into consideration and not B2 when computing the objectness part of our loss. Getting back to the code, at this point we have to subtract from 1 either this 0.9 or this other objectness value, 0.8, and the choice between the two depends on these coordinates and those coordinates. But our IoU method is designed to take in pixel values. What that means is we could design our IoU method to take the center, the width, and the height of the two boxes, say this rectangle and that rectangle, and based on that compute the IoU score. The problem is that what we receive here are not pixel values; what we have are the preprocessed values we've seen already. Suppose the center is at a position of around (46, 140). If we focus only on the first coordinate, we take 46 modulo 32, which gives us 14, and then 14 divided by 32 gives us 0.44, which is the fraction of the grid cell occupied. Remember, to obtain this value from the original center x we had to express the position of the center with respect to the grid cell's origin, which is right here: the distance from here to here is about 0.44 of the cell, as we've just calculated, and that's exactly what YOLO takes in. But since we now need to get back to the original pixel values from the values we used for training, because we need to compute the IoU score between a given box and some other box, what we do is reverse this process: we go from a value like this back to the value 46. If you're wondering why we picked 32, remember that the image is 224 by 224, and 224 divided by 7 gives 32, so each cell spans 32 pixels: from here to here is 32, from here to here is 32, and so on. So if the distance from here to this point is 46, then to get the remaining distance within the cell you compute 46 modulo 32, and you divide that remaining distance by 32 to obtain the fraction of the full grid cell it occupies. We then repeat the same process for the vertical coordinate: for the center x and center y, we take 140 modulo 32, obtain that value, and divide it by 32.
We now get the fraction occupied from here to the center; we've gotten this already, it's 0.44, and we can do the same for y, and that's how we obtained the fractions we have here. Anyway, we've seen this already; what we're trying to do now is get back from these values to a value like 46, or from this one to a value like 140. It should be noted that for the width and the height this is going to be quite easy, because remember, when obtaining these, all we did was take the width of the box and divide it by the image width. So to obtain the original width, all we need to do is multiply back by the image width. If the width was originally 20, for example, then to obtain the value we have here we took 20 divided by 224, which gives approximately 0.09; and to get 20 back from 0.09, we just take 0.09 times 224. So that's how we reverse this; it's easier compared to the center, where we have to go through the two steps before getting back the original values. Again, the reason we want these original pixel values is simply that we want to compute the IoU between the true bounding box and these two predicted bounding boxes, so we can choose whether to use 1 minus lambda 1 or 1 minus lambda 2. Now let's dive into how we move from 0.44 and 0.375 back to 46 and 140. The first thing we do is multiply the cell indices (1, 4) by 32, so 1 becomes 32 and 4 becomes 128, and this takes us from the image origin right up to this cell's position: from here to here is 32, and from here to here is 128, because we count 32, 64, 96, and then 128. So we're already getting close to the center of our object. That's why we have this rescaler, which is obtained by multiplying the positions where the target equals 1 by 32. Let's run this: we get 32, 128, and for the cell (3, 2) we get 96, 64; for that other object, counting 1, 2, 3 gives 96, and counting 1, 2 gives 64. So we're at this position, and we're now left to get to the center, which should be around here. We've taken the first step. The next thing we want to do is get rid of the batch dimension, which we don't need, so we just keep 32, 128 and 96, 64, and then we add two zeros for the width and the height, since so far these are only the x center and y center. To do this, we slice starting from index 1, leaving out the zeroth (batch) index, and then we append the two zeros. The number of rows depends on the length of our rescaler; if the rescaler has length two, as in this case, we'll have a 2-by-2 matrix of zeros, which we concatenate with this rescaled output. So essentially we have this 32; let's rewrite that.
When we take only from the first index onward, we have 32, 128 and 96, 64, and we concatenate that with a 2-by-2 matrix filled with zeros. So now we have this, which represents x, y and then the width and the height. The next thing we do is take the target coordinates; note that we take from 1 to 5, so essentially x, y, w, and h, and we multiply them by 32, 32, 224, 224. Let's explain why we multiply by these values. Suppose you have 0.44 for x, as we have right here, and 0.375 for y for the center. Taking 0.44 and multiplying by 32 gives you this distance from here to here in pixels: 0.44 times 32 is about 14. Remember that to get 0.44 we did 14 divided by 32, so to get 14 back we simply do 0.44 times 32, and that's exactly what we do here: 32 times 0.44 and 32 times 0.375. Then for the width and the height, the values are multiplied by 224, which makes sense because, as we saw before, to obtain the normalized width and height we divided by 224, so to get the original width and height back we multiply by 224. So essentially we multiply this by the bounding box coordinates. The reason we repeat this is that we could have several objects: with two objects we repeat it twice and carry out the same calculation for the two objects, and this repeat uses the length of the rescaler, which is 2, so it repeats twice. That's it for the target, where we've taken indices 1 to 5, that is values 1, 2, 3, 4, multiplied by 32 for this, 32 for this, 224 for this, and 224 for that, and that's how we obtain these distances from here to here in pixels. Now we repeat the same process for the two predictions, prediction one and prediction two. For prediction one we do exactly the same, except it's y_pred here instead of y_target; the calculation is unchanged. And note that for prediction two we go from index 6 to 10, because 1 to 5 is bounding box one and 6 to 10 is bounding box two; that's the only difference. So we get pred one and pred two. At this point, for this object, we know the distance from the origin right up to the nearest cell, which is 32 (we've seen that already), and we know the distance from that cell to the center of the object in pixels, which is about 15.21. So we have 32 and 15.21, meaning we have the distance from here to the center in pixels. For the 128, we go from here to here, which we know is 128, plus this other distance from here to here, which is 10. So now we can add these up, and for the width and the height we've simply taken what we had and multiplied by 224, and that's what we have.
So we don't need to do any modifications there. Now, because we have this, we can add this with this, this with this, zero with this, and zero with this, and we get our output, which is the center with respect to the image origin and the width and the height with respect to the full image. To make it very clear, let's consider that we had only one object, so let's take off this prediction and this other one too. Well, this should be the other one, since we took the wrong one off; let's take this one off instead. Okay, let's run this again, and there we go: you see we have 32, 128, 0, 0 and we have 15, 10, 28, 52, where we now take 32 plus 15, then 128 plus 10, then 0 plus this and 0 plus that, and that gives us the center together with the width and the height. And then for the predictions we do the same: this plus this, this plus this, and so on. So finally we are able to obtain the bounding boxes in terms of pixels for the target and for the two predictions. We now take the rescaled cell positions and add them to the target coordinates and to the two prediction coordinates. Let's run this so you can see the outputs. There we go, we now have 47: we've taken 32 plus 15 and obtained 47, then 128 plus 10, and so on for the rest. So we now have those bounding boxes in terms of pixels.
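Here is one way the rescaling just described could be written, assuming the grid positions and the extracted relative coordinates are already available as tensors; the names and shapes are illustrative, not the notebook's exact code.

```python
import tensorflow as tf

CELL_SIZE = 32    # 224 / 7
IMG_SIZE = 224

def cell_coords_to_pixels(grid_positions, rel_coords):
    """Convert cell-relative [x_off, y_off, w, h] back to pixel values.

    grid_positions: (n, 2) integer cell indices where objects live
    rel_coords:     (n, 4) [x_offset, y_offset, width, height], offsets
                    relative to the cell, width/height relative to the image
    """
    # Top-left corner of each responsible cell in pixels, e.g. (1, 4) -> (32, 128)
    rescaler = tf.cast(grid_positions, tf.float32) * CELL_SIZE
    # Pad with zeros so it lines up with [x, y, w, h]
    rescaler = tf.concat([rescaler, tf.zeros_like(rescaler)], axis=-1)

    # Offsets back to pixels, widths/heights back to pixels,
    # e.g. 0.44 * 32 = 14 and 0.09 * 224 = 20
    scale = tf.constant([CELL_SIZE, CELL_SIZE, IMG_SIZE, IMG_SIZE], tf.float32)
    coords_pixels = rel_coords * scale

    # Cell origin + within-cell offset gives the absolute center
    return rescaler + coords_pixels
```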
The next step is to compare this box with this one and this box with that one, and see which of the two is actually closer to the target. One thing we did while designing this y_pred was to ensure that the first box would be the closer one: you can see here 47 and 47, almost the same values, so when testing you'll see that the first one comes out closer. To do this comparison we'll need the IoU score, and to compute the IoU, let's consider this simple example where we have two boxes, B1 and B2. The IoU, as we saw already, is simply the intersection over the union: it's this intersection region divided by all of this, the union, which includes the intersection region. Starting with the intersection, what we need are this corner here and this one here, and we'll get them from these corner points of the two boxes. But remember that we're actually dealing with the center, the width, and the height: what we pass in isn't x_min, y_min, x_max, y_max for each box; what we actually receive is the center, the width, and the height. So the first thing to do is convert the coordinates given to us in center-width-height form into x_min, y_min, x_max, y_max form, and to do that we have this piece of code right here. The first thing to note is that we have indices 0, 1, 2, 3, representing x center, y center, width, and height. Now, given x center and y center, to obtain x_min, y_min it suffices to take xc, which is this distance from here to here, and subtract half of the width, because from here to here is the width and from here to here is half the width; so xc minus half the width takes us back to this position, and yc minus half the height takes us to this position. At the end of the day we get back to the origin of that specific box, which is exactly x_min, y_min. You can see this in the code: this is xc, minus half of the width (the width is index 2, so width divided by 2), and this is yc minus half of the height (index 3). And if we want x_max, y_max, we take xc plus half the width and yc plus half the height, which is what we do here, simply replacing the minus with a plus. That's how box one is converted into its corner form with x_min, y_min, x_max, y_max, and we do the same for the second box. Once we have this, the next step is to get the corners of the intersection. To get the intersection's x_min, y_min, we compare the x_min, y_min of B1 with the x_min, y_min of B2 and take the maximum of the two, that is, the right-most one, because our origin is at the top-left corner and coordinates increase to the right and downward. So when we compare here, we take the first two values of each box, that's x_min, y_min, and keep the maximum, which plays the role of the x_min, y_min of the intersection rectangle formed here. For the intersection's x_max, y_max we instead need the minimum, the left-most one, so we take the minimum of the last two indices, that's x_max, y_max, of the two boxes. That's how we obtain this corner and this corner of the intersection rectangle. Once we have its x_min, y_min, x_max, y_max, the next thing is to compute the width and the height of the intersection so we can multiply them and get its area. That's quite simple: we just subtract, x_max minus x_min for the width and y_max minus y_min for the height, and then multiply the two. That is exactly what is done here.
You see, we've done the subtraction here, and then we multiply the two, and that's how we get the intersection area. Once we have the intersection area, the next thing we want is the union area. So we take box one: index 2 is the width and index 3 the height, multiply width times height, and we have its area; we do the same for box two, width times height, and we have that area. Then we add these two areas up and subtract the intersection, because we don't want it counted twice: when you compute box one's area you already include the intersection region, and when you compute box two's area you include it again, so we take it off once to get the correct union. And now we have intersection divided by union, and that's it. Once we have the IoU, we can compare this target box with this prediction and with this other prediction. So, in order to compare these different boxes, we compute the IoU of the target box with the second prediction, and the IoU of the target box with the first prediction. Before doing that, we can print the two out separately: let's do tf.print of the second one, comparing the target with the second box, and then the target with the first box, so we print the first one too. Let's run this before going further so you better understand what we're doing. We run it, and you see that with the second box we have 0.16 and with the first box we have 0.93, which makes sense, because when you look at the predictions, the first box looks much more similar to the target than the second box, which is this one. So it makes sense that the first box has the higher IoU score. Now, applying the tensorflow math greater method, we get a Boolean output which tells us whether the IoU between the target and the second prediction is greater than the IoU between the target and the first prediction. Given that in this case the IoU between the first prediction and the target is greater, this outputs false, and casting that to an integer produces a zero. The zero simply means that, between the two options, the first bounding box and the second bounding box, it's the first one which is closer to the target, and that is exactly what we want to have here. So there we go, we have this output zero. Now let's include this other example, so that we have two objects, this object and this object; by adding this we've added this other cell, which corresponds to this object. Then we uncomment this part and see what we get for the mask. There we go, you see we have 0, 1, meaning that for the first object, that is this object here, it is the first box which is closer to the target.
And for the second object, it's the second box which is closer. Now let's interchange them: we take this from here and paste it there, take this and paste it here, and add a comma right here. Because this one will now be the closer one, we should get 1, 1. Let's run this and see: we indeed have 1, 1, and you can see from the printed scores that the second boxes now have higher IoU scores than the first boxes. Okay, so now we know which bounding boxes are closer to the target; in this case we know that B1 is closer to the target than B2. The next thing we need to do is get lambda 1 and lambda 2, and then, based on the mask, choose between them. Getting lambda 1 and lambda 2 is quite easy: all we need to do is pick the value at index 0 and the value at index 5 (remember the layout runs 0, 1, 2, 3, 4 and then 5), which are the predicted object probabilities of the two boxes, lambda 1 and lambda 2. We concatenate them and rearrange by transposing. Let's run this and see what we get: there we go, we have 0.9 and 0.8 for the first object, and 0.3 and 0.98 for the second. What we do now, based on the mask, is say: for the first object the mask value is 0, so we take the first probability; then we move on to the next object, where the mask value is 1, so we skip index 0 and pick index 1. So for the first object we pick lambda 1, and for this other one lambda 2. Remember that if we interchanged those predictions, we would pick lambda 2 for both objects, because the mask would be 1, 1; since the mask is 0, 1, we pick lambda 1 for the first object and lambda 2 for the other. To do this programmatically, we gather these lambda values, that is, these probabilities, based on the mask, and doing that we pick the lambda values corresponding to the bounding boxes with the higher IoU scores with respect to the targets. So we've seen how, going back to a single-object example (let's comment this and this, and run that), we're able to pick these two candidate bounding boxes and, from them, the one with the higher IoU score, and hence the probability which we are going to subtract from one. So now we know we'll have 1 minus 0.9. Let's reverse things again: we take this off, paste it here, take this, paste it there, and run it. You see we now get 0.8, because we've picked the other box instead, since that bounding box now has the higher IoU score with respect to the target. So this is how we pick the bounding box which we'll use for computing the objectness part of our loss.
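Below is a sketch of the IoU computation we've been describing, for boxes given as [x_center, y_center, width, height] in pixels. The clamping to zero for non-overlapping boxes and the small epsilon are safety additions of mine, and the function name is illustrative.

```python
import tensorflow as tf

def compute_iou(box_1, box_2):
    """IoU of boxes given as [x_center, y_center, width, height] in pixels."""
    # Convert to corner format: [x_min, y_min, x_max, y_max]
    box_1 = tf.concat([box_1[..., :2] - box_1[..., 2:] / 2.0,
                       box_1[..., :2] + box_1[..., 2:] / 2.0], axis=-1)
    box_2 = tf.concat([box_2[..., :2] - box_2[..., 2:] / 2.0,
                       box_2[..., :2] + box_2[..., 2:] / 2.0], axis=-1)

    # Intersection rectangle: right-most of the mins, left-most of the maxes
    inter_min = tf.maximum(box_1[..., :2], box_2[..., :2])
    inter_max = tf.minimum(box_1[..., 2:], box_2[..., 2:])
    inter_wh = tf.maximum(inter_max - inter_min, 0.0)   # 0 if no overlap
    intersection = inter_wh[..., 0] * inter_wh[..., 1]

    # Union = area_1 + area_2 - intersection (so the overlap isn't counted twice)
    area_1 = (box_1[..., 2] - box_1[..., 0]) * (box_1[..., 3] - box_1[..., 1])
    area_2 = (box_2[..., 2] - box_2[..., 0]) * (box_2[..., 3] - box_2[..., 1])
    union = area_1 + area_2 - intersection

    return intersection / (union + 1e-7)
```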
Okay, so finally we just need to compute the difference between the value we've picked, in this case 0.8, or, getting back to what we had originally, 0.9, and the target of 1. You see here we have ones, whose length is based on the rescaler; since we have only one object, this is just a single 1, so it's 1 minus 0.9, and that's it. This difference method we have here has already been defined: it's basically the square of y minus x, so we subtract the two, square the result as defined in the paper, and then use this reduce_sum method to get a single value. So we run all this, and we can now print out our object loss. There we go, we get 0.01, which is quite small because 0.9 is close to 1. Now let's modify this and say 0.09 instead; you see that we get a higher loss. And note that no matter what value we put here, even 1.0, we'll never get a loss of exactly 0, because it's this other box's score which is used to compute the loss.
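Putting the pieces together, here is a hedged sketch of this objectness term: the squared-difference helper and the selection of the responsible box's score via the IoU mask. I use a simple blend between the two scores instead of a gather, which is equivalent; the names and shapes are illustrative.

```python
import tensorflow as tf

def difference(x, y):
    # Squared error summed over all entries, as used in every term of this loss
    return tf.reduce_sum(tf.square(y - x))

def object_part_of_loss(y_pred_extract, y_target_extract, iou_1, iou_2):
    """Objectness loss over the cells that contain an object.

    y_pred_extract:   (n, 30) predictions at object cells
    y_target_extract: (n, 25) targets at object cells
    iou_1, iou_2:     (n,) IoU of predicted boxes 1 and 2 with the target box
    """
    # 0.0 where box 1 is responsible, 1.0 where box 2 is responsible
    mask = tf.cast(tf.math.greater(iou_2, iou_1), tf.float32)

    # Objectness (lambda) of the responsible box for each object
    lambda_responsible = (1.0 - mask) * y_pred_extract[..., 0] \
                         + mask * y_pred_extract[..., 5]

    # Target objectness is 1 wherever there is an object
    return difference(lambda_responsible, y_target_extract[..., 0])
```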
We now move to the no-objectness part of our loss, which is this term here. It's quite similar to the previous calculation, except that now we focus on the regions where there is no object: unlike before, where we focused on this cell and this other cell containing objects, we now look at all the other cells except those two. As you can see, we gather all the predictions where the target is 0, that is, simply where there is no object. Let's print out these positions and the corresponding predictions with tf.print. The grid is 7 by 7, meaning we have 49 cells; two cells have objects and the remaining 47 do not. Let's run this and see what we get: there we go, you can see all these different cells, and if we also print the shape, you see it's actually 48; that's because at that point we had only one object. With two objects, let's get back and run again: we now have 47 cells where there is no object, which tells you that out of the 49 cells, two contain an object. The extracted prediction tensor now contains the scores and bounding boxes for all those empty cells; you can see it's 47 by 10, because for each of these cells, for example this one where there is no object, we have 10 predicted values, and we'll use them to compute the loss. The no-objectness part of the loss is obviously computed using this lambda and this lambda, and this time around our target is zero, because where there is no object the objectness should be zero: it's (0 minus lambda 1) squared plus (0 minus lambda 2) squared. The idea is to ensure that this probability equals zero when there is no object and one when there is an object. Getting back to the code, you see we break this up into two parts, one comparing with lambda 1 and the other with lambda 2. Unlike the object part of the loss, where we tried to pick which box was responsible for the prediction, here we simply take this minus this, plus this minus this. Our target is zeros, unlike before where the target was ones, so we compute the difference between the zeros and the predictions, picking index 0 for the first box and index 5 for the other one. We compute each difference using the method we defined already and then sum the two up to obtain our no-object loss. So let's take this off here and print out the no-object loss: we get 110. Now we move to the next part, the classification, where we focus on the object's class. We only compute this loss for the cells where there is an object, and we do not care which of the bounding boxes is responsible, because we focus only on the classes; that's why instead of i, j in the indicator we have just i, since we're not interested in choosing any specific bounding box here. Getting back to the code, for the object class loss we have the predictions and the target. The target classes start from index 5, because the first values occupy indices 0 to 4, while for the predictions they start from index 10, counting 0 through 9 for the two boxes. So we go from 10 to the end for the prediction and from 5 to the end for the target, and we simply compute the difference between the two, again making sure it's only where there is an object. You see that we obtain a class loss of 5.47.
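A sketch of these two terms, the no-object term and the class term, under the same layout assumptions as before (target classes at indices 5 onward, prediction classes at indices 10 onward); the names are illustrative.

```python
import tensorflow as tf

def no_object_and_class_loss(y_true, y_pred):
    """No-object and classification terms of the YOLO loss (sketch).

    y_true: (batch, 7, 7, 25), y_pred: (batch, 7, 7, 30)
    """
    sq_err = lambda a, b: tf.reduce_sum(tf.square(a - b))

    obj_mask = tf.equal(y_true[..., 0], 1.0)
    noobj_mask = tf.logical_not(obj_mask)

    # Cells without an object: both predicted objectness scores should be 0
    pred_noobj = tf.boolean_mask(y_pred, noobj_mask)
    zeros = tf.zeros_like(pred_noobj[..., 0])
    no_object_loss = sq_err(pred_noobj[..., 0], zeros) + sq_err(pred_noobj[..., 5], zeros)

    # Cells with an object: compare the 20 class scores; prediction classes
    # start at index 10, target classes at index 5
    pred_obj = tf.boolean_mask(y_pred, obj_mask)
    target_obj = tf.boolean_mask(y_true, obj_mask)
    class_loss = sq_err(pred_obj[..., 10:], target_obj[..., 5:])

    return no_object_loss, class_loss
```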
Now we get to the last part of our loss, the one involving the coordinates, which is itself broken into two sub-parts: the first for the center, and the other for the width and the height. This part is more similar to the objectness part of the loss, because here we focus only on cells where there is an object, and only on the bounding boxes which are responsible for the prediction. Again we gather all the predictions where the target equals one, that is, where we have objects, and then we combine the centers: we take indices 1 to 3 and 6 to 8, because index 0 is the objectness, indices 1 and 2 are the first box's center, and 6 and 7 are the second box's center; that's why we take this and this and stack them up to form our combined centers. Similar to what we did with the objectness loss, we only pick a given center based on whether its bounding box is the one responsible for the prediction, and again we use the mask, which we've already computed, so we follow the exact same process. Let's print out the combined centers and the selected centers so you can see that we pick out only some of the boxes from the two choices: for the first object we have this option and this option, and because it's the zeroth index that's responsible, we pick this one; for the second object we have this option and that option, and because it's the second box, the one at index 1, that's responsible, we pick this one. So this one is discarded and that one too, and we focus only on these. For the target, we simply pick out indices 1 and 2, going from 1 to 3, and then we compare that with whichever prediction is responsible by applying our difference method. Now that we're done with the center, which is this part here, we move on to the width and the height. It's exactly the same thing, with the difference that we now pick the width and the height: instead of 1, 2 we pick 3, 4, so we go from 3 to 5, and for the second predicted box from 8 to 10. We stack them up, carry out the selection with the mask, and get the target, again taking 3 to 5, and then compute the difference. Remember that when computing this difference we have to take the square root, as in the paper, and since the square root only takes positive numbers, we compute the square root of the absolute values. From the predicted and target sizes we obtain the size loss; we've already gotten the center loss, and together they form our box loss. That's it for all the different loss terms, and we now simply add them up. Before adding them, recall from the paper that lambda coordinate is 5 and lambda no-object is 0.5; we've seen this already, and we make sure to take it into consideration when adding everything up. So let's run this and print out the loss; there we go, and that should be fine.
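And here is a sketch of the coordinate term plus the final weighting, again under the assumed layout (box 1 at indices 1:5, box 2 at 6:10) and with the responsibility mask computed from the IoUs as before; this is illustrative, not the notebook's exact code.

```python
import tensorflow as tf

LAMBDA_COORD = 5.0
LAMBDA_NOOBJ = 0.5

def box_part_of_loss(pred_obj, target_obj, mask):
    """Center + size loss for the responsible boxes at object cells.

    pred_obj:   (n, 30) predictions at object cells
    target_obj: (n, 25) targets at object cells
    mask:       (n,) 0.0 if box 1 is responsible, 1.0 if box 2 is
    """
    mask = tf.expand_dims(mask, axis=-1)

    # Centers: box 1 at indices 1:3, box 2 at 6:8
    pred_center = (1.0 - mask) * pred_obj[..., 1:3] + mask * pred_obj[..., 6:8]
    center_loss = tf.reduce_sum(tf.square(target_obj[..., 1:3] - pred_center))

    # Widths/heights: box 1 at 3:5, box 2 at 8:10, compared through square
    # roots as in the paper (abs keeps the sqrt defined for raw predictions)
    pred_size = (1.0 - mask) * pred_obj[..., 3:5] + mask * pred_obj[..., 8:10]
    size_loss = tf.reduce_sum(tf.square(
        tf.sqrt(tf.abs(target_obj[..., 3:5])) - tf.sqrt(tf.abs(pred_size))))

    return center_loss + size_loss

# The four parts are then combined with the weights from the paper:
# total_loss = LAMBDA_COORD * box_loss + object_loss \
#              + LAMBDA_NOOBJ * no_object_loss + class_loss
```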
We are then going to define our model checkpoint, where the file path is this one here; we'll save only the weights, monitor the validation loss, and obviously save the model which produces the smallest validation loss, so we save the best weights only. We run that, and then we move to the learning rate scheduling. Here, if the number of epochs is less than 40, that is for the first 40 epochs, we use a learning rate of 1 times 10 to the negative 3; between 40 and 80 we use 5 times 10 to the negative 4; and after that we use 1 times 10 to the negative 4. So that's it: we compile our model and start the training. Now, after training for some epochs, you'll notice that the model starts to overfit, and so in the next section we are going to use several techniques to help resolve this problem of overfitting. We've now been training for over 20 epochs, and you can see clearly from the training loss and the validation loss that our model starts out performing well and at some point starts overfitting: the training loss keeps dropping, while the validation loss drops and then at some point starts increasing. So clearly our model is overfitting.
Hi there, and welcome to this new and exciting session in which we shall be looking at different strategies to reduce overfitting. In the YOLO v1 paper, some strategies were outlined: to avoid overfitting, the authors use dropout and extensive data augmentation. A dropout layer with rate 0.5 after the first fully connected layer prevents co-adaptation between layers, so after this fully connected layer we're going to add a dropout layer with parameter 0.5. Then, for the data augmentation, the authors introduce random scaling and translations of up to 20% of the original image size, and they also randomly adjust the exposure and saturation of the image by up to a factor of 1.5 in the HSV color space. That said, we're going to break our data augmentation strategies into two main categories. The first category entails modifying the pixel values without modifying the positions of the different objects. For example, let's click on edit right here and try to brighten up the image: you see we can modify it like this, going from this initial image to this one by playing around with the brightness, the colorization, and so on, without changing the position of any object we have here. For the second category, we go from this image to this one, where you see that flipping has moved this object's position from here to here. Now, in the first case, where we just modify, for example, the image brightness, there are little or no updates to make to our existing code base. But when we modify the image such that the bounding boxes have to change, like this one which goes from here to here, or this one which goes from here to this other position, it means we're now updating the bounding boxes, and we would have to write extra code for all these modifications. Nonetheless, it turns out that when we work with a library like albumentations, all the changes to the positions of the bounding boxes are carried out automatically. So you can see here we have this input image with this dog, this tennis ball, and this cat.
And then after going through some transformation, like here where the image is first flipped, you see the dog moves to this other position and the image also appears zoomed in. So this cat, for example, is no longer as complete as it was in the original image. And so now we have completely different bounding boxes: this one becomes this, the tennis ball's becomes this, and the dog's bounding box becomes this, which you can see is larger compared to the input. And so with albumentations, as we were saying, an input bounding box like the dog's at position 23, 74, 295, 388 is automatically converted to 149, 69, 295, 381. So you just need to define your transformations, and albumentations makes sure you have the right bounding boxes as output. Now, diving into the code, as you might have seen in some previous sessions, we're going to import albumentations. With the import done, we move on to defining our transforms. You see right here we have the different transforms. The first thing we do is resize our images to 224 by 224, and then we apply a random crop. This random crop is applied such that the output image will have a width lying between 200 and 224 and a height lying between 200 and 224, so roughly 90% of the width and 90% of the height, up to the full size. So what we're saying here is that we want to randomly crop the image: our new image will have a width which falls in this range and a height which falls in this range, and we want a probability of 0.5 of applying this transformation. If you want the transformation to always be applied, you can set always_apply to true; otherwise you just set p, the probability of applying it, to 0.5, or maybe 0.2 or 0.8 — it depends on you. Next we have this random scaling, where we specify the scale limit, the interpolation type, and again a probability. Then we have the horizontal flip, which we apply with probability 0.5. Finally, because after doing the random crop we'll have an image which is no longer 224 by 224, we resize it back to 224 by 224. So that's it; those are all our transformations, which we pass into this Compose method — basically a list made of this transform, this transform, this transform, this one and this one. Now, one additional argument we're going to pass into this Compose method is the bounding box parameters. The reason we need to pass this is simply that, unlike with image classification, with object detection we have bounding boxes which are going to be modified. So we specify these bounding box parameters to take into consideration the kind of boxes we're dealing with, and you'll notice that we specify the format as YOLO. Going back to the documentation, there are actually several formats: the Pascal VOC format, the albumentations format, the COCO format, and the YOLO format.
Now it turns out that in our specific case we're actually dealing with the YOLO format — not just because we're building a YOLO model, but because the way we've normalized and processed our inputs is such that we have x center, y center, width and height representing the bounding boxes. If instead we had x min, y min, width, height, we would have picked the COCO format. So it's not because of the name, although it happily coincides with the fact that we're building a YOLO model; as we said, we have x center, y center, width, height, and again these are normalized. Notice here that the x center is divided by the image width and the y center by the image height, while the box's own width and height are divided by the total image width and height respectively — which happens to be exactly what we did earlier. Remember, we took x min and x max, obtained the x center and divided it by the width; took y min and y max, obtained the y center and divided it by the height; and divided the box width by the total width and the box height by the total height. So it is indeed the YOLO format we're using right here, and getting back to the code, you see we have the format YOLO specified. Next we have this min area set to 25 and this min visibility set to 0.1. To understand the concept of min area, consider you have this input right here, and after carrying out the transforms, what you get is this output here, where the area of this box is 4,344 pixels. (This example is actually in the COCO format, so the bounding box values will look different from ours, but that doesn't change the idea.) The area after transformation is 4,344 pixels, which is clearly quite small compared to the 23,892 pixels we had before, from this 132 times 181 computation. So after transforming this, we obtain this right here. But if we set min area to 4,500, it means that any box with an area of less than 4,500 is going to be omitted, and that's why you see that when we specify the min area to be 4,500, this box disappears. If you don't set anything, the box remains; if you set this, the box disappears. Then we also have min visibility. If we set min visibility to, say, 0.3, then if the area of the output box divided by the area of the initial box gives us a ratio of less than 0.3, that box is going to be omitted. And so right here, if you take this area, which is 6,888, and divide it by the 24,108 of the original box, you get 0.286, which is less than 0.3, and so when you set 0.3 this box disappears. That's the idea behind min area and min visibility. Now, we can actually leave those out, so let's just take them off, and that should be fine. So this is it: we have our transforms — a rough sketch of what they could look like follows below — and we have our augmentation method, which takes in the image and the bounding boxes, passes them into our transforms, and returns the transformed image and the transformed bounding boxes. As we see in the schematic, we go from this image and bounding box pair to this transformed image and transformed bounding box pair. So let's get back to the code; we can run this.
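Here is a rough sketch of such a transform pipeline. The exact limits and probabilities are assumptions for illustration, and A.RandomSizedCrop is used here to approximate the crop described above (the course may use a different transform for that step):

```python
import albumentations as A
import cv2

IM_SIZE = 224  # assumed target size

transforms = A.Compose(
    [
        A.Resize(IM_SIZE, IM_SIZE),
        # Random crop to roughly 90% of the image, then rescaled back,
        # applied with probability 0.5.
        A.RandomSizedCrop(
            min_max_height=(int(0.9 * IM_SIZE), IM_SIZE),
            height=IM_SIZE, width=IM_SIZE, p=0.5,
        ),
        A.RandomScale(scale_limit=0.2, interpolation=cv2.INTER_LINEAR, p=0.5),
        A.HorizontalFlip(p=0.5),
        A.Resize(IM_SIZE, IM_SIZE),
    ],
    # YOLO format: normalized (x_center, y_center, width, height). The class id
    # is assumed to be appended as an extra element of each box, which
    # albumentations passes through unchanged. min_area / min_visibility are
    # optional filters and are left out here.
    bbox_params=A.BboxParams(format="yolo"),
)

# transformed = transforms(image=image, bboxes=bboxes)
# aug_image, aug_bboxes = transformed["image"], transformed["bboxes"]
```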
And then we have our process data method, which makes use of tf.numpy_function, because what we're calling here are not TensorFlow operators — in particular, albumentations does its computations in NumPy. So we make use of this function in order to integrate that into our data pipeline: we specify the function, our albumentations augmentation, the inputs, which are the image and the bounding boxes, and the output tensor types, which here are both float32. So we run this and we create our train dataset, and we can visualize it. You see, for example, here we have this image; let's write it out so we can see it. We have output one and output two; let's check that out. Hmm, no modification was made. So in order to be sure we can observe some change, what we can do is, for example, comment out this random scaling and then make sure the flipping is always done, so we set always_apply to true. There we go: we run that again, and we have output one and then output two, which has now been flipped. One thing you'll notice as well is the fact that the bounding boxes have changed. So let's copy this from here and paste it just here, so we can see it. Okay, so this is what we have before and this is what we have after. Now, this actually makes sense, because when you have the original image with a bounding box like this, and then you flip it, the only value that really changes is the x center. If your center was around here, then flipping changes the x center's position slightly, and that's what we notice here: the x center changes position just a little bit, while the y center doesn't really change and the width and the height remain the same. Now let's play around with the random crop. Let's set its always_apply to true, run this, and check out the output. We have output one and output two, and this one appears somewhat zoomed in. Anyway, this example shows us that we have this image with this bounding box, with maybe the center around here, and after the modification you see the height gets modified, although not by much, and the width changes just a little bit. Now let's change the example so we can see this more clearly, since this one isn't very demonstrative of the transformation process. So we'll take maybe the second example: here we do a skip, and then we break. Hopefully the second one has more objects, or is at least a better example. Let's check this out. Okay, so after flipping, we expect to have something which looks different from this. So that's it; let's take this off and work with this example. Now let's run this again. There we go, we have that.
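As an aside, the integration step described at the start of this passage — wrapping the albumentations call so it can run inside a tf.data pipeline — could be sketched roughly like this (function names are assumptions, and the later target-encoding steps are omitted):

```python
import numpy as np
import tensorflow as tf

def aug_albument(image, bboxes):
    # Albumentations works on NumPy arrays, so this runs outside the TF graph.
    data = transforms(image=image, bboxes=bboxes)
    return np.float32(data["image"]), np.float32(data["bboxes"])

def process_data(image, bboxes):
    # Wrap the NumPy-based augmentation so it can be mapped over a tf.data dataset.
    aug_image, aug_bboxes = tf.numpy_function(
        func=aug_albument, inp=[image, bboxes], Tout=[tf.float32, tf.float32]
    )
    return aug_image, aug_bboxes

# train_dataset = train_dataset.map(process_data)
```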
We have our output, which we obtained from skipping, and then we get back here and also skip for the boxes. So we skip, and there we go too, and then from here we break, run that, and see what we get. You can see we have our output one; let's also check out output two. There we go: we have the one and the two exactly as we expect. This example is much more different from what we had before and should be a better way to demonstrate what goes on with albumentations. So right here we have the input boxes; let's copy them and paste them just here. And then we have the output boxes; copy those and paste them right here. Now, before we move on, notice that the classes are the same: class 18 here and class 18 there. And if we go right to the top, where we have the classes list, let's get element 18 and see what we get — it should be train. There we go: classes at 18 gives us train. Okay, so that's it. Now let's check this out. We have our output one, and you can see from here that it makes sense: the x center is about 27% of the full width, so this distance is about 27% of all this distance. Then this distance is about 36% of the full height. And then we have the width, which is about 0.54, or 54%, of the total image width, and the height, which is about 71% of the total height. Now, after flipping, this has to change. You can see that the center, this distance from here to here, is now about 73% of the full image width — the x center changes. Then for the y center, it doesn't really change, or changes very little; the width remains practically the same, and the height also stays almost the same. That's exactly what we expect from a horizontal flip of normalized YOLO boxes: the x center goes to one minus the x center, while the y center, the width, and the height stay as they are. But the main idea here is to show that, after going through these transforms, albumentations gives us output bounding boxes which match up with the transformed image. Now, don't forget to change this back from always_apply to probabilities of 0.5. So that's it. The next set of transformations we'll make will be with TensorFlow, and we'll make use of tf.image. Here we have random brightness and random contrast, and we're going to leave out the random crop, for obvious reasons: if you had to do a random crop here, you would have to write the code which also modifies the bounding boxes, because when you carry out a random crop the bounding boxes actually change. That's exactly why we used albumentations for that kind of transform — it makes life much easier. Then we also carry out random hue and random saturation. You can see that here in the code we have brightness, saturation, contrast, and hue, and we finally carry out this clipping by value to make sure all the values lie between zero and 255. You can always feel free to comment or uncomment any one of these right here. So, that said, we again carry out this preprocessing; see, we have that.
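A rough sketch of what those pixel-level transforms could look like; the exact deltas and ranges here are assumptions, and the targets are passed through untouched since the object positions don't change:

```python
import tensorflow as tf

def tf_augment(image, target):
    # Pixel-level augmentations only: brightness, contrast, saturation, hue.
    image = tf.image.random_brightness(image, max_delta=0.15)
    image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
    image = tf.image.random_saturation(image, lower=0.8, upper=1.2)
    image = tf.image.random_hue(image, max_delta=0.05)
    # Keep all pixel values in the valid range.
    image = tf.clip_by_value(image, 0.0, 255.0)
    return image, target

# train_dataset = train_dataset.map(tf_augment).batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
```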
And then remember, this one is for the training and this one for the validation. So that's it: we carry out the mapping, we batch, we prefetch, and then you can check out your outputs right here. So let's look at out one, out two, and out three. Well, let's set the skip to two and break, run that again, and see what we get: out one, out two, and out three. Since this was already batched, we'll just take the first element; run that, and there we go. We have out one, out two, and out three, and you see that this one now appears much darker compared to this one. So that's it: we have seen how to carry out these different transformations or augmentations. But before we go on with the training again, one slight modification we'll make is to replace the ResNet50 backbone with EfficientNetB1. Then we go ahead, compile the model, and restart the training. There we go, training has begun, and after training for several epochs, here's what we obtain. You can see that the training loss and the validation loss both keep dropping, up to around this point here at about 123; after that the validation loss increases slightly and stabilizes around 128. So at this point we stop the training and keep the weights which produced the lowest validation loss. And that's it for this section; in the next section, we are going to test out this model. Hi there, and welcome to this new session, in which we are going to dive into testing the YOLO model we trained in the previous section. To carry out this testing, we are going to make use of the COCO dataset, so we'll pick out some images from this dataset and test them with our model. The first thing we do is load the model. There we go: we load the model, we create this outputs directory, and then we specify the path to our test images. Now let's dive into this test method. It takes the file path of the test image, and then we have this image on which we are going to draw the bounding boxes and the classes. Given that we did not use OpenCV to load the images previously, we go with the exact same process we had already: we read the file, we decode, and then we resize. Once we have this image, we pass it into our model. The output of our model will be something like this: a 7 by 7 by 30 tensor. Now remember that for a cell like this one, where there is no object — supposing this person here is our object — a cell with no object will have a zero at the zeroth position, then four bounding box values, then another zero, another four values, and then the 20 values for the classes. Given that there is no object there, none of these values really matter, and that's why we are only going to take the boxes where these two confidence values are greater than or equal to the threshold, which we define to be 0.25.
So, that said, for a cell like this one, which is at the center of our object — remember we have this object here with its center, so this cell here contains the center — if we take its values, we would have, let's say, 0.75, then four values representing its bounding box, then maybe another 0.9, then four more values, and then the different values for the classes. The idea is to get all the positions where we have confidence values greater than or equal to 0.25. Note that we could pick a threshold of, say, 0.5 or 0.7 or 0.2, and it really depends on how that threshold affects the model's performance; we picked 0.25 because it performs better than 0.5 — with 0.5, many objects were missed. So that's it. Next, we simply gather all the corresponding outputs: we have the object positions from here, and then, to obtain the outputs, we take the full output tensor and, based on those positions, pick out the corresponding cell values. So here we print out the object positions and, below, the selected output; here we also handle this exception. Okay, let's run this. You find that for this image, for example, we have these positions — that is, it's telling us that at position 4, 3 we have an object. So we count 0, 1, 2, 3, 4 one way and 0, 1, 2, 3 the other, and an object is found here for this image. The reason we have a duplicate is simply that, for this cell, the first confidence score indicates an object and the second confidence score also indicates an object — you can see that we compared the value at position 0 and the value at position 5, which are this 0.75 and this 0.9 respectively. Now we can take a closer look at the selected output. You see the first value is 0.96; it's for this position, so we can strip everything else away and keep just the 30 values for that cell. You find that for this 4, 3 position, the output starts with 0.96 in the first position, then the four bounding box values, then the next confidence of 0.98 — which shows clearly that the model is sure there's an object there — then another four values, and finally the 20 different class values. Just by looking at these class values, try to find the one with the highest value: it's clear that this is the class with the highest value. And so from this we know that there's an object at this position, that the object belongs to this class, and obviously we have the bounding box surrounding the object. Now that we have these different values, the next thing to do is to convert these bounding boxes into the x min, y min, x max, y max format, which we will then use with OpenCV to draw the boxes on the image. We are going to go through each and every object position which we obtained from here. You can see we have these object positions, 0, 4, 3 and 0, 4, 3 — it's a duplicate, so let's focus on just one of them. We have 0, 4, 3; that's essentially the position 4, 3 as we've seen already.
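A compact sketch of this selection step, assuming `output` holds the 7 by 7 by 30 prediction for one image (variable names are my own):

```python
import tensorflow as tf

THRESH = 0.25  # confidence threshold discussed above

# Index 0 and index 5 of the last axis hold the two per-box confidence scores
# of each grid cell. Gather the grid indices where either score is confident.
pos_box_1 = tf.where(output[..., 0] >= THRESH)   # grid indices for the first box
pos_box_2 = tf.where(output[..., 5] >= THRESH)   # grid indices for the second box
object_positions = tf.concat([pos_box_1, pos_box_2], axis=0)   # may contain duplicates

# Pull out the full 30-value vector of every candidate cell.
selected_output = tf.gather_nd(output, object_positions)       # shape (n, 30)
```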
And to obtain the output box — that is, this value, this value, and these values — what we do is index the output with the object position: position zero, position one, and position two come from the object positions. Remember, the object position in this case is 0, 4, 3, so position zero is 0, position one is 4, and position two is 3, and that's how we select this specific cell output. Once we've selected it, the next selection we want to make is that of the bounding boxes. When j equals zero, j times five is zero, so we go from one right up to zero plus five, that is from one up to five minus one — positions one, two, three, and four — and that's how we select these values from our output. And given that we have two different bounding box predictions, the next time we go through this loop j is one, so we have one times five, which is five, five plus one is six, and we go from six up to ten minus one, which gives us positions six, seven, eight, and nine. So that's how we obtain the output boxes — these bounding box values. Then, as we said already, we need to convert these into the x min, y min, x max, y max format, and the first thing we do is convert them into the x center, y center, width, height form, which is what we do here. Now, to obtain the x center from this 0.53, for example — let's suppose that this 0.53 belongs to the cell at position 4, 3, so counting 0, 1, 2, 3, 4 one way and 0, 1, 2, 3 the other, we're in this cell with 0.53 and 0.17, so the center is around here — the idea is to obtain its value with respect to the full image width and height. First, we know that the distance from here to the start of this cell is simply four sevenths of 224, and that's because the full grid is 1, 2, 3, 4, 5, 6, 7 cells across. Since the full image width is 224, getting right up to this cell is 4 over 7 times 224, which is in fact 4 times 32, because 224 divided by 7 is 32. So you take this 4 and multiply it by 32 and you get the distance from here right up to this cell. Now, to account for the 0.53 inside the cell, we take 0.53 times 32, because the full cell is 32 pixels wide. So to obtain this distance we have 4 times 32 plus 0.53 times 32. For the y center, we count 0, 1, 2, 3 the other way, so we still end up in this cell, and the distance is 3 times 32 plus 0.17 times 32. So that is it: to obtain the y center, we take 3 times 32 plus 0.17 times 32 to find this distance here. And that's exactly what we do right here in the code. You'll notice we have this pos one: pos one is 4, which is multiplied by 32, plus this output box zero times 32 — output box zero is 0.53, so 0.53 times 32. So all of this is just like saying we want 4 plus 0.53, and then all of that times 32.
So that's what we do here. For the y center, it's the same: we have pos two, which is 3, times 32, plus output box one — output box one is this 0.17 — times 32. Remember, we got the output box from here, and it coincides with the 0.17. So that's it: we obtain the x center and the y center. The next thing we want is the width and the height, and that's even easier, because when encoding we simply divided by the full image width and height, so now we simply multiply back by the image width and the image height to recover them. From here we can now go from x center, y center, width, height to x min, y min, x max, y max. If we have a bounding box like this, with its center, and we know the width and the height, then to obtain the x min we simply take the center minus half of the width: if this here is the origin, and this distance is the x center, subtracting half of the width gives us the distance which takes us to the x min. We do the same for the y: we take the distance right up to the center and subtract half of the height — not the full height, half of it — and we get to the y min. That's what we do here: x center minus half of the width, and y center minus half of the height. For the max values, we have x center plus half of the width — if we want this position here, we take the center plus half of the width, which takes us to this point — and y center plus half of the height, which adds up to this point right here. So we have x min, y min, x max, y max. Now be careful: in the case where the x min happens to be less than zero, we want to clip it to zero; if the y min is less than zero, we clip it to zero, so we don't have negative values. If the x max is greater than the image width, we clip it to the width, and if the y max is greater than the image height, we clip it to the height. Once we have this, we obtain our final boxes: x min, y min, x max, y max. Then, not forgetting that we also have classes, we simply get the class with the highest probability score: we do an argmax on the selected output — remember, the selected output is what we had already seen here — taking its last 20 values, getting the argmax, which happens to be this one, and then getting the class which corresponds to that position. That's essentially what we do right here: we have that position, we have its corresponding class, we make sure it's a string, and we add it to our final box. So that's it for the final boxes. We also need our final scores — we'll understand why we need these scores shortly. For now, just note that for the final scores, again, if j equals zero we take the selected output at index zero, meaning we pick this value, and if j equals one we take one times five, that is the fifth position, which is the other confidence score. So essentially we're getting the probability scores for the two predictions.
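Putting the decoding steps just described together, a rough sketch could look like this. Variable names are my own, `classes` is assumed to be the list of the 20 class names, and the grid indexing follows the convention used in the lesson (the first grid index feeds the x center, the second the y center):

```python
import tensorflow as tf

CELL = 32          # 224 / 7: width and height of one grid cell in pixels
IMG_SIZE = 224
final_boxes, final_scores = [], []

for i, pos in enumerate(object_positions):
    out = selected_output[i]
    for j in range(2):                          # two predicted boxes per cell
        box = out[5 * j + 1 : 5 * j + 5]        # (x, y, w, h) for box j
        # Cell index plus the normalized offset inside the cell, both scaled by 32.
        x_center = (float(pos[-2]) + float(box[0])) * CELL
        y_center = (float(pos[-1]) + float(box[1])) * CELL
        width = float(box[2]) * IMG_SIZE        # box width/height were normalized
        height = float(box[3]) * IMG_SIZE       # by the full image size

        # Corner format, clipped so the box stays inside the image.
        x_min = max(0.0, x_center - width / 2)
        y_min = max(0.0, y_center - height / 2)
        x_max = min(float(IMG_SIZE), x_center + width / 2)
        y_max = min(float(IMG_SIZE), y_center + height / 2)

        class_id = int(tf.argmax(out[10:]))     # highest of the 20 class scores
        final_boxes.append([x_min, y_min, x_max, y_max, classes[class_id]])
        final_scores.append(float(out[5 * j]))  # confidence score of box j
```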
Remember, we actually have two predictions per cell, so we get both probability scores and print them out to see what they look like. There we go: as you can see, we have 0.965, which makes sense, and 0.985 — and there are some duplicates here. Then we also look at the final boxes; you see the class person repeated: person, person, person. We're going to see how to eliminate these duplicates shortly. So for now we've understood how to go from the model's outputs to these final boxes and final scores, and the next step is non-max suppression. We've already looked at non-max suppression in theory, and we'll see that with TensorFlow it's actually very easy to implement; but before implementing it, let's look at what it's all about. Suppose we have an image like this, with some object, and we have some bounding box around it. Remember, for each cell we have two predictions, so suppose our cell predicts this box and the same cell predicts another bounding box like this, both for the same object. With the non-max suppression algorithm, we compare these two probabilities and ask which one is the highest. If this one has the highest probability — say this one is 0.98 and this other one is 0.96 — and the two boxes are predicting the same object, then we discard the lower one; hence the term non-max suppression: we suppress this box and we're left only with this one. That's how we're going to discard those duplicates. Now, going back to the implementation, all we need is this non_max_suppression method from tf.image. We specify the boxes — note that our boxes from before included the classes, but we don't need those here, so, as you see, we pick only the first four elements, which are essentially x min, y min, x max, y max. We also make sure to pass in the scores: remember, in the non-max suppression algorithm we need the scores to be able to discard boxes which do not have the maximum score and which are predicting an object that has already been predicted by another box with a higher score. That's why we pass in the scores here. So, essentially, we pass in the boxes and the scores, and we specify the maximum output size; we just pick a hundred, since we don't expect to have more than a hundred boxes. But depending on your task — say you generally have around 150 objects to be detected at once — you would need to increase this max output size, to maybe a thousand. Then we have this IoU threshold right here; to understand the concept of the IoU threshold, let's go back to the example we had here.
If we have this example here, then in order for the algorithm to know that these two boxes are trying to predict the same object, and that we actually want to discard one of them, what it uses is this IoU threshold. Remember, we've seen the IoU already: if you have two boxes like this, we compute the IoU score — essentially we take the intersection between the two boxes, which is this area, and divide it by the total area occupied by the two boxes, which in this case is all of this area right here. So this here is the intersection and this whole region is the union, and we take the intersection divided by the union to obtain the IoU score. Now, if that IoU score is greater than the IoU threshold — let's specify an IoU threshold of 0.5, and in this case the score is greater than 0.5 — then we are going to discard this box. If instead it is less than 0.5 — meaning we have a box like this, where this overlapping area divided by the whole area is less than 0.5 — then we are not going to discard the box; we consider that it is predicting a different object, not this one. So this IoU threshold is what permits us to decide whether two boxes are trying to predict the same object or not. Then here we have the score threshold, which is set to negative infinity. The documentation says the score threshold is a float tensor representing the threshold for deciding when to remove boxes based on score alone. So if you set a score threshold of, say, 0.1, then all boxes which have a score of less than 0.1 are discarded straight away, regardless of whether they overlap with a box of higher score or not. So that's it. We get back here, and then we can print out our non-max suppression output. Let's run that. One thing you'll notice in this output is that we have a single element. What this single element actually means is that, out of all four options we have — this person, this person, this person, and this person — only this one at position one is going to be kept, and all the rest discarded. To understand why they're discarded, you can look at the scores: this is 0.96, this is 0.985, this is 0.96, this is 0.985. What we're saying is that, because this one has the highest probability, and because it overlaps with the others — this one and this one overlap because they actually represent the same person — the others are discarded. And in a case like this, where we have the exact same box with the exact same probability, one of them is discarded and the other is kept. So that's it: we have our output, and we now know that we keep only a single box instead of all four. So we have our non-max suppression output, and the next step is to show visually what our predictions look like.
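Before moving on to the drawing, here is roughly what that call looks like, building on the `final_boxes` and `final_scores` sketch from before (the specific values are the ones discussed above):

```python
import tensorflow as tf

# Keep only the (x_min, y_min, x_max, y_max) part of each final box; the class
# label stored as the last element is not needed by the NMS op.
boxes_only = tf.constant([b[:4] for b in final_boxes], dtype=tf.float32)
scores_only = tf.constant(final_scores, dtype=tf.float32)

nms_output = tf.image.non_max_suppression(
    boxes=boxes_only,
    scores=scores_only,
    max_output_size=100,            # upper bound on the number of boxes kept
    iou_threshold=0.5,              # overlap above this means "same object"
    score_threshold=float("-inf"),  # keep all boxes regardless of raw score
)
# nms_output holds the indices (into final_boxes) of the boxes that survive.
```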
We are going to draw the bounding boxes and put the text on the image only for i in the non-max suppression output — so in this example we're going to do that only once, unlike the case without non-max suppression, where we would have had to do it four times. Now we do it only once, because after non-max suppression we're left with only a single box. Take a look at what we have here: from the final boxes we take the x min and the y min, then the x max and the y max, then we have the color for the box, and that's it for the rectangle. Next we put the text. The text itself is going to contain the class from the final box — you see, we take the last element, which, remember, is the class — and the position of the text is based on the x min and y min values, except that for the y we step 15 pixels downwards. Then we define the font and the color, and that's it: we've put out the text, and we're ready to write this out as our new image. So we create this new image and resize it; its content is obviously this image on which we've drawn the rectangles — that is, the bounding boxes — and the text. Let's run this completely, and we should be able to see our output. There we go: you see we have a person. Notice that we had decided to go 15 pixels downwards, which is why the text sits slightly lower; if you don't do this, the text goes up and isn't very visible, so let's put that back. You can obviously change the color, say to 255, and play around with all those different parameters — but let's get back to the previous color, because it's actually better. So now we can run this for all the different files and see what we get. But before checking that, let's suppose that we did not have the non-max suppression output: let's leave out the non-max suppression algorithm and see what our outputs look like. Here we do: for i in range of the length of the final boxes — the final boxes here had a length of four, so we had four outputs. We take this off, run it again, and see what we get. We already get several different predictions. Take this one, for example: you see here we have two predictions for this person, and that's not what we want. So you see that adding the non-max suppression permits us to remove some of the boxes, and this IoU threshold is what you play around with to ensure that you have a single box for a single object. Now, looking at the results, you can see that this model does well: it predicts the location of this train and knows that it is actually a train, and here it predicts that this is a person.
It predicts this person too, but unfortunately doesn't get these other people here; it gets this person and this person, misses this one, and does quite well here — see, this is the dining table, this person, this person and this person, so that was great. We have the TV monitor, though it doesn't get the other monitors. From here we also have this bus — unfortunately it predicts two buses — and a car, where it also predicts two cars, perhaps due to this other car being here. Then we have this dining table and this person. We have this person and a dog. We have this person and this person; it still sees a dog here, but that's not right. Person, person, but it doesn't see this other person; it sees these two people here. Here it sees this person, though the bounding box isn't very well placed. Then we have the cat. We have this person, this car, and this person, but it doesn't see the dog. Here it sees these cows. This particular image was taken from the paper — we basically just cropped it from the paper to test it out — and it sees a person but doesn't see that these two are dogs, though it actually locates them quite well. So that's it: it sees a cat but doesn't see this dog; it sees a person, a cow, this person and that person, these two people, and this person and this person, though the bounding boxes aren't very well placed. Unfortunately the cars here aren't quite correct, and here we have a motorbike and a person, though this should actually be two motorbikes. Okay, so that's it: we've just tested out our model and seen how it does on images it has never actually seen. Hello everyone, and welcome to this new and exciting session, in which we shall dive into image generation. As you can see right here, this image was AI-generated, but back in 2014, 2015, 2016, images like this couldn't yet be generated using AI. With advances in AI — like the variational autoencoders, the GANs, and even more sophisticated GANs like the WGANs, ProGANs, SRGANs, and CycleGANs — AI algorithms have been trained to produce high-quality images, and today we get even better results with diffusion models. That said, in this section we shall treat the variational autoencoders and the DCGANs, and so by the end of the section you should be able to produce images like this. Back in 2014, one of the best-performing image generation models was the model you have right in front of you: the variational autoencoder. The way this model worked was quite simple: we had an encoder block which took in some input image and generated these embeddings, and these embeddings, having encoded information about the input, could then be used by the decoder to generate output images. Nonetheless, in 2014, Ian Goodfellow came up with the idea of the GANs — generative adversarial networks. Here we have two neural networks, the generator G and the discriminator D, which are set up in a kind of contest: the generator is learning how to produce images which look like those from the real dataset, or training set, while the discriminator is learning how to differentiate between real data, like this one, and fake data produced by the generator. If we consider this simple example here, you can see that we pass in some input noise and we get this output.
And because this output doesn't look like the real data, the discriminator considers it fake; whereas for this other example, the output from the generator looks like the real data, and so the discriminator is tricked by the generator into thinking that this is real data. After updating the parameters of the generator and the discriminator, such that we get to the point where the discriminator no longer knows the difference between what is coming from the generator and what is coming from the training set, we end up with this generator block, which is able to take in random noise and generate outputs similar to those from our training set. And although this architecture was groundbreaking in 2014 and 2015, today we have more advanced, better models like StyleGAN. Hi there, and welcome to this new session, in which we shall be treating the variational autoencoder, and we shall see how it can be used in image generation. In this first part we shall dive deep into understanding the theory behind the variational autoencoder, and in the subsequent sections we shall practically implement a working variational autoencoder. That said, we start by explaining the plain autoencoder, and we'll make use of this blog post by Jeremy Jordan. To understand the autoencoder, we can break the word into two parts: auto and encode — essentially a system which encodes itself. Now, suppose you have an image like this one right here. When we pass it into some encoder block, we obtain this output vector. This output vector is six-dimensional, so it has six different positions, and each position represents a specific characteristic of the image: smile 0.99, skin tone 0.85, gender negative 0.73, beard 0.85, glasses 0.002, hair color 0.68. So the six values we have here characterize our image: information about this image is encoded in this vector. On the other hand, when we want to retrieve this encoded information, what we do is use a decoder which takes the encoded information and reproduces the original image. And that's globally how we produce this kind of system, which can be used in image compression, where we take this image, encode it so that we have just this vector, pass the vector over some network, and then on the other side of the network decode the vector so that we have the original image back. Apart from compression, another field where we could apply this kind of autoencoder is image search. Suppose we have this image right here — call it image A — which produces a vector we'll call VA. Now, if we have another image of this same person — call it image B; not necessarily the same image, maybe some other photo of the same person — then, in this 6D vector space (six because we have six values here, though it could be 128D or whatever dimension we make it), after passing image B through the encoder we have its 6D vector VB.
But because it's the same person, we would expect these values to be similar, and so VA will be close to VB. And if we have another person, say person C, and we generate that person's encoded vector VC, then we would expect VA to be much different from VC. So in an image search scenario, we simply pass in the input, obtain its vector, and compare the two vectors to see whether they belong to the same person or not. It should also be noted that when training an autoencoder model, where we have an image A and a reconstructed image A prime, our aim is to minimize the difference between A and A prime — so we minimize A minus A prime. Now, it turns out that in image generation, to get better results, instead of dealing with discrete values like what we had here — let's scroll back to the top; you see we had a given value for smile, for skin tone, and so on — instead of having a fixed value for each and every one of these features, we make use of a probability distribution. So here, instead of having a value, say negative 0.6, we would have a probability distribution whose mean is at negative 0.6 — well, on the figure this looks more like negative 0.5, but the idea is that we go from a single number to a probability distribution with mean negative 0.6 and a given variance. For those of you who don't have a math background, what this essentially means is that, instead of picking the value negative 0.6, we pick some random value within this range, and values closest to negative 0.6 have a higher probability of being picked. So instead of negative 0.6 we could now get negative 0.3, or negative 0.55, or negative 0.75, and so on. Now, if we look at this other example, where 0 has been turned into this probability distribution, we pick values in the range negative 1 to 1, so the variance, or the range of values from which we are allowed to pick, is larger than in the first case; but still, values around the mean of zero have a higher chance of being picked. Here you would have a higher chance of picking 0.1 than 0.9: if this is 0.1 on the plot and this is 0.9 around here, you'll find that when you trace this up, 0.1 has a higher score, and so a higher chance of being picked, compared to 0.9, which has a much lower chance. So, in a nutshell, instead of having this single value, we now have a mean and a variance which gives us the range of values from which we can pick the specific value for a given feature. And as you can see, this one here has a smaller variance compared to this one and compared to this one. From this point on, we'll call the mean mu and the variance — this spread here — sigma squared. This probabilistic approach to generating a latent vector — which previously, if we scroll back up, was just this plain vector — is what leads us to the variational autoencoder. So you see that here we have our input image.
It goes into the encoder, which produces mu and sigma squared — or let's just say sigma; sigma is the standard deviation and sigma squared is the variance — so it produces mu and sigma, and then, using mu and sigma with our decoder, we are able to obtain our reconstructed output image. Note that in this case we would have 1, 2, 3, 4, 5, 6 positions, so mu would be a 6D vector and sigma another 6D vector, where mu one and sigma one — this first position here — represent the mean and the standard deviation of the first feature's distribution. Now, it should be noted that the main benefit of variational autoencoders is that they are capable of learning smooth latent state representations of the input data. To better understand that statement, consider this output generated by an autoencoder and this other output generated by a variational autoencoder. You'll notice that as we go from one digit to another — say from six to eight — with the autoencoder we have this one here, which looks very much like a six, then this one, which is really confusing because we don't know exactly what it is; this one looks more like an eight but not very clearly, and then here we start getting eights, though this one too doesn't look very clear. But when you look at the output generated by the variational autoencoder, as we go from one digit to another we see a much smoother transition. And this is thanks to the fact that, instead of working with discrete values at the level of our latent vectors, we go for a probabilistic approach with the variational autoencoder. Now, because we're going in for this probabilistic approach, the training of our variational autoencoder is no longer straightforward, and this is simply because during training we need to compute partial derivatives with respect to z here, and through it with respect to mu and with respect to sigma. But because the z we have here is drawn from a normal distribution with mean mu and standard deviation sigma, we won't be able to compute these partial derivatives. So the idea will be to convert this node here from a random one into one that is deterministic — you can see in the key here that we have random nodes and deterministic nodes; this one is deterministic, that's fine, this one is fine, but this one is random. The idea is to convert it into one which is deterministic, such that we can compute the partial derivatives and hence train the model, updating the encoder and decoder parameters. This idea of converting the node from random to deterministic is known as the reparameterization trick. So instead of the setup where we have the mean mu and the standard deviation sigma and we pick any value at random in this range, we instead define an epsilon which is drawn from a normal distribution with mean zero and standard deviation one, so it takes values around the range negative one to one. This epsilon, as we've said, is drawn from that fixed distribution, and then to obtain z, unlike before where we obtained z randomly from values surrounding the mean, we take the mean plus the standard deviation times this random value which surrounds zero.
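Written out, the reparameterization trick is:

```latex
\epsilon \sim \mathcal{N}(0, I), \qquad z = \mu + \sigma \odot \epsilon
```

so the randomness is isolated in epsilon, and gradients can flow through mu and sigma.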
And so now we can compute this partial derivative with respect to z, and hence train our variational autoencoder model. The next and final point we'll look at in this section is the variational autoencoder loss. From the variational autoencoder paper, the authors break this loss into two main parts: the first part is the reconstruction loss, and this other part acts as a regularizer. For the reconstruction loss, we try to minimize the difference between x and x prime, or x hat — we want the input and the reconstructed output to be similar; that's what this term denotes. Then, for the regularization term, we compute the KL divergence between this distribution and this other distribution. To understand what these distributions actually signify, we can take a look at this figure. Here we have the distribution q of z given x, which happens to be a learned distribution, meaning that when we train this encoder model right here, we are in fact learning this distribution. Nonetheless, we do not want this learned distribution to be very different from the prior distribution p of z, and so that's why we are going to minimize the distance between this learned distribution and the true prior distribution p of z. It should be noted that the KL divergence here is a tool which permits us to measure the distance between two distributions, so if we can minimize it, we reduce the distance between this learned distribution and the distribution p of z. Getting back to the original paper, the reconstruction loss can be taken as the mean squared error, while this here is our regularizer: it's what we obtain after computing the KL divergence between those two distributions — and for Gaussian distributions with a standard normal prior, this works out to negative one half times the sum over the latent dimensions of one plus log sigma squared minus mu squared minus sigma squared. Hello everyone, and welcome to this new section, in which we are going to build our own variational autoencoder models from scratch. Previously, we saw how variational autoencoders can be used to help generate new images, by building out this encoder–decoder structure such that we can produce outputs which are similar to the inputs while being entirely new images. In this session, we'll build our own variational autoencoder and generate our own images. The data we'll be using to train our variational autoencoder is the MNIST dataset, which you can get from TensorFlow datasets. Right here we load this dataset and then we concatenate both the training and the test sets. Generally we would have a dataset made up of x train and y train, x test and y test, but since here we are not going to be making use of the labels, and we won't be having a test set, we just take the two image sets and concatenate them. One other preprocessing step we take is to divide the values by 255, so we normalize our dataset. So let's run this, and then, once the dataset has been downloaded, what we do is convert it into the TensorFlow data format: we have our dataset, which is tf.data.Dataset.from_tensor_slices.
So you see that we take that and pass in our MNIST digits, which we've already downloaded. We run that, and that should be fine. Now we can check the length of this dataset, and you see we have 70,000 — so 70,000 data points make up our dataset. From here we define the batch size, which will be 128, and then we go through the usual steps of shuffling our dataset: we have our dataset, we shuffle, we batch, and then we prefetch. If you're new to this, you can check out the previous sections in this course. Anyway, we have these three steps, and now we can run this. There we go: you see that we have this train dataset and we can check its shape — all the images in our dataset are 28 by 28 by 1, and there are 70,000 of them. So that's fine. Now, getting to the modeling, we're going to start with the encoder. Recall that what we had seen so far was this encoder model, which takes in an input image and outputs the mean and the variance; these two are then combined via the reparameterization technique, where we have mu plus sigma times a random value drawn from a normal distribution, and this z is passed into a decoder, which gives us an output image such that the difference between the input and the output is minimized. That said, let's get back to the code and design our encoder. Our encoder here is going to be a very simple convnet. We'll start by defining the latent dimension, right here, which we set to two. Then, for the encoder, we start, as we said, with the encoder input, which has a shape of 28 by 28 by 1, the same as that of the images in our dataset. From here we define a Conv2D layer with 32 filters of size 3 by 3, relu activation — supposing that you already have some background knowledge of convnets — strides of 2, and padding 'same'. So we're building a very basic convnet, and this layer takes in the encoder inputs; recall we're using the Keras functional API right here. We then create another conv layer — we basically copy this one and paste it out — with an increased number of channels, so 64 here, and from there we flatten the outputs. Note that here we have x, so we should change this to x, and from here we move on to flatten, which takes in x, so now the output is flattened. We are now going to output both the mean and the standard deviation, which we'll use for sampling, but before that we pass this into another dense layer: a dense layer with 16 units and relu activation, taking in x. Okay, so we have that, and now we're ready — let's just copy from here — to get the mean and the standard deviation: copy that, paste it out here and here. So here we have the mean and here the standard deviation, each a dense layer with relu activation, and then, remember, the output size here is the latent dimension.
Now, getting to the modeling, we start with the encoder. Recall the structure we've seen so far: the encoder takes in an input image and outputs a mean and a variance, which are combined via the reparameterization technique as mu plus sigma times a random value drawn from a normal distribution, and this z is passed into the decoder, which produces an output image such that the difference between the input and the output is minimized. Getting back to the code, our encoder will be a very simple convnet. We start by defining the latent dimension, which we set to 2. The encoder input has shape 28 by 28 by 1, the same as the images in our dataset. From here we define a Conv2D layer with 32 filters of size 3 by 3, ReLU activation, strides of 2 and 'same' padding; we'll suppose you already have some background knowledge of convnets. Since we're using the Keras functional API, this layer takes in the encoder inputs. We then create another Conv2D layer with an increased number of channels, 64, flatten the output, and pass it through a dense layer with 16 units and ReLU activation. We're now ready to produce the mean and the standard deviation which we'll use for sampling, each coming from its own dense layer whose output size is not 16 but the latent dimension, which we've already fixed at 2. One might be tempted to put a ReLU activation on these layers to force the outputs to be positive, since a standard deviation must be a positive number. The problem with the standard deviation in particular is that it is usually a very small number between 0 and 1, much closer to 0 than to 1, and computing derivatives around 0 with a ReLU leads to numerical instability during training. What we want instead is to map this range of possible values onto a larger range. To carry out this mapping we need a function which is both continuous on this range and monotonic, that is, either strictly increasing or decreasing. One great function for this task is the log: it maps values of x in the range 0 to 1 onto the range negative infinity to 0, since the limit of log x as x goes to 0 is negative infinity and the log of 1 is 0; if you plot the log you'll see the curve dropping towards negative infinity as x approaches 0. Going from the narrow range (0, 1) to this much larger range gives us a more stable training process. So instead of relying on a ReLU activation to keep the standard deviation positive, we will have the network output the log of the standard deviation squared, which is the log of the variance. We therefore take the activation off both dense layers and name the second output log_var. We then move on to the sampling process, where we obtain z; remember z equals mu, the mean, plus the standard deviation times epsilon, where epsilon is a random number drawn from a normal distribution. We now create a sampling layer which takes in the mean and the log of the variance and outputs z, so we write z = Sampling()([mean, log_var]). The Sampling class itself is a Keras layer with a call method that takes in the inputs and unpacks them into the mean and the log variance. Now, if you remember, z is mu plus sigma times this random number, so given mu and the log of the variance we still need to see how to recover sigma.
Note that sigma, the standard deviation, is the square root of the variance. We also know that any positive number x can be written as e to the log of x, so the variance can be written as e to the power of the log variance, and therefore sigma equals (e to the log variance) raised to the power one half. Using the rule that (e^x)^a equals e^(ax), this becomes e to the power of 0.5 times the log of the variance. So now that the network gives us the log of the variance, this is all we need to recover sigma. Back in the sampling layer, z is the mean plus exp(0.5 * log_var) multiplied by a random value from tf.random.normal, whose shape is simply the batch size by the latent dimension. With this in place we can define our encoder model: a TensorFlow model whose input is the encoder inputs and whose outputs are the list made of z, the mean and the log variance, and which we name "encoder". We can then print a summary of this model, which you can visualize right here; a sketch of the full encoder, including the sampling layer, follows below.
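Here is a minimal sketch of the encoder and sampling layer as just described, assuming TensorFlow 2.x Keras; the layer sizes follow the walkthrough (two Conv2D blocks, a 16-unit dense layer, and a 2-dimensional latent space):

```python
import tensorflow as tf
from tensorflow.keras import layers

LATENT_DIM = 2

class Sampling(layers.Layer):
    """Reparameterization trick: z = mean + sigma * epsilon, with sigma = exp(0.5 * log_var)."""
    def call(self, inputs):
        mean, log_var = inputs
        epsilon = tf.random.normal(shape=tf.shape(mean))   # batch_size x latent_dim
        return mean + tf.exp(0.5 * log_var) * epsilon

encoder_inputs = tf.keras.Input(shape=(28, 28, 1))
x = layers.Conv2D(32, 3, activation="relu", strides=2, padding="same")(encoder_inputs)
x = layers.Conv2D(64, 3, activation="relu", strides=2, padding="same")(x)
x = layers.Flatten()(x)
x = layers.Dense(16, activation="relu")(x)
mean = layers.Dense(LATENT_DIM, name="mean")(x)        # no activation
log_var = layers.Dense(LATENT_DIM, name="log_var")(x)  # log of the variance, no activation
z = Sampling()([mean, log_var])

encoder_model = tf.keras.Model(encoder_inputs, [z, mean, log_var], name="encoder")
encoder_model.summary()
```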
From here we move on to defining our decoder. As we've seen already, the decoder takes in z and outputs the images. We create its input, which we call latent_inputs: a TensorFlow Input whose shape is the same as that of z, namely the latent dimension. We now have to upsample this z into an image of shape 28 by 28 by 1. Generally we've been used to downsampling: we take an input, stack conv layers, flatten towards the end, and finish with dense layers whose output size matches the target. Here we're doing the opposite: we pass through a dense layer and then into a transposed convolution layer. We've been used to the convolution layer; here we use Conv2DTranspose, which is essentially a conv layer whose weights upsample its inputs. Getting back to the code, our input has shape batch size by the latent dimension, 2, whereas a Conv2DTranspose layer, like a conv layer, expects inputs of shape batch size by some height by some width by some number of channels. So we'll reshape, and if we want, say, a height of 7, a width of 7 and 64 channels, we have to ensure that what comes out before the reshaping has 7 times 7 times 64 units. The reason we pick 7 is that the output is 28 by 28, so a 7 by 7 feature map can be upsampled to 14 by 14 and then 14 by 14 to 28 by 28. Since a 2-dimensional latent vector cannot simply be reshaped into that, we first pass it through a dense layer with 7 * 7 * 64 outputs and ReLU activation, which takes in the latent inputs, and then reshape to (7, 7, 64). We then start with the Conv2DTranspose layers. The first has 64 filters, a kernel size of 3, ReLU activation, strides of 2 and 'same' padding; it's very similar to a conv layer, except that now we're upsampling instead of downsampling. The next one has 32 channels: for the encoder we increased the number of channels, and here we reduce them. In the final output layer we want the decoder to produce just one channel, so the output is 28 by 28 by 1 with values lying between 0 and 1. We therefore use a single channel, a sigmoid activation instead of ReLU, and no strides since we're not upsampling at this step. The reason for the sigmoid is simple: whatever the input, the sigmoid squashes it into a value between 0 and 1, which is exactly what we want in this last layer. Once that's done we create our decoder model, a TensorFlow model from the latent inputs to the decoder output, named "decoder", and print its summary. A sketch of this decoder is shown below.
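Here is a minimal sketch of the decoder as just described, assuming the LATENT_DIM constant from the encoder sketch:

```python
import tensorflow as tf
from tensorflow.keras import layers

latent_inputs = tf.keras.Input(shape=(LATENT_DIM,))
x = layers.Dense(7 * 7 * 64, activation="relu")(latent_inputs)   # enough units to reshape to 7x7x64
x = layers.Reshape((7, 7, 64))(x)
x = layers.Conv2DTranspose(64, 3, activation="relu", strides=2, padding="same")(x)  # 7x7 -> 14x14
x = layers.Conv2DTranspose(32, 3, activation="relu", strides=2, padding="same")(x)  # 14x14 -> 28x28
decoder_outputs = layers.Conv2DTranspose(1, 3, activation="sigmoid", padding="same")(x)  # 28x28x1 in [0, 1]

decoder_model = tf.keras.Model(latent_inputs, decoder_outputs, name="decoder")
decoder_model.summary()
```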
For the training we are going to use the Adam optimizer with a learning rate of 0.001, and we'll train for 20 epochs. As we've seen already, our loss is made of two parts: the reconstruction part and the regularization part. For the reconstruction part, the aim is to minimize the difference between the output image and the input image, so we start with the custom loss, which takes in y_true and y_pred. The reconstruction loss itself uses tf.keras.losses.binary_crossentropy; our outputs range between 0 and 1, so the binary cross-entropy is a reasonable choice here (feel free to test other losses), and we pass in y_true and y_pred. Once we do this, we need to sum over all positions: if, say, the output were 5 by 5, the binary cross-entropy would give us a value at each of those positions, and we need to sum all of them, so we wrap it in tf.reduce_sum. In our case there are 28 by 28 positions, and we also have to specify the axes, which will be 1 and 2. To see why, look at the shape of the output: it's batch by 28 by 28 by 1, and the positions over which we want to sum are the two spatial axes; counting from 0, those are axes 1 and 2. After summing, we take the mean to average over the batch, and that's it for the reconstruction loss. The next step is the regularization loss. Getting back to the paper, this is the sum of the variance plus the square of the mean, minus one, minus the log of the variance; if you take the negative one half outside and multiply through, you get negative 0.5 times (the log variance plus 1 minus the mean squared minus the variance). Remember again that the variance can be recovered as e to the power of the log variance, and that our encoder model outputs the mean and the log variance, so the custom loss also takes in the mean and log_var. We again sum and then average, writing negative 0.5 times (log_var plus 1 minus the square of the mean minus tf.math.exp(log_var)), and finally we return the reconstruction loss plus the regularization loss. For this second sum we also have to specify the axis, which is 1: the shape of the log variance and the mean is batch by 2, and since we need a single value per sample we sum over that latent axis, axis 1, before averaging over the batch. That's our custom loss; running it, we fix a small error and then it's fine. A sketch of this loss is given below.
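Here is a minimal sketch of that custom loss, assuming the tensor shapes described above (outputs of shape batch x 28 x 28 x 1, mean and log_var of shape batch x 2):

```python
import tensorflow as tf

def custom_loss(y_true, y_pred, mean, log_var):
    # Reconstruction: per-pixel binary cross-entropy, summed over the two spatial axes
    # (axes 1 and 2 of the batch x 28 x 28 result), then averaged over the batch.
    loss_reconstruction = tf.reduce_mean(
        tf.reduce_sum(tf.keras.losses.binary_crossentropy(y_true, y_pred), axis=(1, 2)))

    # Regularizer: -0.5 * (1 + log sigma^2 - mu^2 - sigma^2), summed over the latent axis
    # (axis 1 of the batch x 2 tensors), then averaged over the batch.
    loss_regularization = tf.reduce_mean(
        tf.reduce_sum(-0.5 * (1.0 + log_var - tf.square(mean) - tf.math.exp(log_var)), axis=1))

    return loss_reconstruction + loss_regularization
```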
We're now almost set to start the training. We have the VAE input that we define, then the encoder model, which outputs z, the mean and the log variance, and then the decoder, which takes in z and produces the output; so this VAE model contains both the encoder and the decoder. Next we build our custom training block (we'll suppose you already have an idea of how this works). The training block takes in a batch x_batch, and we use TensorFlow's gradient tape: with tf.GradientTape() as a recorder, we pass the batch into the encoder model, which outputs z, the mean and the log variance; we then pass z into the decoder model, which outputs y_pred; and from there we obtain the loss by calling the custom loss method we defined, passing in y_true, y_pred, the mean and the log variance. Here y_true is simply x_batch: the input image we feed the encoder is exactly what we expect the decoder to reproduce, so we compare y_pred, what the decoder produces, against the x_batch we passed in. We then get the partial derivatives from the recorder by calling its gradient method on the loss and the overall model's trainable weights. Talking about the overall model, we define it right here: it takes the 28 by 28 by 1 input, passes it into the encoder model, feeds the encoder's output into the decoder model, and from there we create a model from the VAE input to this final output. Getting its summary, we see exactly what we expect: the input, the encoder outputting z, the mean and the log variance, and the decoder outputting the image. Back in the training block, once we have the partial derivatives from the loss and the trainable weights, we carry out the gradient descent step by calling optimizer.apply_gradients with the partial derivatives we just computed and the trainable weights, and then we simply return the loss. A sketch of this model composition and training block follows.
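Here is a minimal sketch of that end-to-end model and training block, assuming the encoder_model, decoder_model and custom_loss defined earlier:

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

# End-to-end VAE: input -> encoder -> decoder -> reconstructed image.
vae_input = tf.keras.Input(shape=(28, 28, 1))
z, mean, log_var = encoder_model(vae_input)
vae_output = decoder_model(z)
vae = tf.keras.Model(vae_input, vae_output, name="vae")
vae.summary()

@tf.function
def training_block(x_batch):
    with tf.GradientTape() as recorder:
        z, mean, log_var = encoder_model(x_batch)
        y_pred = decoder_model(z)
        loss = custom_loss(x_batch, y_pred, mean, log_var)   # y_true is the input batch itself
    partial_derivatives = recorder.gradient(loss, vae.trainable_weights)
    optimizer.apply_gradients(zip(partial_derivatives, vae.trainable_weights))
    return loss
```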
We run this cell, and then we define our training method, which we call neurallearn; it takes in the number of epochs, and for each epoch in range(epochs) we start by printing that training starts for epoch number epoch plus one, since the range starts from zero. Then, for each step and x_batch that we enumerate from the train dataset we defined at the top, we call the training block, which not only computes the loss but also applies the gradient descent step for that specific batch; we do this for every batch of the training dataset. Once a pass is done we print out the training loss, and when training is complete we print "training complete". Running this (after clearing a small error), training completes and you can see that the loss drops. We can now move on to testing our VAE model. Recall that the VAE is made up of two units, the encoder and the decoder, and that we trained it end to end so that the outputs look very similar to the inputs. To generate new outputs, we cut off the encoder and focus only on the decoder: to generate a digit at random, we just pass a random value of z into the decoder (remember z = mu + sigma * epsilon, and z here is two-dimensional, a vector of two values). So we define the first set of values, grid_x, using the linspace method: with a scale of 1 and n equal to 16, we go from negative 1 to 1 with 16 values in between, and we do the same for grid_y. The first element of z will come from grid_x and the second from grid_y, so that we can generate different images. Running this, we print out grid_x and grid_y. The training loop and this grid setup are sketched below.
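A minimal sketch of the training loop and the latent grid, assuming the training_block and train_dataset defined above; the name neurallearn and the scale of 1 follow the walkthrough:

```python
import numpy as np

def neurallearn(epochs):
    for epoch in range(epochs):
        print(f"Training starts for epoch number {epoch + 1}")
        for step, x_batch in enumerate(train_dataset):
            loss = training_block(x_batch)   # forward pass, loss and gradient step per batch
        print(f"The training loss is {float(loss):.4f}")
    print("Training complete")

neurallearn(20)

# Latent grid used below to generate new digits from the decoder alone.
n, scale = 16, 1.0
grid_x = np.linspace(-scale, scale, n)
grid_y = np.linspace(-scale, scale, n)
```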
We generate these images using our decoder. We create a figure with a figure size of 5 by 5, and then for each i in grid_x and each j in grid_y we define a subplot, using a counter k which starts at zero. For each pair we build the input as a TensorFlow constant containing the two values i and j, and then we get the output from the VAE, but notice how we pick out layers[2]. To understand this, get back to where we defined the VAE model: the VAE is made of different layers, layer 0, layer 1 and layer 2, and if you print vae.layers[i] for i in range(3) you'll see that layer 0 is the input layer and the two functional models that follow are our encoder and our decoder. So using vae.layers[2] means we're using the decoder: it takes in the latent input, and from its output we select the first element of the batch axis and drop the channel axis; otherwise what we pass to imshow would not be a 28 by 28 image. We then call imshow on that output, turn the axis off, and increment k so that each image lands in a different subplot. Running this at first gives an error, because we have many more values than the 5 by 5 grid of subplots we declared; the subplot grid should be n by n. Let's fix that, run it again and see what we get; the loop is sketched below.
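A minimal sketch of that plotting loop, assuming the vae model, grid_x, grid_y and n defined above; the grayscale colormap is an assumption for display purposes:

```python
import matplotlib.pyplot as plt
import tensorflow as tf

plt.figure(figsize=(5, 5))
k = 0
for i in grid_x:
    for j in grid_y:
        plt.subplot(n, n, k + 1)
        z_input = tf.constant([[i, j]], dtype=tf.float32)
        output = vae.layers[2](z_input)            # layers[2] is the decoder
        plt.imshow(output[0, :, :, 0], cmap="gray")  # drop the batch and channel axes
        plt.axis("off")
        k += 1
plt.show()
```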
As you can see, we are now able to generate these digits using just the z vector, which is composed of two numbers. One thing you'll notice in this latent space is that as we go from values of negative 1 to 1 in the two dimensions, the generated outputs change gradually: in each line one digit is slowly morphed into another. Here we start with a nine, and as we move through the latent space we slowly get to eights; on another line a nine slowly turns into fives; and you can read this horizontally, vertically or even along the diagonal, where you see nine, eight, three, two, six and finally zeros. So this is how we generate images using variational autoencoders. At this point, one question you may ask yourself is: we generated these outputs from the z vector, but how does the z vector vary with the particular digit in the image? How does the z vector of, say, a six compare with that of a zero, a two, and all the other digits? To experiment with this, we copy the earlier code, pass the different digits into our encoder, plot the positions of the resulting z vectors, and see whether there is some correlation between the position of the z vector and the specific input digit. This time around we will need y_train, because we need to know exactly which digit we're dealing with, unlike before where we only needed the input images. So we reload x_train and y_train, do the usual preprocessing, and then pass the images into the encoder. Similar to what we did with the decoder, we use the VAE's layers, but this time picking index 1, the encoder; its output is z, the mean and the log variance, and we feed it x_train. We then create a figure with a figure size of 12 by 12 and do a scatter plot showing the positions of the z points in our two-dimensional space, coloring them so that digits belonging to the same label have the same color by passing y_train as the labels; finally we add a colorbar and show the plot. Here's what we get: the labels go from 0 right up to 9, and you can notice distinct clusters. This shows that the encoder part of our variational autoencoder has been trained such that two inputs belonging to the same digit land closer to each other in latent space than two inputs belonging to different digits. A sketch of this visualization is given below.
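A minimal sketch of that latent-space visualization, assuming the trained vae model from above; reloading MNIST through tf.keras.datasets is an assumption:

```python
import matplotlib.pyplot as plt
import tensorflow as tf

# Reload the training images together with their labels this time.
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0      # (60000, 28, 28, 1)

# vae.layers[1] is the encoder; it returns z, the mean and the log variance.
z, mean, log_var = vae.layers[1].predict(x_train)

plt.figure(figsize=(12, 12))
plt.scatter(z[:, 0], z[:, 1], c=y_train)   # one color per digit label
plt.colorbar()
plt.show()
```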
What we've been doing so far is training the variational autoencoder with our own custom training block: we used the gradient tape and carried out the gradient descent manually, as opposed to simply calling model.fit with the training dataset and letting Keras take care of the training. The advantage of the fit method is that it's quite simple to use, but compared to custom training loops it doesn't give you as much freedom, which is why we used the gradient tape to train our model. With TensorFlow, however, it's possible to get the best of both worlds: we can keep this custom training logic and still make use of the fit method. The way this is done is by overriding the train_step method of a model we define. So we define a VAE model as we would a usual Keras model, but, just as in the VAE we built earlier, it takes the encoder model and the decoder model as arguments; we store the encoder and the decoder as attributes. Next we define a loss tracker (we'll understand shortly why we call it a tracker): it keeps the mean of the different loss values we get. Then comes the main part, overriding the train_step method. Just as we're used to working with methods like compile and fit, there is a train_step method which is called each time fit is called; since we're not going to use the default train_step, we override it with our own training logic, which is basically the training block we already wrote. So we copy that block into train_step, which takes in a batch: we get z, the mean and log_var from the encoder model, pass z through the decoder to obtain y_pred, take x_batch as y_true, and compute the loss with the custom loss method we already defined. From here we update our loss metric: we update the loss tracker's state with the value of this loss, and we return a dictionary mapping "loss" to loss_tracker.result(). One more important method we need to define is metrics: here we just return the loss tracker inside a list, because we could have several losses or metrics to return, and we place the property decorator on it. Once this is set, we no longer need to write out "training starts for epoch..." and "the training loss is..." ourselves: all we need to do is call model.fit. A sketch of this subclassed model is shown below.
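Here is a minimal sketch of the subclassed model as described, assuming the custom_loss function defined earlier; self.trainable_weights and self.optimizer are available because we inherit from tf.keras.Model:

```python
import tensorflow as tf

class VAE(tf.keras.Model):
    def __init__(self, encoder, decoder, **kwargs):
        super().__init__(**kwargs)
        self.encoder = encoder
        self.decoder = decoder
        self.loss_tracker = tf.keras.metrics.Mean(name="loss")   # running mean of the loss

    @property
    def metrics(self):
        # Returned as a list so that additional losses/metrics could be tracked as well.
        return [self.loss_tracker]

    def train_step(self, x_batch):
        with tf.GradientTape() as recorder:
            z, mean, log_var = self.encoder(x_batch)
            y_pred = self.decoder(z)
            loss = custom_loss(x_batch, y_pred, mean, log_var)   # y_true is the batch itself
        partial_derivatives = recorder.gradient(loss, self.trainable_weights)
        self.optimizer.apply_gradients(zip(partial_derivatives, self.trainable_weights))
        self.loss_tracker.update_state(loss)
        return {"loss": self.loss_tracker.result()}
```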
Running this we get an error, since we still have to create the VAE we've just defined: we instantiate it with the encoder model and the decoder model, calling the instance model. Then we proceed just as we're used to: model.compile, passing in the optimizer we defined already, and then model.fit, passing in our train dataset, which is already batched, and the number of epochs, 20. Unlike before, where we had to handle the optimizer ourselves and get into the details of the gradient descent step, with its partial derivatives and trainable weights, and where using callbacks, for example, would have meant writing extra custom code, now all of that comes for free: you can simply pass callbacks to fit, so we're able to write our own custom training logic while still taking advantage of everything that comes with the fit method. There is one slight error left to fix: previously, the trainable weights came from the VAE model we had defined externally, so we wrote vae.trainable_weights; now that this logic lives inside the model class, we access them as self.trainable_weights in both places, which is possible because we inherit from a Keras model that already defines these attributes. Running this, training completes and we get the usual fit outputs. For testing, we rerun the earlier cells with one modification: previously we used vae.layers[2] to get the decoder and called predict on the inputs, but now our model is the whole VAE, so since we want the decoder we call model.decoder, the decoder attribute we defined, followed by predict. Running this, we get exactly the same kind of outputs as with our own custom training loop, as sketched below.
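A minimal sketch of that usage, assuming the VAE class, encoder_model, decoder_model and train_dataset from above; the example latent point is arbitrary:

```python
import tensorflow as tf

model = VAE(encoder_model, decoder_model)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3))
model.fit(train_dataset, epochs=20)          # the dataset is already batched

# Generation now goes through the decoder attribute instead of vae.layers[2].
generated = model.decoder.predict(tf.constant([[0.5, -0.5]]))   # any 2-d latent point
```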
And that's it for this section; hope you enjoyed it, and see you in the next one. Hi there and welcome to this new session, in which we are going to delve deep into generative adversarial neural networks. In the previous session we saw how variational autoencoders can be used to generate these kinds of outputs, which are in fact MNIST digits, from just an input noise: we had an encoder-decoder structure with some input images, we trained the model so that the outputs look like the inputs, and once training was done we could take off the encoder and pass a noise signal in to generate new outputs resembling those from the original dataset. In this section we'll see how to build a new category of generative models known as GANs, and we'll use them to generate images like the ones you see here; note that all of these are images of people who do not actually exist. But before we dive into practice and see how to build models that can create such realistic-looking images, we should start by understanding how generative adversarial neural networks, or GANs, work. GANs were first introduced in the paper by Goodfellow et al., where the GAN architecture was first proposed. To understand how GANs work, let's use this figure from Louis Bouchard's post. Suppose we have a bank which produces real money, here a real $100 bill, and on the other end a thief who produces fake money; the fake $100 bill has a man with a mustache, which isn't the case on the real bill. Because differences like this are easy to spot, the police officer is able to detect that the money is fake, and in saying so, the officer effectively tells the forger what gave it away, the mustache, the word "fake" written on it, and so on, that is, what needs to be improved so that the next time the thief presents fake money, the officer might take it for real money. If we suppose that real money takes a value of one and fake money a value of zero, then in the early years of production the officer correctly says this is a zero and this is a one, since the difference is very evident. But with time, the thief gains experience and produces fake money that looks just like the real thing, and the police officer is no longer able to distinguish between the real and the fake. Although we do not advocate these kinds of malpractices, this is essentially how GANs work: we replace the thief by a generator model, which is a neural network, and we replace the police officer by a discriminator, which is also a neural network, specifically a binary classifier that takes in an input and says whether it is real or not, whereas the generator takes in some random input and learns to output bank notes such that the discriminator thinks they are real. Next we head over to GAN Lab, a project by Minsuk Kahng et al., and consider a very simple example: suppose our real data distribution is this one. Notice how we have a generator and a discriminator: the generator takes in some random noise and outputs a sample.
The discriminator takes in the generated sample and says whether it is real or fake, and apart from the fake samples it also takes in the real samples and classifies those as well. The weights of the generator and the discriminator are updated such that, after some time, the samples being produced look very much like the real data. So let's click on run and watch what happens. Initially the generated samples look nothing like the data, but as time goes on you see regions where both the green and the purple points are considered real, while only elsewhere are points considered fake: the discriminator starts making errors when deciding whether a given sample is real, while the generator produces samples that look more and more like the real ones. It is because of this competition between the generator and the discriminator that they are called generative adversarial networks: the adversary comes from this competition, and it is what leads the generator to produce samples which look very much like the real samples. Another point you should notice is that during training there are two main parts: one block in which the discriminator takes in the real and fake samples and its outputs are used to update the parameters of the discriminator, and another block in which the discriminator takes in the fake samples and its output is used to update the generator. From here you can pause and change the data distribution to experiment. In comparison with the VAEs, where we had the encoder and the decoder, we trained the encoder-decoder so that the outputs resemble an input distribution and then broke it up, keeping only the decoder to generate outputs whose distribution matches that of the real inputs. Another important point is that after a certain number of epochs, when the reals and the fakes look very similar, the discriminator becomes somewhat confused: you find green patches and purple patches mixed together, and instead of saying this is a one and this is a zero as at the beginning, it now sees both as 0.5. In fact, the aim of our GAN training process is to ensure that the generator wins this fight. It should be noted that most of the very cool applications of GANs are in the domain of image generation, and we'll look at some of these applications next.
In this article by Jonathan Hui, we see that GANs can be used to create anime characters: here we have anime characters which have been generated automatically using GANs. Similar to what we saw earlier, we have a real dataset of images like these, and a generator which learns over time to produce images that look like the real ones; that's how images like this are obtained. Next there is pose-guided person image generation: here we have an input image of a person, and if we want the same person in a different pose, we can pass in the target pose and get this output, generated from the input image and the desired pose. Another application is cross-domain translation: here we have an input with, say, three zebras, and we're able to transform this image automatically into another domain where we instead have horses; the output has been generated from the input, and this can also be done in the reverse direction, going from zebra to horse or from horse to zebra. Then we have StarGAN, which lets us carry out translations where we modify specific high-level features of an image: we take an input face and add, for example, blond hair, change the gender, or age the face; talking about aging, you could build an application which shows you what you might look like in 20 or 50 years. There is also pale skin, and expression changes like angry, happy or fearful, so with StarGAN you can carry out these kinds of transformations on an input image. Next there is PixelDTGAN, which creates clothing images and styles from an input image: you see the source image and the different clothing images that have been generated from it. Then we have super-resolution: GANs have been used to increase the resolution of an input image while making the higher-resolution image more realistic. Compare the image obtained with the bicubic method and the one obtained with a ResNet-based model against the GAN output: the GAN result looks more realistic, much closer to the original, than what the two non-GAN methods produce. To see an even clearer difference, notice the part where the water is pouring; it looks far more realistic than with a classical neural network like the ResNet. From here we move on to generating faces, this time very high-definition faces: 1024 by 1024 images of people who do not actually exist, obtained using ProGAN, a progressive GAN. Next we move on to StyleGAN.
StyleGAN comes with even better resolution and adds control over the style. From here we go on to high-resolution image synthesis, where from a semantic map like this one we generate the output you see here; then GauGAN, which as we've seen takes these kinds of semantic maps and produces outputs like this (this is the ground truth and this is what the GAN produces; from this map we're able to generate this image). This kind of technology can be applied to video compression: during a video call we have an input frame on the sender's side, we carry out keypoint extraction to get these keypoints, and it is the keypoints that are transmitted over the network instead of the full frame. At the receiving end, the keypoints are combined with a key frame that was sent initially to reconstruct an output which looks like the original input, so we obtain the frame at a much lower bandwidth, since we transmit only the keypoints and not the whole image. Then there are applications in text-to-image, where we can pass in a sentence like "this flower has long thin yellow petals and a lot of yellow anthers in the center" and generate a matching image. Talking about this kind of application, we can check out craiyon.com, which is in fact a DALL·E mini model able to create much more realistic outputs; let's open it and, while it loads, look at the other applications. Next is face synthesis: from a single input image we create faces at different viewing angles; for example, this can be used to transform images so that they are easier for face recognition, converting an input like this one into a frontal view that makes the face recognition model's job easier. Then we have image inpainting: here we take an input image from which a patch has been removed, and using a GAN we generate an output that looks like the input without the missing patch; as you can see, the GAN does this job quite well. Before moving on, let's get back to the craiyon.com output: you can see that this DALL·E mini model produces quite realistic images. Then we move on to DiscoGAN, where we can create outputs which match the style of a given input: supposing you're going on a trip, you want to take this bag, and you'd like ideas for the kind of shoe to wear with it, you could call on DiscoGAN and get this kind of output based on your input. This model is similar to the CycleGAN we saw earlier, where we were able to move from one domain to another.
One other fun project is generating emojis from input images: here we have an input face and the emoji generated from it. Another very interesting application is deblurring: we suppose we have an input image which is clearly blurred and we want to deblur it, and you can see from these results that GANs do this job quite well. Another application is photo editing, so you no longer need to be an expert in photo editing to carry out some of these edits; all you need is a GAN. Apart from image generation, GANs can also be used for music generation, though in this course we shall focus on image generation, and in the medical domain GANs can be used for anomaly detection. That's it for this section; in the next one we'll look at how these GANs are actually trained in practice and the type of loss functions we use when training them. Hi there and welcome to this session, in which we're going to look at the GAN loss function, how GANs are trained, and finally some common GAN training problems. As a brief summary of what we've seen already, consider these two distributions: the black dotted curve represents the real data, similar to the real data we had earlier, and the green curve represents the fake data which will be produced by our generator. We also have the discriminator D. When training starts, the discriminator sees that most of the real data gets a score of one, that is, it classifies it as real with probability one, and as we move towards the generated data, samples from the generated distribution get a score of zero. As training continues, the generator gets better and the generated samples start to look more like the real data: the two distributions move closer together. At this point the discriminator still scores the real data as one, but now sometimes classifies the fake data as one too, although some samples are still classified as zero. Once we reach convergence, the discriminator outputs one half everywhere: it is unable to differentiate between the real and the generated (fake) data, because the distributions look very much alike. That's it for the recap of the previous session; now let's dive into the GAN loss function. Right here we have the training algorithm, and if you look closely there are two loss functions, one for the discriminator and one for the generator, but they can be combined into a single equation: a two-player min-max game with the value function V. When we talk about a min-max game, it refers to what goes on between the generator and the discriminator; our two players in this case are the generator and the discriminator, and the value function they fight over is written out below.
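For reference, here is the value function from the original GAN paper, matching the expression being described (D is the discriminator, G the generator, p_data the real-data distribution and p_z the noise distribution):

```latex
\min_{G}\,\max_{D}\; V(D, G) \;=\;
\mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
\;+\;
\mathbb{E}_{z \sim p_{z}(z)}\big[\log\big(1 - D(G(z))\big)\big]
```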
One thing you can also do with GAN Lab is put in your own distribution: say we draw this distribution and start training; the generator and the discriminator then play this game, and at convergence we expect the generator to produce outputs similar to the real data. Coming back to the equation, notice that we have min over G and max over D. To understand this notation, consider that we minimize the expression by updating the parameters of G, the generator, and we maximize it by updating the parameters of D, the discriminator. If we try to separate the two, you'll find that the first expectation contains no G at all, so when minimizing with respect to the parameters of G only the second term matters; that's why, in the algorithm given in the paper, the loss for G contains only that second expression. D, on the other hand, appears in both terms, which is why the discriminator's update uses the combination of the two. To understand this in more depth, here are the two expressions extracted, one for D and one for G. Also note that when we talk about updating the discriminator or the generator, we mean a gradient step: if theta is a parameter, then theta at step i equals theta at step i minus 1, minus the learning rate times the partial derivative of the loss with respect to theta at step i minus 1. The inverted triangle you see in the algorithm, the nabla symbol, is exactly that partial derivative of the loss with respect to the parameters theta, and we have the same for the generator. Now let's look in depth at what these different expressions actually mean. Here we have a real sample and a fake sample. The real sample is passed into the discriminator, which we train to recognize it as a one. The fake sample is an image of a person who doesn't exist, whereas the real sample shows a person who actually exists, and we'll train a GAN, as we'll see later in this course, to produce this type of output. So we have our discriminator, our fake sample, and our generator, which produces the fake sample from some random noise that we feed in as input.
We use random noise because we just want to be able to produce a fake sample from nothing but noise, and then we have the discriminator, which classifies whether a given input is fake or not. When we train our discriminator, it takes an input x, which is the real sample, and once it takes this input it is expected to output a one. In the algorithm, you're told to update the discriminator by ascending its stochastic gradient and to update the generator by descending its stochastic gradient. Ascending is slightly different from the gradient descent we've seen already, where theta equals theta minus the learning rate times the partial derivative of the loss with respect to theta; that descent rule is exactly what we do for the generator, while for gradient ascent we use a plus instead of a minus. If we plot the loss as a function of theta, classical gradient descent tries to minimize the loss, whereas gradient ascent tries to maximize it. Before moving on, consider the plot of the log function: at x equal to one, log x is zero, and as x takes smaller and smaller values approaching zero from the right, log x goes towards negative infinity. Now, for the discriminator we are doing gradient ascent, so we're trying to maximize the expression; remember from our min-max loss that we maximize with respect to the discriminator. When the discriminator takes in a real sample it should give an output of one, and the log of one is zero. Note that all the outputs of our discriminator lie between zero and one, since it is a usual classifier, so after passing through the log, zero is the highest value you can possibly obtain; reaching zero therefore means you have maximized that term, which falls in line with what we expect, since we want to maximize this expression. So for the reals, we want the discriminator to output a one, and the log of one gives us the maximum possible value, which in this case is zero.
Now for the second term we have z. Remember, x is the real image, while z is the random noise we've seen already. When this random noise passes through the generator, it outputs a fake sample, and we take this fake sample and pass it into the discriminator, and this time we expect the discriminator to produce a zero instead of a one. In that case the log of one minus zero is the log of one, which is zero, the highest possible value we can get when dealing with logs of values in the range zero to one, so again we're maximizing this term.

Now that this is understood, we can move on to the generator. For the generator we want to instead minimize this expression, and since we're minimizing, we would expect the lowest possible value, which is negative infinity. Let's plug in the values and see how we obtain that negative infinity. We have z, our random noise; when it passes through G it outputs a fake sample, and this time we want the discriminator to consider this fake sample to be a real sample. So at one instance we want the discriminator to see this sample as fake, and at another instance we want it to be seen as real, and depending on the instance we update the parameters of the corresponding network. In the case where we want the sample to be seen as fake, we update the parameters of the discriminator; in the case where we want it to be seen as real, we update the parameters of the generator. In fact, when training the discriminator we freeze the generator, so we don't update the generator's parameters, only the discriminator's; and when training the generator we freeze the discriminator and update the generator's parameters so that it is able to fool the discriminator into thinking this is a real sample. When the discriminator thinks the sample is real, it outputs a one, and the log of one minus one is the log of zero, which is negative infinity, the minimum possible value, so we are indeed minimizing this expression.

Getting back to the algorithm in the paper: for a number of training iterations, basically the number of epochs, we first do k steps in which we update the discriminator. For each of these k steps we sample a minibatch of noise and a minibatch of real samples, pass the noise through the generator, compute the loss, and use it to update the discriminator's parameters. In the paper they use k equal to one; you could modify this, although in practice k equal to one is fine. After going through the k discriminator steps, we update the generator: we sample a minibatch of m noise vectors again, obtain the generator's outputs, and update the generator by descending its stochastic gradient so that it fools the discriminator.
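Here is a minimal sketch of this alternating procedure, written directly in the log form used by the paper. It assumes a generator, a discriminator, their two optimizers, a batched dataset of real images, and a latent_dim are already defined; eps is only there for numerical stability.

import tensorflow as tf

K = 1  # number of discriminator steps per generator step (k = 1 in the paper)
eps = 1e-7

for real_images in real_batches:
    batch_size = tf.shape(real_images)[0]

    # --- k discriminator steps (generator frozen) ---
    for _ in range(K):
        noise = tf.random.normal((batch_size, latent_dim))
        fake_images = generator(noise, training=False)
        with tf.GradientTape() as tape:
            real_preds = discriminator(real_images, training=True)
            fake_preds = discriminator(fake_images, training=True)
            # ascend log D(x) + log(1 - D(G(z))) by descending its negative
            d_loss = -(tf.reduce_mean(tf.math.log(real_preds + eps))
                       + tf.reduce_mean(tf.math.log(1.0 - fake_preds + eps)))
        grads = tape.gradient(d_loss, discriminator.trainable_weights)
        d_optimizer.apply_gradients(zip(grads, discriminator.trainable_weights))

    # --- one generator step (discriminator frozen) ---
    noise = tf.random.normal((batch_size, latent_dim))
    with tf.GradientTape() as tape:
        fake_preds = discriminator(generator(noise, training=True), training=False)
        g_loss = tf.reduce_mean(tf.math.log(1.0 - fake_preds + eps))  # descend log(1 - D(G(z)))
    grads = tape.gradient(g_loss, generator.trainable_weights)
    g_optimizer.apply_gradients(zip(grads, generator.trainable_weights))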
And that's it for this section. In the next section we'll get into some practice and see how to obtain the kind of results we saw earlier: these are the real samples, and what we'll obtain are fake outputs, faces of people who do not actually exist, and they look pretty realistic. One point to note before we move on is that the type of neural network used in the original GAN paper is a classic fully connected artificial neural network, and those were the kinds of outputs they got. In practice we'll make use of the DCGAN instead of the simple GAN; DC stands for deep convolutional, so DCGAN means deep convolutional generative adversarial network. In other words, we'll be working with GANs whose networks are convolution based. With that, see you in the next section.

Hi, and welcome to this session, in which we're going to write the code for generating images like these in TensorFlow. In the previous session we looked at the GAN loss function, and now we'll see how to adapt this loss so that we can instead use the binary cross-entropy loss. From there we'll look at different methods to make GAN training much more stable. You can already see that GAN training is an inherently unstable process, since we have two adversaries where one is trying to maximize the loss and the other is trying to minimize it, so by nature GANs are much more difficult to train than other classical neural networks. After looking at different methods to make the training process more stable, we'll go ahead and train our GAN.

From the previous session we have the discriminator loss and the generator loss, and now we want to introduce the binary cross-entropy loss; you'll notice it looks quite similar to what we already have. Let's start with the discriminator. The discriminator takes in a real image and we expect it to output a one. To map this onto the binary cross-entropy, the prediction y-hat is D(x_i), which is exactly the term we have in the GAN loss, and y_i is the label. Since we want y to equal one, the term with one minus y vanishes, so we only consider the first part of the expression, which leaves us with minus log of y-hat. When y-hat equals one, the loss is zero; when y-hat equals zero, we have the log of zero, which is negative infinity, and the minus sign in front of the binary cross-entropy (which we didn't have in the GAN loss) turns that into positive infinity. So since we want y-hat to equal one, our aim is to minimize the binary cross-entropy loss, whose values range between zero and positive infinity, unlike the original expression whose values ranged from negative infinity to zero. There we had a maximization problem and gradient ascent; here we minimize instead, because we want to obtain a zero, and it's when y-hat equals one that the loss is zero.
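For reference, the binary cross-entropy loss we keep referring to can be written as:

\text{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\left[\, y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i) \,\right]

where y_i is the label, \hat{y}_i = D(x_i) is the discriminator's prediction, and N is the batch size.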
So that's it for the first part. For the next part, when we're dealing with the fake data generated by our generator, we expect D to output a zero, and since the label y is zero in this case, the first term drops out and we're left only with minus log of one minus y-hat, where y-hat is what the model predicts. In the case where y-hat is zero as expected, we have the log of one, which is zero, so the loss is zero. In the other case, where the model instead predicts a one, we have the log of one minus one, which is the log of zero, that is negative infinity, and the minus sign turns it into positive infinity. Since our aim is for y-hat to be zero, we minimize this expression with the aim of getting a loss of zero; and again, unlike the original expression which went from negative infinity to zero, we now go from zero to positive infinity. So when working with the discriminator and making use of the binary cross-entropy, we do a simple gradient descent in which we minimize the loss.

From here we move on to the generator. For the generator we expect D to produce a one, which means the second term drops out since one minus one is zero, and we're left with minus one over N times the sum of log y-hat, where y-hat is what the discriminator predicts. In the case where the model is supposed to predict a one and actually predicts a one, the loss is zero; in the case where it predicts a zero when it's supposed to predict a one, we have the log of zero, which is negative infinity, and the negative of negative infinity is positive infinity. Again our aim is to obtain a zero, so once more we are minimizing the BCE loss.
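A minimal sketch of both losses under this BCE formulation, assuming real_preds and fake_preds are the discriminator's outputs on a batch of real and generated images (the names are just illustrative):

import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()

# real_preds = discriminator(real_images); fake_preds = discriminator(generator(noise))
d_loss = (bce(tf.ones_like(real_preds), real_preds)       # reals should be called 1
          + bce(tf.zeros_like(fake_preds), fake_preds))   # fakes should be called 0
g_loss = bce(tf.ones_like(fake_preds), fake_preds)        # generator wants fakes called 1

Both losses are now simply minimized with gradient descent.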
Now, given that we'll be implementing the architecture from the DCGAN paper, we should note some of its guidelines: replace any pooling layers with strided convolutions in the discriminator and fractional-strided convolutions in the generator; use batch norm in both the generator and the discriminator; remove fully connected hidden layers for deeper architectures, meaning we use convolutional layers instead of fully connected hidden layers; use ReLU activations in the generator for all layers except for the output, which uses the tanh activation; and use the LeakyReLU activation in the discriminator for all layers. But before we look at these details, note that there is a GitHub repo by one of the authors of the paper, Soumith Chintala, in which he proposes a list of tips and tricks used to make GANs work. As we've said already, it's not that easy to make GANs work, so taking advantage of the experience of one of the authors of the DCGAN paper (not the original GAN paper) is very useful, since we won't have to repeat the mistakes he may have made before discovering these tricks. One thing to note is that this list is no longer maintained, and the author himself says he isn't sure how relevant it still is; it has been around for about six years now.

The first tip is to normalize the inputs: normalize the images between negative one and one, and use tanh as the last layer of the generator; again, this was already in the paper. The next tip is to modify the loss function. We've already touched on this, but maybe the transition from the previous loss function to this other one wasn't made very clear, so let's get back to it. For the generator, the original loss term was the log of one minus D of G of z, and in the original GAN paper they expected the discriminator to produce a zero for generated samples. Previously we spoke as if we expect the discriminator to produce a one right from the beginning, but in the original paper what they actually wanted for this term was a zero. Before we go on to compute the BCE loss, we take into consideration a modification of this original loss, and the modification comes in because of the following problem: when training just starts, it's difficult for the generator to get the discriminator to output a one, whereas it's very easy for the discriminator to output a zero when it sees fake data. To see why, pick any distribution in the demo, say this slightly more complicated one: when you just start training, the generated outputs do not look anything like the real data, and because of this great difference at the start of training, it's difficult for the generator to fool the discriminator into outputting a one. Also, because classifying whether an input is real or fake is easier than generating new inputs, the generator will experience vanishing gradients. So instead of minimizing the log of one minus D of G of z, we can instead maximize the log of D of G of z. The reason it's preferable to maximize this expression rather than minimize the other one is simply that it is more lenient on the generator, especially at the beginning of training. When we were minimizing the log of one minus D of G of z, or equivalently when translating that into the binary cross-entropy loss, we would have labels equal to zero, and with a label of zero the first term of the BCE drops out and we're left with the log of one minus y-hat, just as before. But the fact that at the very beginning we expect the discriminator to output a zero when it sees fake
data from the generator is a problem, because that's a very easy task for the discriminator: at the very beginning the generated samples obviously do not look like the real data, so the discriminator finds it very easy to predict that they are fake, and because of this ease the generator's weights barely get updated, so it can't learn to produce more realistic-looking fake images. What we do instead is flip the labels, and flipping the labels matches the modified expression: we now have y equal to one instead of y equal to zero, meaning that when training the generator we ask the discriminator to output a one when it sees fake data. With y equal to one, the second term of the BCE drops out, and you can see this matches maximizing the log of D of G of z. This lets the generator actually update its weights and makes training much more stable compared to working with labels y_i equal to zero. Also note that when we talk about labels we mean the y_i's, and when we talk about what the model predicts we mean the y-hats.

The next tip is one we've just discussed: in the GAN paper the loss to optimize G is the minimization of log of one minus D, but in practice folks use the maximization of log D, for exactly the reason we gave. If we expect the discriminator to look at fake data at the beginning of training and say it's fake, that's too easy a task, and the generator can't update its parameters in a way that starts to fool the discriminator, so we flip the labels and aim for the discriminator to output ones instead.
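To make the contrast concrete, here is a small sketch of the two generator objectives, assuming fake_preds holds D(G(z)) for a batch; eps is only for numerical stability:

import tensorflow as tf

eps = 1e-7

# Saturating loss from the original paper: minimize log(1 - D(G(z))).
# Early in training D(G(z)) is near 0, this term is nearly flat there,
# and the generator receives vanishing gradients.
g_loss_saturating = tf.reduce_mean(tf.math.log(1.0 - fake_preds + eps))

# Non-saturating ("flipped label") loss used in practice: maximize log D(G(z)),
# i.e. minimize -log D(G(z)), which is exactly BCE with labels of 1.
g_loss_non_saturating = -tf.reduce_mean(tf.math.log(fake_preds + eps))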
For the next tip, we're told not to sample from a uniform distribution: for the generator's noise we'll sample from a Gaussian, that is a normal, distribution. From there we're encouraged to use batch normalization and to avoid sparse gradients. Unlike in the DCGAN paper, where we're told to use the LeakyReLU activation in the discriminator for all layers and the ReLU activation in the generator, the tips say the LeakyReLU is good in both G and D, that is both the generator and the discriminator, because the stability of the GAN game suffers if you have sparse gradients. What does that mean? With a ReLU activation, all negative inputs are sent to zero and all positive inputs remain the same, whereas with the LeakyReLU the negatives take on a value that depends on the slope we pick: if the slope is 0.2, for example, an input of negative one gives an output of negative 0.2, an input of negative two gives an output of negative 0.4, and so on, while the positive section remains the same. The sparse gradients come from the fact that if we have ReLUs in our network we'll tend to have many zeros, and those zeros cause sparsity in the gradients; since we don't want that, and since we want to train the GAN in the most stable manner possible, we'll make use of the LeakyReLU. For downsampling we're told to use average pooling or Conv2D with strides, and for upsampling, PixelShuffle or transposed convolutions (Conv2DTranspose) with strides.

Then we're told to use soft and noisy labels. What does this mean? Say we're training the discriminator: we pass real data, or fake data coming from the generator, into the discriminator, and the discriminator's prediction, say 0.4, is compared against the correct label. What they suggest is to apply label smoothing: instead of taking exactly one as the label, take a random value around one, say 0.8, 0.9, or 1.2. So instead of hard labels that are strictly zero or one, we take values around zero and values around one. We're also told to make the labels noisy for the discriminator, that is, to occasionally flip the labels when training the discriminator.

Next: use a DCGAN when you can; it works. And if you can't, use a hybrid model, for example a VAE plus GAN. The point here is that if we want to generate images, we shouldn't use the original GAN setup with simple fully connected neural networks; we should go for convolutional neural networks. The next tip is to use stability tricks from reinforcement learning; we haven't treated reinforcement learning yet, so we'll skip that one. Then: use the Adam optimizer, following the Adam rules, and track failures early. If you want to train your GAN without discovering only at the end that you aren't getting the results you expected, make sure it isn't falling into one of these failure modes: if during training the loss of the discriminator goes to zero, that's a failure mode, because your discriminator is proving too good at its job; if the norms of the gradients are over a hundred, things are going wrong; and when things are working, the loss has low variance and goes down over time, versus having huge variance and spiking. So if we look at the discriminator's loss over training, what we expect to see is
something like this: a curve that goes down slowly over time, rather than one with high variance that keeps spiking. Also, if the loss of the generator steadily decreases, then it's fooling the discriminator with garbage; we do not expect the generator to be so good that its loss simply drops steadily during training. So, as we said already, track all these failures early on. The next tip is not to balance the losses via statistics unless you have a good reason to; they say they've tried it and it's hard, so if you try to balance the training of the generator and the discriminator based on some loss value, you should have a principled approach to it rather than just intuition. Next, if you have labels, use them: if, in addition to your real data, you have a labeled dataset, you can train your discriminator like a usual classifier in supervised learning. The next point is to add noise to the inputs and decay it over time. From here there are tips they're not actually sure about, so we'll just leave them; this one is for conditional GANs, so we won't take it into consideration; and then: use dropout in G, in both the training and test phases, providing noise in the form of dropout, as this generally leads to better results. Many thanks to the authors of these notes, Soumith Chintala, Emily Denton, Martin Arjovsky, and Michael Mathieu. Finally, apart from the vanishing gradient problem, another very common problem is mode collapse, where the generator produces the same outputs even after training for several epochs. We're now going to build all of this while taking into consideration the tips and tricks we've just seen.

Hi there, and welcome to the session in which we shall practically train a GAN to produce images like this one. We'll start with the imports and then move on to preparing our data. The dataset we'll use is CelebA, the CelebFaces Attributes dataset, which contains over 200,000 images of celebrities with 40 binary attribute annotations. If we open up some of these images, you can see the faces right here. What we'll do is train our discriminator alongside our generator, such that the generator can produce images of faces realistic enough to fool the discriminator into thinking they are actually real faces. This dataset is provided by Jessica Li on Kaggle and can be downloaded from there, so let's go straight to downloading it and then start with the DCGAN modeling. In order to download a dataset from Kaggle we need the kaggle.json file, which you can get from your Kaggle account by creating a new API token. Once you have that, click "Copy API command"; when you paste it you'll see the "kaggle datasets download" command with the username of the person who uploaded the dataset and the dataset name. But before running this download we start by installing kaggle, making the kaggle directory, and copying kaggle.json into it; then we can go ahead and download the dataset using
the API command we copied, and then unzip it into a dataset folder we specify. So let's simply run the cell and everything should go well. As you can see, the dataset has been downloaded and the files are being extracted into this dataset folder. Now that it's successfully extracted, we specify the batch size, the image shape, and the learning rate, run that cell, and move on to creating our TensorFlow dataset. Here we call this dataset and specify the path: to get the path you can click "open" (it takes a while, since there are 200,000 images here), then copy the path and pass it in. We also pass a label mode of None, the image size we specified, and the batch size; it doesn't matter too much here, as we'll see shortly. Running this we get an error, "tensorflow not defined", so we run the imports first, then rerun, and this works fine. Checking the dataset, we have 202,599 files belonging to one class, and the dataset has been batched: you can see the shape, 64 by 64 by 3 images. The default image size is 256 by 256, so if we don't specify it and run again, you'll see 256 by 256 instead. Let's put our settings back and rerun.

Next we preprocess the data. What we want is for the data to lie between negative one and one, which is why we compute the image divided by 127.5, minus one. So any value between 0 and 255 gets rescaled; for example, 255 divided by 127.5 is two, minus one gives one. After this preprocessing we unbatch, because we need to drop the remainder, and then re-batch with the TensorFlow data API, and from there we apply prefetching for a more efficient way of loading the data. We can then visualize a single element of the dataset and print its shape: we get 128 by 64 by 64 by 3, as expected. We can also visualize a few elements, say four of them, using subplots with imshow and the axes turned off. One thing we should do, though, is modify these values before plotting, because they range between negative one and one, whereas imshow expects values in the range zero to one.
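A condensed sketch of this input pipeline, assuming the CelebA images were extracted under a "dataset/" folder as in the walkthrough (the path and constants are just the values used here):

import tensorflow as tf

IM_SHAPE = (64, 64)
BATCH_SIZE = 128

train_dataset = tf.keras.utils.image_dataset_from_directory(
    "dataset/", label_mode=None, image_size=IM_SHAPE, batch_size=BATCH_SIZE)

def preprocess(image):
    return image / 127.5 - 1.0  # rescale pixels from [0, 255] to [-1, 1]

train_dataset = (train_dataset
                 .map(preprocess)
                 .unbatch()
                 .batch(BATCH_SIZE, drop_remainder=True)
                 .prefetch(tf.data.AUTOTUNE))

# For plotting, map back from [-1, 1] to [0, 1]: plt.imshow((img + 1) / 2)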
So we map from the range negative one to one to the range zero to one: we take whatever value we have, add one to it, and divide by two. Running that again, the images are much clearer and we no longer get the warning messages we had previously.

Now we go ahead with the modeling, using the same architecture presented in the DCGAN paper. We have a 100-dimensional latent vector, which is projected and reshaped into a 4 by 4 by 1024 tensor, and then we apply upsampling, which is actually the Conv2DTranspose, to get the next tensor; notice how we go from 4 by 4 to 8 by 8, and then the process repeats: 16 by 16, 32 by 32, and finally 64 by 64. Also notice that while the spatial size of the outputs keeps increasing from 4 to 8 to 16 to 32 to 64, the depth is reduced, so we go from 1024 to 512 to 256 to 128 and finally to 3. Back in the code, we specify our latent dimension, which equals 100, and rerun that cell. Then we build our generator as a sequential model with tf.keras.Sequential, passing in the different layers. First we have the input layer, whose shape is the latent dimension. The next layer is a dense layer, which is our projection: we project so that the output has 4 times 4 times the latent dimension number of units, just as in the paper. Then comes the reshape layer: at this point the latent dimension is 100, so we have 1600 outputs, and we reshape them into a three-dimensional tensor of shape 4 by 4 by the latent dimension. From the reshape we do the upsampling with Conv2DTranspose, as in the paper, with 512 filters (the number of output channels) and a kernel size of 4. If you look at the paper, the kernel size isn't necessarily exactly 4, but one very important rule to follow when picking the kernel size is that it should be divisible by the stride; so with a kernel size of 4 we can use a stride of 2. The reason we generally want the kernel size divisible by the stride is simply the quality of the outputs the generator produces when this isn't the case, so always ensure the kernel size is divisible by the stride. From here we apply batch normalization, as suggested in the paper and in the tips-and-tricks repo, and after the batch norm we have our LeakyReLU, which takes a slope of 0.2. So that's our first block, corresponding to the very first conv layer in the paper.
Once we have this block, it can simply be repeated, modifying the depth: the next block uses 256 filters, and the one after that 128. For now we won't apply any dropout, though you can always feel free to add it and see what kind of results you get. With that we have the first, second, and third conv blocks from the paper. The final conv layer has to produce an output that looks like an image, 64 by 64 by 3, so we copy the Conv2DTranspose one more time, without the batch norm and LeakyReLU, and specify the activation as tanh after the strides; we also set the padding to "same", and copy that padding onto the other layers as well. That should be it for the generator: we've respected what's in the tips and tricks, and also what's in the paper, which says to use the ReLU activation in the generator for all layers except the output, which uses tanh. We call this model the generator, run the cell, and then summarize it. Looking at the summary we see the wrong number of output channels, so we modify the final layer to have 3 filters, run again, and the summary now looks right.
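Putting the pieces together, here is a condensed sketch of the generator as assembled above (following the walkthrough, which keeps LeakyReLU inside the generator per the tips-and-tricks notes rather than plain ReLU):

from tensorflow.keras import layers, Sequential

LATENT_DIM = 100

# Project + reshape, then three upsampling blocks, then a tanh output layer.
# Each stride-2 transposed convolution doubles the size: 4 -> 8 -> 16 -> 32 -> 64.
generator = Sequential([
    layers.Input(shape=(LATENT_DIM,)),
    layers.Dense(4 * 4 * LATENT_DIM),
    layers.Reshape((4, 4, LATENT_DIM)),

    layers.Conv2DTranspose(512, kernel_size=4, strides=2, padding="same"),
    layers.BatchNormalization(),
    layers.LeakyReLU(0.2),

    layers.Conv2DTranspose(256, kernel_size=4, strides=2, padding="same"),
    layers.BatchNormalization(),
    layers.LeakyReLU(0.2),

    layers.Conv2DTranspose(128, kernel_size=4, strides=2, padding="same"),
    layers.BatchNormalization(),
    layers.LeakyReLU(0.2),

    layers.Conv2DTranspose(3, kernel_size=4, strides=2, padding="same", activation="tanh"),
], name="generator")
generator.summary()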
Next we move on to the discriminator. Instead of the generator's latent input, the input here is the image shape, 64 by 64 by 3, and instead of Conv2DTranspose layers we use Conv2D layers. This time the depth increases instead of decreasing as it did in the generator: we start with 64 filters and move up to 128, and so on. Again we keep the kernel size divisible by the stride, we use the LeakyReLU as in the tips and tricks, and we still make use of batch normalization. Once the first block is in place we copy it for the next blocks, moving to 128 and then 256 filters, each with batch norm and LeakyReLU. For the last conv layer we give it a depth of one, and then, given that the discriminator is a usual classifier that takes in the 64 by 64 by 3 input and outputs a single value (one or zero, or rather a value between zero and one), we should be thinking of a dense layer whose output has only one unit. So we flatten the output of the last Conv2D layer, and after flattening we add a dense layer with one output and a sigmoid activation. Recall that with the sigmoid, inputs from negative infinity to positive infinity are mapped into the range zero to one, which is exactly what we need here. Also note that, unlike the sigmoid, which maps values between zero and one, the tanh function maps values between negative one and one; that's what we used in the final layer of the generator, while for the discriminator we use the sigmoid. With that understood, we have our discriminator; we run the cell and finally print the discriminator summary, and everything looks fine.
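A condensed sketch of this discriminator, matching the blocks described above (exact batch-norm placement per block is a judgment call in the walkthrough, so treat this as one reasonable arrangement):

discriminator = Sequential([
    layers.Input(shape=(64, 64, 3)),

    layers.Conv2D(64, kernel_size=4, strides=2, padding="same"),
    layers.LeakyReLU(0.2),

    layers.Conv2D(128, kernel_size=4, strides=2, padding="same"),
    layers.BatchNormalization(),
    layers.LeakyReLU(0.2),

    layers.Conv2D(256, kernel_size=4, strides=2, padding="same"),
    layers.BatchNormalization(),
    layers.LeakyReLU(0.2),

    layers.Conv2D(1, kernel_size=4, strides=2, padding="same"),
    layers.Flatten(),
    layers.Dense(1, activation="sigmoid"),
], name="discriminator")
discriminator.summary()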
Now we can go ahead and set up training, and just like we did previously, we're going to override the train step. Recall the VAE model we built earlier, where we overrode the train_step method, which let us make use of methods like model.fit. Here, instead of the VAE, we have a GAN model made up of a discriminator and a generator, so we replace the encoder and the decoder with the discriminator and the generator respectively, storing self.discriminator and self.generator. We also modify the compile method: it will take in the optimizer for the discriminator, the optimizer for the generator, and the loss function. We then define our discriminator loss metric and our generator loss metric, the d_loss metric and the g_loss metric, and from here we move on to the training step.

For the training, recall that we have a discriminator and a generator; the generator takes in a noise vector and generates fake data, and this fake data is passed on to the discriminator, which gives a value between zero and one. We also have our real data, which is likewise passed into the discriminator and gives a value between zero and one. Note that we'll start by training the discriminator, and when training the discriminator we freeze the generator, that is, we do not update its parameters, we only update the weights of the discriminator, just as we saw previously. The first thing to do is get our noise with tf.random.normal; since it's a normal distribution we only need to specify its shape, which is the latent dimension, and given that we're working in batches we also need the batch dimension, so the shape is batch size by latent dimension. To obtain the batch size we take tf.shape of the x batch and get the zeroth value. Now that we have the batch size and the noise, we're ready to feed the noise into the generator and obtain the fake data, and we also make use of the real data, which is this x batch: remember the dataset is made of 200,000 faces of celebrities broken up into batches of 128, so for every batch we take this x batch, which is the real data, and we also use the noise to generate fake data, and we train the discriminator with both. So we set fake_images equal to self.generator applied to the random noise vector. With the fake images and the real images in hand, we can now dive into training the discriminator. As you may know, the discriminator's loss function takes the output on the real data and compares it with one, and takes the output on the fakes and compares it with zero.
Getting back into the code, let's rename this to real_images, and then get the predicted outputs, or better still the real predictions: the real predictions are obtained from the discriminator, which takes in the real images. So the real images go into the discriminator, we get the real predictions, and we compare them with the value one. We can now compute the loss: the discriminator loss for the reals is our loss function applied to the real labels and the real predictions. The real labels, as we've said, are the ones, so we use tf.ones and specify the shape, which is batch size by one; the reason the shape is batch size by one is simply that the discriminator outputs a single value per image, that value should be one for real images, and we have batch size of them. Next we need the fake labels, which are zeros, again of shape batch size by one.

Now we have our real labels and our fake labels. We take the real predictions, that is, what the model thinks about a particular input, and compare them with the real labels; the real labels are ones because we expect the model to take in a real input and recognize it as real, meaning it should output a one. If it outputs a value different from one, the loss is greater than zero, whereas if it's exactly one the loss is zero, and our aim is to minimize this loss. The next thing to do is repeat this for the fake predictions: in the first step real data goes into the discriminator, and in this step fake data goes into the discriminator. The fake predictions come from the discriminator applied not to the real images but to the generated images, that is, the generator applied to the random noise; but since we've already defined the fake images, we just reuse them here. So the discriminator takes in a fake image and gives us a fake prediction, and we compare this fake prediction with zero: we expect the fake predictions to be zero, and if not, the loss won't be zero. So here, instead of the real labels, we use the fake labels, comparing zero with what the discriminator outputs. The same loss function, the binary cross-entropy, is used, and once we have this we define the total discriminator loss as the real loss plus the fake loss, since it's simply the combination of the two.

Before moving on, recall the label smoothing from the tips and tricks. Here we have our real and fake labels, so let's separate them out; instead of taking exactly one, we take values around one: one plus 0.25 times some random value between negative one and one. That means that instead of the label being fixed at one, it lies between 0.75 and 1.25. For the zeros, since we don't want negative values, we take zero plus some random value between zero and 0.25, so instead of zero we have a random value between zero and 0.25. Implementing this, we use tf.random.uniform and specify the minimum value, negative one, and the maximum value, one; multiplying by 0.25 then takes us from negative 0.25 to 0.25, and we also specify the shape, batch size by one. For the fake labels we copy this over, but since we don't want negative numbers we start from zero instead; the default range of tf.random.uniform is already zero to one, so we can simply drop the minimum and maximum there.
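A small sketch of these soft labels, assuming batch_size has already been computed from the x batch:

import tensorflow as tf

# Smoothed labels: reals roughly in [0.75, 1.25], fakes roughly in [0, 0.25].
real_labels = tf.ones((batch_size, 1)) + 0.25 * tf.random.uniform(
    (batch_size, 1), minval=-1.0, maxval=1.0)
fake_labels = tf.zeros((batch_size, 1)) + 0.25 * tf.random.uniform(
    (batch_size, 1))  # default range of tf.random.uniform is already [0, 1)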
So that's it; everything looks fine, and I think the discriminator side is done: we have our d_loss. We now move on to the partial derivatives: we take the d_loss and compute its gradients with respect to the discriminator's trainable weights, and the optimizer that applies them is the d_optimizer, which we specified in the compile method. Again, here we are training only the discriminator, so we pass the discriminator's trainable weights to apply_gradients.

The next step is to do the same thing for the generator. We again sample some random noise, copying that code, and from this random noise we get our fake images with self.generator applied to the random noise. We again make use of a gradient tape, just as we did with the discriminator; let's add comments so it's clear which block is the discriminator's and which is the generator's. As we were saying, we use the random noise to generate fake images, but this time we want to fool the discriminator, so instead of expecting the fake labels to be zeros as before, the fake labels will now be ones, and we obviously have nothing to do with the real data here, so we take that part off. These are the flipped fake labels, which are ones instead of zeros; remember we've seen this already, and we won't do any label smoothing here. Next we record the gradients: the fake predictions come from the discriminator applied to the fake images, and we compare the fake predictions with the flipped fake labels (we had actually made an error here at first, comparing against the unflipped fake labels, so this is worth double-checking). This gives us the g_loss; there's no g_loss_fake or g_loss_real, just the g_loss. We then compute the partial derivatives with respect to self.generator's trainable weights and apply them, so here we're updating the generator and not the discriminator.
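Pulling everything written so far into one place, here is a condensed sketch of the whole custom model; the class and attribute names are illustrative, and the metric updates and compile method it includes are the ones discussed just below.

import tensorflow as tf
from tensorflow import keras

class DCGAN(keras.Model):
    """Sketch of the GAN wrapper described in this session."""

    def __init__(self, discriminator, generator, latent_dim):
        super().__init__()
        self.discriminator = discriminator
        self.generator = generator
        self.latent_dim = latent_dim
        self.d_loss_metric = keras.metrics.Mean(name="d_loss")
        self.g_loss_metric = keras.metrics.Mean(name="g_loss")

    def compile(self, d_optimizer, g_optimizer, loss_fn):
        super().compile()
        self.d_optimizer = d_optimizer
        self.g_optimizer = g_optimizer
        self.loss_fn = loss_fn

    @property
    def metrics(self):
        return [self.d_loss_metric, self.g_loss_metric]

    def train_step(self, real_images):
        batch_size = tf.shape(real_images)[0]

        # ----- discriminator step (generator frozen) -----
        noise = tf.random.normal((batch_size, self.latent_dim))
        fake_images = self.generator(noise, training=False)
        real_labels = tf.ones((batch_size, 1)) + 0.25 * tf.random.uniform(
            (batch_size, 1), minval=-1.0, maxval=1.0)      # smoothed labels
        fake_labels = 0.25 * tf.random.uniform((batch_size, 1))
        with tf.GradientTape() as tape:
            real_preds = self.discriminator(real_images, training=True)
            fake_preds = self.discriminator(fake_images, training=True)
            d_loss = (self.loss_fn(real_labels, real_preds)
                      + self.loss_fn(fake_labels, fake_preds))
        grads = tape.gradient(d_loss, self.discriminator.trainable_weights)
        self.d_optimizer.apply_gradients(zip(grads, self.discriminator.trainable_weights))

        # ----- generator step (discriminator frozen) -----
        noise = tf.random.normal((batch_size, self.latent_dim))
        flipped_labels = tf.ones((batch_size, 1))  # fakes labelled as real
        with tf.GradientTape() as tape:
            # the generator call must stay inside the tape so its weights get gradients
            fake_preds = self.discriminator(self.generator(noise, training=True),
                                            training=False)
            g_loss = self.loss_fn(flipped_labels, fake_preds)
        grads = tape.gradient(g_loss, self.generator.trainable_weights)
        self.g_optimizer.apply_gradients(zip(grads, self.generator.trainable_weights))

        self.d_loss_metric.update_state(d_loss)
        self.g_loss_metric.update_state(g_loss)
        return {"d_loss": self.d_loss_metric.result(),
                "g_loss": self.g_loss_metric.result()}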
The optimizer used here is the g_optimizer. If that's all in place, we then update the metric states: we call update_state on the d_loss metric with the d_loss, and do the same for the g_loss with the generator loss metric, and we return the results of both metrics. Let's run the cell, and everything should work fine.

We now define the number of epochs, say 20, and then define our GAN, the model we've just written, passing in the discriminator and the generator we defined already. From here we compile the model: we call compile on the GAN and specify the d_optimizer, which is the Adam optimizer with the learning rate specified at the beginning, 2e-4, and beta_1 equal to 0.5. We use the same settings for the generator's optimizer; there's no difference between the two. The next thing is to pass in our loss function, which is the binary cross-entropy loss. With that set, we can run this and go ahead to train the model by calling the gan.fit method, just the usual model.fit: history equals gan.fit, passing in our train dataset (let's take just 100 elements first), the number of epochs we defined, and some callbacks. You can already see the advantage of overriding the train_step method: we can just define a callback, pass it in, and the job is done. This callback, which we'll call ShowImage and which takes in the latent dimension, is going to show us the generated images at the end of each epoch. Let's define this ShowImage callback just above: it takes the latent dimension, and at the end of every epoch it runs a simple piece of code: we take the model's generator, pass in some random noise, and look at what the model is generating as we keep training. This is very important, because it lets us debug and understand what's going on: if the model is, say, generating the same kinds of outputs every time, or generating outputs that are clearly not the type of output we expect,
or whose distribution is very far away from that of the real data, then we'll have to take some measures. So it's very important to work with these kinds of callbacks, as they let us debug the whole model training process. In the callback we specify the figure size and set up the different subplots; here n equals 6, so we're generating 6 by 6, that is 36, images per epoch (you can reduce or increase this as you like), and for each subplot we show the image coming out of the generator. We then save this figure into a directory we can inspect afterwards. Let's set the count to 36, run that, define ShowImage, and everything looks fine. Now let's start the training; we can even reduce the take to just ten elements so we can notice any errors quickly, and only then train on the full dataset.

We immediately get an error, "unexpected keyword argument", on the tf.random.uniform arguments; the correct keyword names are minval and maxval (you can always check the documentation for the exact syntax). We fix that, run again, and start training, and now we get another error: we're told that no gradients are provided for any variable, and when you look at the listed variables you'll notice they are Conv2DTranspose weights, which means this error is most probably coming from the generator. Going back, one thing we notice is a d that should have been a g, so we fix that and run again, but we still get the error, which again points to the generator. Looking more carefully, we notice that we generate the fake images outside the scope of the gradient tape; what we have to do instead is call the generator directly inside the tape. We want to update the parameters of the generator, so that call has to be inside the gradient tape's scope, not outside. One question you may ask is why it was fine to generate the fake images outside the tape for the discriminator step: the simple answer is that in the discriminator step the generator isn't being updated, so it doesn't matter whether it's inside the tape or not, whereas for the generator step it does matter, so we have to make sure it's inside.

We run again, and training starts, but now we're told there's no such file or directory "generated"; so we make that generated directory and run once more, and now it seems to be working fine. You can see it struggling to generate images. If we open up the generated folder, we have the different files; check out, say, the 18th output: it's struggling, but this one already looks like a human face. Let's check the 19th as well, then close this and get back up.
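A sketch of this monitoring callback and of the compile-and-fit step, using the DCGAN wrapper and constants sketched earlier (names and hyperparameters are the ones used in this walkthrough):

import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras

class ShowImage(keras.callbacks.Callback):
    """At the end of every epoch, generate a grid of images from random noise and save it."""

    def __init__(self, latent_dim, n=6):
        super().__init__()
        self.latent_dim = latent_dim
        self.n = n

    def on_epoch_end(self, epoch, logs=None):
        noise = tf.random.normal((self.n * self.n, self.latent_dim))
        generated = self.model.generator(noise, training=False)
        generated = (generated + 1) / 2  # back to [0, 1] for plotting

        plt.figure(figsize=(12, 12))
        for i in range(self.n * self.n):
            plt.subplot(self.n, self.n, i + 1)
            plt.imshow(generated[i])
            plt.axis("off")
        plt.savefig(f"generated/epoch_{epoch:03d}.png")  # the generated/ folder must exist
        plt.close()

gan = DCGAN(discriminator, generator, LATENT_DIM)
gan.compile(
    d_optimizer=keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5),
    g_optimizer=keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5),
    loss_fn=keras.losses.BinaryCrossentropy(),
)
# Use train_dataset.take(100) first as a quick smoke test, then the full dataset.
history = gan.fit(train_dataset, epochs=20, callbacks=[ShowImage(LATENT_DIM)])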
If those generated images aren't very visible, you can retrain and take many more samples; here we were dealing with only 100 images, and after six epochs you can see the kind of results we get: the images already look like human faces.

Now let's modify the code: stop the training and change it so that we do not flip the labels, taking zeros instead of ones for the generator's targets, so we can compare with the flipped-label version. We rerun the cells and train; after nine epochs, opening up the generated images (the 0th, the 3rd, the 5th, the 8th), you can see the generator is experiencing vanishing gradients, and that's why we get these kinds of horrible outputs. Let's stop the training again and go back to the models. This time, instead of using the LeakyReLU we'll use the plain ReLU: change the activation, replace it everywhere, take off the LeakyReLU layers in both the discriminator and the generator, rerun, and see what kind of outputs we get. After training has been going on for a while, open up outputs 0, 2, 5, 7, and 10: we're getting practically nothing as output, and that's simply because we're using the ReLU instead of the LeakyReLU; as you can see, sparsity isn't great for generating images, so you have to be very careful with that. Now let's stop the training and get back to what we had before in the discriminator, restart the training, and see what we get. After it has been running for a while, open up outputs 0, 2, 5, 7, and 8: with the normal ReLU it isn't doing that badly either, so in this specific example using the LeakyReLU maybe isn't strictly necessary. So now we've looked at the effect of not using the LeakyReLU; you could also take off the batch norm and see how that affects the kinds of outputs you get from the generator. Here we get an error; running again, that's fine. Now let's restart the training, this time with the full dataset, so we take off the take() and start training. After training for over 20 epochs, here are the kinds of outputs we'll be getting. In our deep learning for image generation course we dive deeper into how to create even higher-quality outputs, like this one, which was created with a diffusion model, or this other one, which we created with a ProGAN. That's it for this section on image generation with the variational autoencoders and
the generative adversarial networks. So we've come to the end of this course, and if there's one thing we suggest you do, it's to work on as many projects as possible on your own. You could make use of tools like TensorFlow, Hugging Face, Weights & Biases, and ONNX, just to name a few, to build and deploy image segmentation models, object counting models, text detection, text recognition, depth estimation, image search engines, pose estimation, face recognition, drowsiness detection, license plate recognition, object tracking, and video classification. And if you want to take a deep dive into natural language processing, image generation, or object detection, you can check out those different courses on the neuralearn.ai platform. With that said, we wish you the very best as you move forward in your career.