Fine-Tuning MobileNet on Custom Data Set with TensorFlow's Keras API

Video Statistics and Information

Captions
Hey, I'm Mandy from deeplizard. In this episode, we'll go through the process of fine-tuning MobileNet for a custom data set.

We're jumping right back into our Jupyter notebook from last time, so make sure your code from then is in place, since we'll be building directly on it. The first thing we'll do is import MobileNet, just as we did in the first MobileNet episode, by calling tf.keras.applications.mobilenet.MobileNet(). Remember, if this is your first time running this line, you'll need an internet connection to download the model.

Now let's take a look at the model we downloaded. Calling model.summary() shows us all of the layers included in MobileNet. This is just to get a general idea of the model, since we'll be fine-tuning it. The fine-tuning process starts with grabbing all of the layers up to the sixth-to-last layer. If we scroll up through the output and count six layers from the bottom, everything from that layer up is what we'll keep and transfer into our new fine-tuned model, and the last five layers will not be included. This is just a choice I arrived at after a little experimenting; the number of layers you include versus exclude when fine-tuning a model comes down to experimentation and personal choice. So we're keeping everything from that layer and above in our new fine-tuned model, which we grab by calling mobile.layers[-6].output. Then we create a variable called output and set it equal to a Dense layer with ten units; this will be our output layer.
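The import-and-inspect step might look like the sketch below. One assumption to note: `weights=None` is used here only so the sketch runs without a network connection; in the notebook, the default pretrained ImageNet weights are downloaded on first use.

```python
import tensorflow as tf

# Load MobileNet. The notebook uses the default pretrained ImageNet weights
# (tf.keras.applications.mobilenet.MobileNet()); weights=None here only
# skips the download so this sketch runs offline.
mobile = tf.keras.applications.mobilenet.MobileNet(weights=None)
mobile.summary()

# Output of the sixth-to-last layer, the global average pooling layer in
# the TF version used in the video. The exact index can shift between TF
# releases, so it is worth confirming against the summary above.
x = mobile.layers[-6].output
```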
It has ten units because our classes range from zero through nine, and as usual it's followed by a softmax activation function to give us a probability distribution across those ten outputs.

Now, this looks a little strange: we create the Dense layer and then place this x variable next to it. What's that about? Well, MobileNet is actually a functional model, built with the Keras Functional API rather than the Sequential API. We touched on this a little earlier when we fine-tuned VGG16, which was also a functional model; at that point we iterated over each of its layers and added them to a Sequential model, because we weren't ready to introduce functional models yet. Here, we'll continue working with the functional model type. When we create the output layer and then call it on the previous layers stored in x, that's how the Functional API works: we're telling the output layer to take everything stored in x, all of MobileNet up to its sixth-to-last layer, as its input. Then we can create the model from these two pieces by calling Model, which builds a functional model when specified this way, passing inputs=mobile.input to take the input from the original MobileNet model, and outputs=output. At this point, output is all of the MobileNet model up to the sixth-to-last layer plus the new Dense output layer. Let's run these two cells to create our new model.

Now that our new model has been created, the next thing we'll do is freeze some layers. Through some experimentation of my own, I've found that freezing all except for the last 23 layers appears to yield decent results.
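Assembled with the Functional API, the model construction might look like the sketch below. Assumptions: `weights=None` again keeps the sketch offline (the notebook uses the default ImageNet weights), the graft point is looked up by layer type rather than the -6 index (which can shift between TF releases), and a Flatten is added defensively in case the pooled output is not already flat.

```python
import tensorflow as tf
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model

# In the notebook this is MobileNet() with the default ImageNet weights.
mobile = tf.keras.applications.mobilenet.MobileNet(weights=None)

# Graft point: the global average pooling layer (layers[-6] in the video's
# TF version; found by type here to be robust across releases).
gap = next(l for l in mobile.layers
           if isinstance(l, tf.keras.layers.GlobalAveragePooling2D))
x = Flatten()(gap.output)  # no-op if the pooled output is already flat

# New 10-unit softmax head, called on x in Functional-API style.
output = Dense(units=10, activation='softmax')(x)

# Stitch the original input and the new head into one functional model.
model = Model(inputs=mobile.input, outputs=output)
```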
23 is not a magic number here; play with this yourself and let me know if you get better results. What we're doing is iterating over all the layers in the model, which are trainable by default, and making every layer except the last 23 not trainable, so only the last 23 layers will be trained. For perspective, there are 88 total layers in the original MobileNet model, so we're training only the last 23 layers of the new model we built just above. Recall that this is still much more than we trained earlier in our fine-tuned VGG16 model, where we trained only the output layer. Let's go ahead and run that.

Now let's look at a summary of our new fine-tuned model. At a glance it looks basically the same as the original summary, but notice that the model now ends with the global average pooling 2D layer, which, recall, was the sixth-to-last layer that I said we would include along with everything above it. All the layers below the global average pooling layer in the original MobileNet summary are now gone, and instead of an output layer with 1,000 classes, we now have an output layer with ten classes, corresponding to the ten potential outputs in our new sign language digits data set. If we compare the total parameters, and how they're split between trainable and non-trainable, with the original MobileNet model, we'll see a difference there as well.

Now that the model has been built, we're ready to train it. The code here is nothing new: we compile the model in the exact same fashion, using the Adam optimizer with a 0.0001 learning rate, categorical cross-entropy loss, and accuracy as our metric.
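The freezing step can be sketched as below. The model is rebuilt so the snippet stands alone, with the same assumptions as before: `weights=None` only to stay offline (the notebook uses the pretrained ImageNet weights), and the graft point found by type.

```python
import tensorflow as tf
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model

# Same fine-tuned model as before (weights=None keeps the sketch offline).
mobile = tf.keras.applications.mobilenet.MobileNet(weights=None)
gap = next(l for l in mobile.layers
           if isinstance(l, tf.keras.layers.GlobalAveragePooling2D))
output = Dense(units=10, activation='softmax')(Flatten()(gap.output))
model = Model(inputs=mobile.input, outputs=output)

# Layers are trainable by default, so we only switch off everything
# except the last 23. The 23 comes from experimentation, not theory.
for layer in model.layers[:-23]:
    layer.trainable = False

print(sum(l.trainable for l in model.layers), 'of', len(model.layers),
      'layers remain trainable')
```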
We've probably seen this two million times up to this point in the course, so that's exactly the same. Additionally, the fit call we run to train the model is exactly the same: we pass in train_batches as our training data and valid_batches as our validation data, and we run it for 30 epochs (I had 10 here earlier just to save time while testing), setting verbose=2 to get one line of output per epoch. Now let's see what happens.

Our model just finished training over 30 epochs, so let's check out the results. If you're wondering why the first epoch took 90 seconds while later epochs got down to about 5 seconds, it's because I realized my laptop was running on battery rather than plugged in; once it was plugged in, it sped up considerably. Scrolling down to where training ended, we're at 100% accuracy on our training set and 92% accuracy on our validation set. That's pretty great, considering this is a completely new data set whose images were not included in the original ImageNet training set. There is a little overfitting here, since our validation accuracy is lower than our training accuracy, so if we wanted to, we could take steps to combat it. Looking at the earlier epochs to see the story being told: on the first epoch, our training accuracy starts out at 74% across ten classes, which is not a bad starting point, and we reach 100% training accuracy within just four epochs. At that point, though, we're only at 81% accuracy on the validation set, so there's a decent amount of overfitting going on early.
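The compile-and-fit calls follow the same pattern as in earlier episodes. In this sketch, a tiny stand-in model and random arrays replace the fine-tuned MobileNet and the train_batches/valid_batches generators built earlier in the series, purely so the snippet runs end to end; the compile settings match the episode.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.optimizers import Adam

# Stand-in for the fine-tuned MobileNet (the real model is much larger).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(1024,)),
    tf.keras.layers.Dense(10, activation='softmax'),
])

# Same compile settings as in the episode.
model.compile(optimizer=Adam(learning_rate=0.0001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Random arrays standing in for the train_batches / valid_batches generators.
x = np.random.rand(40, 1024).astype('float32')
y = tf.keras.utils.to_categorical(np.random.randint(0, 10, size=40), num_classes=10)

# The episode runs 30 epochs; 3 keeps this sketch quick.
history = model.fit(x=x, y=y, validation_split=0.2, epochs=3, verbose=2)
```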
As we progress through training, though, the overfitting becomes less and less of a problem. In fact, looking at just the last eight epochs of this run, our validation loss has not yet stalled out in terms of decreasing, and our validation accuracy has not stalled out in terms of increasing, so perhaps simply running more epochs on this data would eliminate the overfitting. Otherwise, you can do some tuning yourself: change some hyperparameters, use a different fine-tuning structure by freezing more or fewer than the last 23 layers, or just experiment. If you come up with something that yields better results than this, put it in the comments and let us know.

We have one last thing to do with our fine-tuned MobileNet model, and that's to use it on our test set. You know the drill with this procedure at this point; we've done it several times. We'll get predictions from the model on the test set and then plot those predictions in a confusion matrix. First, we get our true labels by calling test_batches.classes. Then we get predictions from the model by calling model.predict, passing in our test set stored in test_batches and setting verbose=0, since we don't want to see any output from the prediction step. Next, we create our confusion matrix using scikit-learn's confusion_matrix, which we imported earlier, setting the true labels equal to the test labels we just defined, and the predicted labels to the argmax of our predictions across the last axis. Finally, we check the class_indices of the test batches just to make sure they are what we think they are, and they are, of course, the classes labeled 0 through 9.
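These evaluation steps, together with a version of the plot_confusion_matrix helper (adapted from the scikit-learn documentation example the series uses), can be sketched with stand-in values: the made-up labels and prediction scores below take the place of test_batches.classes and the model.predict output, and use 3 classes instead of 10.

```python
import itertools
import numpy as np
import matplotlib
matplotlib.use('Agg')  # render off-screen
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

# Stand-in for test_batches.classes (true labels).
test_labels = np.array([0, 1, 2, 2, 1])

# Stand-in for model.predict(x=test_batches, verbose=0): one row of
# class probabilities per test image.
predictions = np.array([[0.90, 0.05, 0.05],
                        [0.10, 0.80, 0.10],
                        [0.20, 0.20, 0.60],
                        [0.10, 0.70, 0.20],   # a miss: true 2, predicted 1
                        [0.05, 0.90, 0.05]])

# Rows are true classes, columns are predicted classes; correct
# predictions accumulate on the main diagonal.
cm = confusion_matrix(y_true=test_labels, y_pred=np.argmax(predictions, axis=-1))

# A sketch of the plot_confusion_matrix helper defined earlier in the
# notebook (adapted from the scikit-learn docs example).
def plot_confusion_matrix(cm, classes, title='Confusion matrix', cmap=plt.cm.Blues):
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    ticks = np.arange(len(classes))
    plt.xticks(ticks, classes, rotation=45)
    plt.yticks(ticks, classes)
    thresh = cm.max() / 2.0
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, cm[i, j], horizontalalignment='center',
                 color='white' if cm[i, j] > thresh else 'black')
    plt.ylabel('True label')
    plt.xlabel('Predicted label')

plot_confusion_matrix(cm=cm, classes=['0', '1', '2'])
```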
We define the labels for our confusion matrix accordingly, and then call the plot_confusion_matrix function that we brought into the notebook earlier and have used 17,000 times up to this point in the course, passing in the confusion matrix to plot, the labels that correspond to it, and the very general title of "Confusion Matrix", because, hey, that's what it is. Let's plot this. Oh no: plot_confusion_matrix is not defined. It definitely is defined somewhere in this notebook; I must have skipped over that cell. Here we go, here's where plot_confusion_matrix is defined, so let's run that cell. Now it's defined, and running back here works.

Looking along the diagonal from top left to bottom right, the model appears to have done pretty well. We have ten classes total with five samples per class, and we see mostly fours and fives across the diagonal, meaning that most of the time the model predicted correctly. For example, for a nine, five times out of five the model predicted an image was a nine when it actually was. For an eight, however, only four out of five predictions were correct; one of those times the model predicted a one when it should have been an eight. In total, we have five incorrect predictions out of fifty, which gives us a 90% accuracy rate on our test set, not surprising given the accuracy we saw just above on our validation set.

Hopefully this series on MobileNet gave you insight into how we can fine-tune models for a custom data set and use transfer learning to apply what a model gained from its original training set to a completely new task in the future. By the way, we're currently in Vietnam filming this episode. If you didn't know, we also have a vlog channel where we document our travels and share a little bit more about
ourselves, so check that out at the deeplizard vlog on YouTube. Also be sure to check out the corresponding blog for this episode, along with other resources, available on deeplizard.com, and check out the deeplizard hivemind, where you can gain exclusive access to perks and rewards. Thanks for contributing to collective intelligence. I'll see you next time!
Info
Channel: deeplizard
Views: 22,088
Keywords: deep learning, pytorch, cuda, gpu, cudnn, nvidia, training, train, activation function, AI, artificial intelligence, artificial neural network, autoencoders, batch normalization, clustering, CNN, convolutional neural network, data augmentation, education, Tensorflow.js, fine-tune, image classification, Keras, learning, machine learning, neural net, neural network, Python, relu, Sequential model, SGD, supervised learning, Tensorflow, transfer learning, tutorial, unsupervised learning, TFJS
Id: Zrt76AIbeh4
Length: 15min 10sec (910 seconds)
Published: Mon Sep 28 2020