Hey, I'm Mandy from deeplizard. In this course, we're going to learn how to use Keras, a neural network API written in Python and integrated with TensorFlow. Throughout the course, each lesson will focus
on a specific deep learning concept, and show the full implementation in code using the
Keras API. We'll be starting with the absolute basics of learning how to organize and preprocess data, and then we'll move on to building and training our own artificial neural networks. Some of these networks will be built from scratch, and others will be pre-trained, state-of-the-art models that we'll fine-tune to use on our own custom data sets. Now let's discuss the prerequisites needed
to follow along with this course. From a knowledge standpoint, we'll give brief
introductions of each deep learning concept that we are going to work with before we go
through the code implementation. But if you're an absolute beginner to deep learning, then we first recommend you go through the Deep Learning Fundamentals course on deeplizard.com. Or, if you're super eager to jump into the code, you can take this course and the Deep Learning Fundamentals course simultaneously. The Deep Learning Fundamentals course will give you all the knowledge you need to get acquainted with the major deep learning concepts, which you can then come back and implement in code using the Keras API in this course. In regard to coding prerequisites, just some basic programming skills and some Python experience are all that's needed. On deeplizard.com, you can also find the Deep Learning Learning Path, so you can see where this Keras course falls amidst all the deeplizard deep learning content. Now let's discuss the course resources. Aside from the videos here on YouTube, you will also have video and text resources available on deeplizard.com. Each episode has its own corresponding
blog and a quiz available for you to take and test your own knowledge. And you can actually contribute your own quiz
questions as well. And you can see how to do that on the corresponding
blog for each episode. Additionally, all of the code resources used
in this course are regularly tested and maintained, including updates and bug fixes when needed. Download access to the code files used in this course is available to members of the deeplizard hivemind, so you can check out more about that on deeplizard.com as well. Alright, so now that we know what this course
is about, and what resources we have available, along with the prerequisites needed to get
started, let's now talk a little bit more about Keras itself. Keras was developed with a focus on enabling fast user experimentation, so it allows us to go from idea to implementation in very few steps. Aside from this benefit, users often wonder why they should choose Keras as the neural network API to learn, or, in general, which neural network API they should learn. Our general advice is not to commit yourself to learning only one and sticking with it forever; we recommend learning multiple neural network APIs. The idea is that once you have a fundamental understanding of the underlying concepts, the minor syntactical and implementation differences between the neural network APIs shouldn't really be that hard to catch on to once you have at least one under your belt already. Especially for job prospects, knowing more than one neural network API will show your experience and allow you to compare and contrast the differences between APIs and share your opinions for why you think certain APIs may be better suited to certain problems than others. Being able to demonstrate this will make you a much more valuable candidate. Now, we previously touched on the fact that
Keras is integrated with TensorFlow, so let's discuss that more now. Historically, Keras was a high-level neural network API that you could configure to run against one of three separate lower-level APIs, and those lower-level APIs were TensorFlow, Theano, and CNTK. Later, though, Keras became fully integrated with the TensorFlow API and is no longer a separate library that you can choose to run against one of the three backend engines we just discussed. So it's important to understand that Keras is now completely integrated with the TensorFlow API. In this course, we are going to be focusing on making use solely of that high-level Keras API without necessarily making much use of the lower-level TensorFlow API. Now, before we can start working with Keras, we first have to get it downloaded and installed onto our machines. Because Keras is fully integrated with TensorFlow, we can do that by just installing TensorFlow; Keras will come completely packaged with the TensorFlow installation. So the installation procedure is as simple as running pip install tensorflow from your command line. You might just want to check out the system requirements on TensorFlow's website to make sure that your specific system meets the requirements needed for TensorFlow to install.
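As a rough sketch, the install and a quick sanity check might look something like this; the exact version you end up with will depend on your system and on the current TensorFlow release:

# From the command line (outside Python):
#   pip install tensorflow

# Then, in Python, confirm that TensorFlow imports and check its version:
import tensorflow as tf
print(tf.__version__)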
Alright, so we have one last talking point before we can get into the actual meat of the coding, and that is GPU support. The first important thing to note is that
a GPU is not required to follow this course, if you're running your machine only on a CPU,
then that is totally fine for what we'll be doing in the course. If however, you do want to run your code on
a GPU, then you can do so pretty easily. After you get through the setup process for
setting up your GPU to work with TensorFlow. We have a full guide on how to get the GPU set up to work with TensorFlow on deeplizard.com, so if you are interested in doing that, head over there to go through those steps. But actually, I recommend just going through the course with a CPU if you're not already set up with a GPU. Like I said, all the code will run totally fine using only a CPU. Then, after you go through the course successfully, you can get set up to work with the GPU if you have one, run all the code that you have in place from earlier on the GPU the second go-round, and see the kind of efficiency and speedups you get. Alright, so that's it for the Keras introduction. Now we're finally ready to jump in to the
code. Be sure to check out the blog and other resources
available for this episode on deeplizard.com, as well as the deeplizard hivemind, where you can gain access to exclusive perks and rewards. Thanks for contributing to collective intelligence. Now, let's move on to the next episode. Hey, I'm Mandy from deeplizard. In this episode, we'll learn how to prepare
and process numerical data that we'll later use to train our very first artificial neural network. To train any neural network in a supervised
learning task, we first need a data set of samples along with the corresponding labels
for those samples. When referring to the word samples, we're
simply just talking about the underlying data set, where each data point in the set is referred
to as a sample. If we were to train a model to do sentiment
analysis on headlines from a media source, for example, then the labels that correspond
to each headline sample would be positive or negative. Or say that we were training an artificial
neural network to identify images as cats or dogs, well, then that would mean that each
image would have a corresponding label of cat or dog. Note that in deep learning, you may also hear
samples referred to as inputs or input data. And you may hear labels referred to as targets
or target data. When preparing a data set, we first need to
understand the task for which the data will be used. In our example, we'll be using our data set
to train an artificial neural network. So once we understand this, then we can understand
which format the data needs to be in, in order for us to be able to pass the data to the
network. The first type of neural network that we'll
be working with is called a Sequential model from the Keras API. We'll discuss more details about the Sequential
model in a future episode. But for now, we just need to understand what
type of data format the sequential model expects, so that we can prepare our dataset accordingly. The sequential model receives data during
training whenever we call the fit function on it. And again, we're going to go into more details
about this function in the future. But for now, let's check out what type of
data format the fit function expects. So if we look at the fit documentation here,
on TensorFlow's website, we can see the first two parameters that the fit function expects are x and y. So x is our input data, our samples in other words, and this function expects x, our input data, to be a NumPy array, a TensorFlow tensor, a dict mapping, a tf.data dataset, or a Keras generator. So if you're not familiar with all of these
data types, that's okay, because for our first example, we are going to organize our data
to be in a NumPy array, so that's the first option here in the documentation. That covers our input samples, but there's also the y parameter expected by the fit function, which is our data set that contains the corresponding labels for our samples, the target data. Now, the requirement for y is that it is formatted as one of the above formats that we just discussed for x, but y needs to be in the same format as x. So we can't have our samples contained in a NumPy array, for example, and then have y, our target data, or the labels for those samples, in a TensorFlow tensor. The formats of x and y both need to match, and we're going to be putting both of those
into NumPy arrays. Alright, so now we know the data format that
the model expects. But there's another reason that we may want
to transform or process our data. And that is to put it in a format that will
make it easier or more efficient for the model to learn from. We can do that with data normalization or standardization techniques. Data processing in deep learning will vary greatly depending
on the type of data that we're working with. So to start out, we are going to work with
a very simple numerical data set to train our model. And later, we'll get exposure to working with
different types of data as well. Alright, so now we're ready to prepare and
process our first data set. We are in our Jupyter Notebook, and the first step is to import all the packages that we'll be making use of, including NumPy, random, and some modules from scikit-learn. Next, we will create two lists, one called train_samples and one called train_labels, and these lists will hold the corresponding samples and labels for our data set. Now, about this data set: we are going to be
working with a very simple numerical data set. And so for this task, we're actually going
to create the data ourselves. Later, we'll work with more practical examples
and realistic ones where we won't be creating the data, but instead downloading it from some external source. But for now, we're going to create this data ourselves, and we'll use it to train our first artificial neural network. As motivation for this kind of dummy
data, we have this background story to give us an idea of what this data is all about. So let's suppose that an experimental drug
was tested on individuals ranging from age 13 to 100 in a clinical trial. This trial had 2100 participants total; half of these participants were under the age of 65, and half were 65 years or older. The conclusions from this trial were that around 95% of the patients in the older population, so 65 or older, experienced side effects, and around 95% of patients who were under 65 years old experienced no side effects. Okay, so this is a very simplistic data set, and that is the background story. Now, in this cell here, we're going to go
through the process of actually creating that data set. This first for loop is going to generate both the approximately 5% of younger individuals who did experience side effects and the 5% of older individuals who did not experience side effects. Within this first for loop, we are first generating a random integer between 13 and 64, and that represents a younger individual who is under 65 years of age. We then append this number to the train_samples list, and then we append a 1 to the train_labels list. A 1 represents the fact that a patient did experience side effects, and a 0 represents a patient who did not experience side effects. Then, similarly, we jump down to the next line, and we generate a random integer between 65 and 100 to represent the older population. Remember, this is in our first for loop, so it's only running 50 times; this is kind of the outlier group, the 5% of older individuals who did not experience side effects. We then take that sample, append it to the train_samples list, and append a 0 to the corresponding train_labels list, since these patients were the older patients who did not experience side effects. Then, if we jump down to the next for loop, we have pretty much the same code, except this is the bulk of the group. In this for loop, we're generating data for the 95% of younger individuals who did not experience side effects, as well as the 95% of older individuals who did experience side effects. So we generate a random number between 13 and 64, append that number, representing the age of a younger individual, to the train_samples list, and append the label 0 to the train_labels list, since these individuals did not experience side effects. Similarly, we do the same thing for the older individuals from 65 to 100, except since the majority of these did experience side effects, we append a 1 to the train_labels list.
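Here's a minimal sketch of that data generation; the exact cell in the course notebook may differ slightly, and the scikit-learn imports are included here because we'll use them in the processing step that follows:

import numpy as np
from random import randint
from sklearn.utils import shuffle
from sklearn.preprocessing import MinMaxScaler

train_labels = []
train_samples = []

# Outlier group (~5%): 50 younger patients with side effects,
# 50 older patients without side effects.
for i in range(50):
    random_younger = randint(13, 64)
    train_samples.append(random_younger)
    train_labels.append(1)  # 1 = experienced side effects

    random_older = randint(65, 100)
    train_samples.append(random_older)
    train_labels.append(0)  # 0 = no side effects

# Majority group (~95%): 1000 younger without side effects,
# 1000 older with side effects.
for i in range(1000):
    random_younger = randint(13, 64)
    train_samples.append(random_younger)
    train_labels.append(0)

    random_older = randint(65, 100)
    train_samples.append(random_older)
    train_labels.append(1)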
So just to summarize, we have the train_samples list that contains a bunch of integers ranging from 13 to 100, and then we have the train_labels list that
has the labels that correspond to each of these individuals with the ages 13 to 100. The labels correspond to whether or not these
individuals experienced side effects. So we have a samples list containing ages and a labels
list containing zeros and ones representing side effects, or no side effects. And just to get a visualization of the samples,
here, we are printing out all of the samples in our list. And we see that these are just all integers,
like we'd expect, ranging from 13 to 100. Then, correspondingly, if we run our train_labels list and print out all of the data there, we can see this list contains
a bunch of zeros and ones. Alright, so now the next step is to take these
lists and then start processing them. So we have our data generated. Now we need to process it to be in the format
that we saw the fit function expects, and we discussed the fact that we are going to be passing this data as NumPy arrays to the fit function. So our next step is to go ahead and do that transformation here, where we are taking the train_labels list and making it a NumPy array, and similarly doing the same thing with the train_samples list. Then we use the shuffle function to shuffle both train_labels and train_samples respective to each other, so that we can get rid of any imposed order from the data generation process. Okay, so now the data is in the NumPy array
format that is expected by the fit function. But as mentioned earlier, there's another
reason that we might want to do further processing on the data. And that is to either normalize or standardize
it so that we can get it in such a way that the training of the neural network might become
quicker or more efficient, and that's what we're doing in this cell. We are using this MinMaxScaler object with a feature range from zero to one, which we'll then use in the next line to rescale our data from the current scale of 13 to 100 down to a scale of zero to one. The reshaping that we're doing here is just a formality, because the fit_transform function doesn't accept 1D data by default; since our data is one-dimensional, we have to reshape it in this way to be able to pass it to the fit_transform function.
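A sketch of those processing steps, continuing from the lists generated above:

# Convert the lists to NumPy arrays and shuffle them together to remove
# any order imposed by the generation process.
train_labels = np.array(train_labels)
train_samples = np.array(train_samples)
train_labels, train_samples = shuffle(train_labels, train_samples)

# Rescale the ages from the 13-100 range down to 0-1.
scaler = MinMaxScaler(feature_range=(0, 1))
# reshape(-1, 1) because fit_transform doesn't accept 1D data by default.
scaled_train_samples = scaler.fit_transform(train_samples.reshape(-1, 1))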
Now, if we print out the elements in our new scaled_train_samples variable, which is what we're calling our scaled samples, we can see that the individual elements are no longer integers ranging from 13 to 100, but instead we have values ranging
anywhere between zero and one. So at this point, we have generated some raw
data, processed it to be in the NumPy array format that our model will expect, and then rescaled the data to be on a scale between zero and one. In an upcoming episode, we'll use this data to train our first artificial neural network. Be sure to check out the blog and other resources available for this episode on deeplizard.com, as well as the deeplizard hivemind, where you can gain access to exclusive perks and rewards. Thanks for contributing to collective intelligence. Now, let's move on to the next episode. Hey, I'm Mandy from deeplizard. In this episode, we'll demonstrate how to create
an artificial neural network using a Sequential model from the Keras API integrated within TensorFlow. In the last episode, we generated data from
an imagined clinical trial. And now we'll create an artificial neural
network that we can train on this data. Alright, so first things first, we need to
import all of the TensorFlow modules that we'll be making use of to build our first
model. And that includes everything that you see
here except for actually the last two, Adam and categorical cross entropy; those two
are going to be used when we train the model, not when we build it. But we're going ahead and bringing all the
imports in now. Next, if you are running this code on a GPU, then you can run this cell, which will allow you to make sure that TensorFlow is correctly identifying your GPU, as well as enable memory growth. There are a few lines on the blog where you can check out what exactly that means and why you might want to do this. If you are running on a GPU, then go ahead and run this cell.
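A sketch of what that GPU check typically looks like; it's harmless to run on a CPU-only machine, where the device list will simply be empty:

import tensorflow as tf

physical_devices = tf.config.list_physical_devices('GPU')
print("Num GPUs Available:", len(physical_devices))
if physical_devices:
    # Only allocate GPU memory as it's needed rather than grabbing it all upfront.
    tf.config.experimental.set_memory_growth(physical_devices[0], True)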
So next, this is the actual model that we are building. This is a Sequential model, which is the simplest type of model that you can build using Keras or TensorFlow. A Sequential model can be described as
a linear stack of layers. So if you look at how we're creating the model
here, that's exactly what it looks like. So we are initializing the model as an instance
of the Sequential class, and we are passing in a list of layers here. Now, it's important to note that this first dense layer that we're looking at is actually the second layer overall; it's the first hidden layer. That's because we're not explicitly defining the input layer using Keras; the input data is what creates the input layer itself. So the way that the model knows what type
of input data to expect, or the shape of the input data, rather, is through this input
shape parameter that we pass to our first dense layer. So through this, the model understands the
shape of the input data that it should expect, and therefore it accepts that shape of input data and passes that data to the first hidden layer, which is this dense layer here in our case. Now, we are telling this dense layer that we want it to have 16
units. These units are also otherwise known as nodes
or neurons. And the choice of 16 here is actually pretty
arbitrary. This model overall is very simple, and with even an arbitrary choice of nodes here, it's actually going to be pretty hard to create a model, at least a simple one, that won't do a good job at classifying this data, just given the simplicity of the data itself. So, we're specifying 16 units for this first hidden layer, we're specifying the input shape so that the model knows the shape of the input data to expect, and then we're stating that we want the relu activation function to follow this dense layer. Now, this input_shape parameter is only specified
for our first hidden layer. So after that, we have one more hidden, dense
layer. This time, we are arbitrarily setting the
number of units for this dense layer to be 32, and again, we follow the layer with the relu activation function. Then, lastly, we specify our last layer, which is our output layer. This is another dense layer, this time with
only two units, and that is corresponding to the two possible output classes. Either a patient did experience side effects,
or the patient did not experience side effects. And we're following this output layer with
the softmax function, which is just going to give us probabilities for each output class. So between whether a patient experienced side effects or not, we will have an output probability for each class, letting us know which class is more probable for any given patient. And just in case it's not clear, this dense layer here is what we call a densely connected layer, or a fully connected layer, probably the most well-known type of layer in artificial neural networks.
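Here's a sketch of the model described above; the input_shape of (1,) reflects that each sample is a single scaled age value:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    # First hidden layer; input_shape tells the model what input to expect.
    Dense(units=16, input_shape=(1,), activation='relu'),
    # Second hidden layer with an arbitrary 32 units.
    Dense(units=32, activation='relu'),
    # Output layer: one unit per class, softmax for class probabilities.
    Dense(units=2, activation='softmax'),
])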
Now, in case you need a refresher on fully connected layers, activation functions, or anything else that we've discussed up to this point, just know that all of that is covered in the Deep Learning Fundamentals course, so you can go there and refresh your memory on any of these topics. Alright, so now we'll run this cell to create our model, and then we can use model.summary() to print
out a visual summary of the architecture of the model we just created. So looking here, we can just see the visual
representation of the architecture that we just created in the cell above. All right, so now we have just finished creating
our very first neural network using this simple and intuitive sequential model type. In the next episode, we will see how we can
use the data that we created last time to train this network. Be sure to check out the blog and other resources
available for this episode on deeplizard.com, as well as the deeplizard hivemind, where you can gain access to exclusive perks and rewards. Thanks for contributing to collective intelligence. Now let's move on to the next episode. Hey, I'm Mandy from deeplizard. In this episode, we'll see how to train an artificial neural network using the Keras API integrated with TensorFlow. In previous episodes, we went through the steps to generate data and also build an artificial neural network. So now we'll bring these two together to actually
train the network on the data that we created and processed. Alright, so picking up where we were last
time in our Jupyter Notebook, make sure that you still have all of your imports included
and already run, so that we can continue where we left off. So first, after building our model, we are going to call this model compile function, and this just prepares the model for training; it gets everything in order that's needed before we can actually train the model. First, we are specifying to the compile function which optimizer we want to use, and we're choosing the very common Adam optimizer with a learning rate of 0.0001. Next, we specify the type of loss that we need to use, which in our case is sparse categorical cross entropy. Then lastly, we specify what metrics we want to see. This is just for judging model performance, and we are specifying this list, which just includes accuracy, a very common way to evaluate model performance.
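In code, that compile call looks roughly like this:

from tensorflow.keras.optimizers import Adam

# Prepare the model for training with the optimizer, loss, and metrics described above.
model.compile(optimizer=Adam(learning_rate=0.0001),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])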
So if we run this cell, the model is compiled and ready for training. Training occurs whenever we call the fit function. Now, recall that earlier in the course we actually looked at the documentation for the fit function, so that we knew how to process our input data. To fit, the first parameter that we're specifying is x, which is our input data, currently stored in the scaled_train_samples variable. Then y is our target data, or our labels, which are currently stored in the train_labels variable, so we are specifying that here. Next, we specify the batch size that we want
to use for training. So this is how many samples are included in
one batch to be passed and processed by the network at one time. So we're setting this to 10. And the number of epochs that we want to run,
we're setting this to 30. So that means that the model is going to process
or train on all of the data in the data set 30 times before completing the total training
process. Next, we're specifying this shuffle parameter, which we are setting to True. Now, by default, this is already set to True, but I'm bringing it to your attention to make you aware of the fact that the data is being shuffled by default when we pass it to the network. That's a good thing, because we want any order inside the data set to be erased before we pass the data to the model, so that the model is not learning anything about the order of the data set. So this is True by default, and we don't necessarily have to specify it; I was just letting you know. And actually, we'll see something about that
in the next episode about why this is important regarding validation data, but we'll see that
coming up. The last parameter that we specify here is
verbose, which is just an option to allow us to see output whenever we run this fit function. We can set it to 0, 1, or 2, where 2 is the most verbose level in terms of output messages, so we are setting it to 2 here to get the highest level of output.
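Putting those parameters together, the fit call is roughly:

# Train the model on the scaled samples and labels prepared earlier.
model.fit(x=scaled_train_samples,
          y=train_labels,
          batch_size=10,
          epochs=30,
          shuffle=True,   # True by default; shown here for emphasis
          verbose=2)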
Now let's run this cell so that training can begin. All right, so training has just stopped, and we have run for 30 epochs. If we look at the progress of the model, starting out on our first epoch, our loss value is currently 0.68 and our accuracy is 50%, so no better than chance. But pretty quickly, looking at the accuracy, we can tell that it is steadily increasing all the way until we get to our last epoch, where we are yielding 94% accuracy. Our loss has also steadily decreased from the 0.65 range to now being at 0.27. So as you can see, this model trained very quickly, with each epoch taking under one second to run, and within 30 epochs we are already at 94% accuracy. Now, although this is a very simple model and we were training it on very simple data, we can see that without much effort at all, we were able to yield pretty great results in a relatively short amount of time. In subsequent episodes, we'll demonstrate
how to work with more complex models as well as more complex data. But for now, hopefully this example served
the purpose of showing you how easy it is to get started with Keras. Be sure to check out the blog and other resources available for this episode on deeplizard.com, as well as the deeplizard hivemind, where you can gain access to exclusive perks and rewards. Thanks for contributing to collective intelligence. Now let's move on to the next episode. Hey, I'm Mandy from deeplizard. In this episode, we'll demonstrate how we can use TensorFlow's Keras API to create a validation set on the fly during training. Before we demonstrate how to build a validation set using Keras, let's first talk about what exactly a validation set is. So whenever we train a model, our hope is that we see good results from the training output, that we have low
loss and high accuracy. But we don't ever train a model just for the
sake of training it, we want to take that model and hopefully be able to use it in some
way on data that it wasn't necessarily exposed to during the training process. And although this new data is data that the
model has never seen before, the hope is that the model will be good enough to be able to
generalize well on this new data and give accurate predictions for it. We can actually get an understanding of how well our model is generalizing by introducing a validation set during the training process. To create a validation set, before training begins, we can choose to take
a subset of the training set, and then separate it into a separate set labeled as validation
data. And then during the training process, the
model will only train on the training data, and then will validate on the separated validation
data. So what do we mean by validating? Well, essentially, if we have the addition
of a validation set, then during training, the model will be learning the features of
the training set, just as we've already seen. But in addition, in each epoch, after the
model has gone through the actual training process, it will take what it's learned from
the training data, and then validate by predicting on the data in the validation set, using only
what it's learned from the training data, though. So then during the training process, when
we look at the output of the accuracy and loss, not only will we be seeing that accuracy
and loss computed for the training set, we'll also see that computed on the validation set. It's important to understand though, that
the model is only learning on, or training on, the training data. It's not taking the validation set into account
during training. The validation set is just for us to be able
to see how well the model is able to predict on data that it was not exposed to during
the training process. In other words, it allows us to see how general
our model is, or how well it's able to generalize on data that is not included in the training
data. So knowing this information will allow us
to see if our model is running into the famous overfitting problem. So overfitting occurs when the model has learned
the specific features of the training set really well, but it's unable to generalize
on data it hasn't seen before. So if while training, we see that the model
is giving really good results for the training set, but less than good results for the validation
set, then we can conclude that we have an overfitting problem, and then take the steps
necessary to combat that specific issue. If you'd like to see the overfitting problem
covered in more detail, then there is an episode for that in the deep learning fundamentals
course. Alright, so now let's discuss how we can create
and use a validation set with a Keras Sequential model. There are actually two ways that we can create and work with validation sets with a Sequential model. The first way is to have a completely separate validation set from the training set, and then to pass that validation set to the model; in the fit function, there is a validation_data parameter, and we can just set that equal to the structure that is holding our validation data. There's a write-up in the corresponding blog for this episode that contains more details about the format that data needs to be in. But we're actually only going to focus on the second way of creating and using a validation set. This way actually saves us a step, because we don't have to explicitly go through the creation process for the validation set; instead, we can get Keras to create it for us. Alright, so we're back in our Jupyter Notebook right where we left off last time, and we're here on the model.fit function. Recall, this is what we used last time to train our model. Now, I've already edited this cell to include this new parameter, validation_split, and validation_split does exactly what it sounds like: it splits out a portion of the training set into a validation set. We just set this to a number between zero and one, a fractional number, to tell Keras how much of the training set we want to split out into the validation set. Here, I'm splitting out 10% of the training set.
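The edited fit call looks roughly like this:

# Same fit call as before, now with 10% of the training data split out
# for validation on the fly.
model.fit(x=scaled_train_samples,
          y=train_labels,
          validation_split=0.1,
          batch_size=10,
          epochs=30,
          shuffle=True,
          verbose=2)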
It's important to note that whenever we
do this, the validation set is completely held out of the training set. So the training samples that we remove from
the training set and put into the validation set are no longer contained within the training data. Using this approach, the validation set
will be created on the fly whenever we call the fit function. Now, there's one other thing worth mentioning
here. Remember, last time I discussed this shuffle=True parameter, and I said that by default, the training set is shuffled whenever we call fit. So shuffle=True is already set by default, but I was just bringing it up to let you know that the training set is being shuffled. That is a good thing; we want the training set to be shuffled. But whenever we use validation_split in this way, the split occurs before the training set is shuffled. That means that if we created our training set and, say, we put all of the sick patients first and the non-sick patients second, and then we say that we want to split off the last 10% of the training data to be our validation data, it's going to take the last 10% of the training data, and therefore it could just take all of the second group that we put in the training set and not get any of the first group. So I wanted to mention that, because although the training data is being shuffled by the fit function, if you haven't already shuffled your training data before you pass it to fit and you also use the validation_split parameter, it's important to know that your validation set is going to be the last X percent of your training set and therefore may not be shuffled. This may yield some strange results, because you think that everything has been shuffled when really only the training set has been shuffled, after the validation set has been taken out. So just keep that in mind. The way that we created our training set before this episode, we actually shuffled the training data before it's ever passed to the fit function. So in the future, whenever you're working with data, it's a good idea to make sure that your data is also shuffled beforehand, especially if you're going to be making use of the validation_split parameter to create a validation set. Alright, so now we'll run this cell one more
time calling the fit function. But this time, not only will we see loss and
accuracy metrics for the training set, we'll also see these metrics for the validation
set. Alright, so the model has just finished running
its 30 epochs, and now we see both the loss and accuracy
on the left hand side, as well as the validation loss and validation accuracy on the right
hand side. So we can see, let's just look at the accuracy
between the two. They're both starting at around the same 50% mark and going up gradually at around the same rate. If we scroll all the way to our last epoch, we can see that the accuracy and validation accuracy are pretty similar, with only a 1% difference between the two, and the loss values are similar as well. So we can see in this example that our model is not overfitting; it is actually performing just as well on the
validation set as it is on the training set. So our model is generalizing well. If however, we saw that the opposite case
was true, and our validation accuracy was seriously lagging behind our training accuracy,
then we would know that we have an overfitting problem, and we would need to take steps to address
that issue. Alright, so we've now seen how to train the
model, how to validate the model, and how to make use of both training and validation sets. In the next episode, we're going to see how to make use of a third data set, the test set, to use the model for inference. Be sure to check out the blog and other resources available for this episode on deeplizard.com, as well as the deeplizard hivemind, where you can gain access to exclusive perks and rewards. Thanks for contributing to collective intelligence. Now let's move on to the next episode. Hey, I'm Mandy from deeplizard. In this episode, we'll see how we can use a neural network for inference to predict on data from a test set using TensorFlow's Keras
API. As we touched on previously, whenever we train
a model, the hope is that we can then take that model and use it on new data that it
has not seen before during training, and hopefully the model is able to generalize well and give us good results on this new data. As a simple example, suppose that we trained
a network to be able to identify images of cats or dogs. And so during the training process, of course,
we had a training set that, say, we downloaded from a website with thousands of images of cats
and dogs. So the hope is later that if we wanted to,
we could maybe build a web app, for example. And we could have people from all over the
world submit their dog and cat photos and have our model tell them, with high accuracy,
whether or not their animal is a cat or a dog. So I don't know why anyone would actually
make that web app, but you get the point. The hope is that even though the images being sent in from people around the world are of their own cats and dogs, which weren't included in the training set that the model was originally trained on, hopefully the model is able to generalize well enough, from what it's learned about dog and cat features, to predict that Mandy's dog is actually a dog and not a cat, for example. We call this process inference. So the model takes what it learned during
training, and then uses that knowledge to infer things about data that it hasn't seen
before. In practice, we might hold out a subset of
our training data to put in a set called the test set. Typically, after the model has been trained
and validated, we take the model and use it for inference purposes against the test set,
just as one additional step to the validation to make sure that the model is generalizing
well before we deploy our model to production. So at this point, the model that we've been
working with over the last few episodes has been trained and validated. And given the metrics that we saw during the
validation process, we have a good idea that the model is probably going to do a pretty
good job at inference on the test set as well. In order to conclude that though, we first
would need to create a test set. So we're going to do that now. And then after we create the test set, then
we'll use the model for inference on it. Alright, so we are back in our Jupyter Notebook. And now we're going to go through the process
of creating the test set. And actually, if you just glance at this code
here, you can see that this whole process of setting up the samples and labels lists, generating the data from the imagined clinical trial that we discussed in a previous episode, taking that generated data and putting it into a NumPy array format, then shuffling that data, and then scaling the data to be on a scale from zero to one rather than from the scale of 13 to 100, is the same exact process using almost the same exact code, except we're working with our test_labels and test_samples variables rather than train_labels and train_samples. So we're not going to go line by line through
this code. If you need a refresher, go check out the
earlier episode where we did the exact same process for the training set. The important thing to take from this process,
though, is that the test set should be prepared and processed in the same format as the training
data was. So we'll just go ahead and run the cells to
create and process the test data. And now, we are going to use our model to
predict on the test data. So to obtain predictions from our model, we
call predict on the model that we created in the last couple of episodes. So we are calling model.predict, and we are first passing in the parameter x, which we're setting equal to our scaled_test_samples. That is what we created in the line just above, where we scaled our test samples to be on a scale from zero to one, and this is the data that we want our model to predict on. Then we specify the batch size, setting it equal to 10, which is the exact same batch size that we used for our training data whenever we trained the model. The last parameter we're specifying is this verbose parameter, which we're setting equal to zero, because during prediction there is not any output from this function that we actually care about seeing or that is going to be of any use to us at the moment. So we're setting that equal to zero to get no output. Alright, so then if we run this, our
model predicts on all of the data in our test set. And if we want to have a visualization of
what each of these predictions from the model looks like for each sample, we can print them
out here. So looking at these predictions, the way that
we can interpret this is that for each sample in our test set, we are getting a probability that maps to either the patient not experiencing a side effect or the patient experiencing a side effect. For the first sample in our test set, this prediction says that the model is assigning a 92% probability to this patient not experiencing a side effect, and around an 8% probability of the patient experiencing a side effect. Recall that we said no side effect experienced was labeled as a 0, and a side effect experienced was labeled as a 1. That is how we know that this particular probability maps to not having a side effect, because it's in the zeroth index, and this specific probability maps to having a side effect, because it is in the first index. So if we're interested in seeing only the most probable prediction for each sample in the test set, we can run this cell here, which takes the predictions and gets the index of the prediction with the highest probability.
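A sketch of those two prediction cells, assuming the scaled_test_samples and test_labels created above:

import numpy as np

# Probabilities for each of the two classes, per test sample.
predictions = model.predict(x=scaled_test_samples, batch_size=10, verbose=0)

# Index of the most probable class for each sample (0 or 1).
rounded_predictions = np.argmax(predictions, axis=-1)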
If we print that out, then we can see
that these are a little bit easier to interpret than the previous output. So we can see for the first sample that the
prediction is zero, the second sample is a one. And just to confirm, if we go back up here,
we can see that the first sample indeed has the higher probability of a label of zero,
meaning no side effects. And the second sample has a higher probability
of one meaning that the patient did experience a side effect. So from these prediction results, we're able
to actually see the underlying predictions. But we're not able to make much sense of them
in terms of how well the model did at these predictions, because we didn't supply the
labels to the model during inference in the same way that we do during training. This is the nature of inference. A lot of times inference is occurring once
the model has been deployed to production. So we don't necessarily have correct labels
for the data that the model is inferring from. If we do have corresponding labels for our
test set, though, which in our case, we do, because we are the ones who generated the
test data, then we can visualize the prediction results by plotting them to a confusion matrix. That'll give us an overall idea of how
accurate our model was at inference on the test data. We'll see exactly how that's done in the next
episode. Be sure to check out the blog and other resources
available for this episode on deeplizard.com, as well as the deeplizard hivemind, where you can gain access to exclusive perks and rewards. Thanks for contributing to collective intelligence. Now let's move on to the next episode. Hey, I'm Mandy from deeplizard. In this episode, we'll demonstrate how to
use a confusion matrix to visualize prediction results from a neural network during inference. In the last episode, we showed how we could
use our trained model for inference on data contained in a test set. Although we have the labels for this test
set, we don't pass them to the model during inference, and so we don't get any type of
accuracy readings for how well the model does on the test set. Using a confusion matrix, we can visually
observe how well a model predicts on test data. Let's jump right into the code to see exactly
how this is done. We'll be using scikit-learn to create our
confusion matrix. So the first thing we need to do is import
the necessary packages that we'll be making use of. Next, we create our confusion matrix by calling the confusion_matrix function from scikit-learn, and we pass in our test labels as the true labels and our predictions as the predictions that the confusion matrix expects. Recall this rounded_predictions variable, as well as the test labels; these were created in the last episode. Rounded predictions, recall, was when we used the argmax function to select only the most probable predictions, so now our predictions are in that format, and our labels are zeros and ones that correspond to whether or not a patient had side effects. Next, we have this plot_confusion_matrix function, and this is directly copied from scikit-learn's website. There's a link to the site on the corresponding blog where you can copy this exact function, but this is just a function that scikit-learn has created to be able to easily plot the confusion matrix in our notebook, which is going to be the actual visual output that we want to see. So we just run this cell to define that function. Now we create this list that has the labels we will use on our confusion matrix; we want the labels "no side effects" and "had side effects," which are the corresponding labels for our test data. Then we're going to call the plot_confusion_matrix function that we just brought in and defined above from scikit-learn, and to that we are going to pass in our confusion matrix, the classes for the confusion matrix, which we are specifying as cm_plot_labels (defined just right above), and lastly the title to display above the confusion matrix.
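A sketch of those scikit-learn calls; plot_confusion_matrix here is the helper function copied from scikit-learn's site and assumed to be defined in the cell above, and the label strings are just illustrative:

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_true=test_labels, y_pred=rounded_predictions)
cm_plot_labels = ['no_side_effects', 'had_side_effects']
plot_confusion_matrix(cm=cm, classes=cm_plot_labels, title='Confusion Matrix')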
So if we run this, we get the confusion matrix plot. Alright, so we have our predicted labels on
the x axis and our true labels on the y axis. So the way we can read this is that we look
and we see that our model predicted that a patient had no side effects 10 times when
the patient actually had a side effect, so those are incorrect predictions. On the flip side, though, the model predicted that the patient had no side effects 196 times when the patient indeed had no side effects, so those are correct predictions. Generally, reading the confusion matrix, looking at the top-left to bottom-right diagonal, these squares here in blue
going across this diagonal are the correct predictions. So we can see total that the model predicted
200 plus 196, so 396 correct predictions out of a total of 420; all these numbers added up equal 420, and 396 out of 420 predictions were correct. That gives us about a 94% accuracy rate on our test set, which is equivalent to what we were seeing for our validation accuracy rate during training. So as you can see, a confusion matrix is a
great tool to use to visualize how well our model is doing at its predictions, and also to drill in a little bit further to see which classes might need some work. Be sure to check out the blog and other resources available for this episode on deeplizard.com, as well as the deeplizard hivemind, where you can gain access to exclusive perks and rewards. Thanks for contributing to collective intelligence. Now let's move on to the next episode. Hey, I'm Mandy from deeplizard. In this episode, we'll demonstrate the multiple ways that we can save and load a Keras Sequential model. We have a few different options when it comes to saving and loading a Keras Sequential model, and these all work a little bit differently from one another, so we're going to go through all of the options now. We've been working with a single model over
the last few episodes. And we're going to continue working with that
one. Now, I've printed out a summary of that model
to refresh your memory about which one we've been working with. But just make sure that in your Jupyter Notebook
that you have that model already created, because we're going to now show how we can
save that model. Alright, so the first way to save a model
is just by calling the save function on it. To save, we pass in the path to which we want to save our model, along with the model name, or file name, that we want to save our model under, with the .h5 extension. This h5 file is going to be where the model is stored. The code here is just a condition I'm checking: if the model is not already saved to disk, then save it, because I don't want to continue saving the model over and over again on my machine if it's already been saved. So that's what this condition is about, but this model.save function is the first way that we can save the model. Now, when we save using this way, it saves
the architecture of the model, allowing us to be able to recreate it with the same number
of learnable parameters, layers, nodes, etc. It also saves the weights of the model, so if the model's already been trained, then the weights that it has learned and optimized are going to be in place within this saved model on disk. It also saves the training configuration, so things like the loss and the optimizer that we set whenever we compiled the model, and the state of the optimizer is also saved. That allows us, if we are training the model and then stop and save the model to disk, to later load that model again and pick up training where we left off, because the optimizer will be in that saved state. So this is the most comprehensive option when it comes to saving the model, because it saves everything: the architecture, the learnable parameters, and the state of the model where it left off with training. So if we want to load a model later that we
previously saved to disk, then we first need to import the load_model function from tensorflow.keras.models. Then we create a variable, in this case I'm calling it new_model, and set it to load_model, pointing to where our saved model is on disk.
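A sketch of the save-and-load round trip, with a hypothetical file path (the directory is assumed to already exist):

import os
from tensorflow.keras.models import load_model

# Save everything: architecture, weights, training config, optimizer state.
if not os.path.isfile('models/medical_trial_model.h5'):
    model.save('models/medical_trial_model.h5')

# Later, reconstruct the full model from disk.
new_model = load_model('models/medical_trial_model.h5')
new_model.summary()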
Then, if we run that and look at a summary of our new model, we can see indeed that it is an exact replica in terms of its architecture
as the original model up here that we had previously saved to disk. Also, we can look at the weights of the new
model. We didn't look at the weights ahead of time to be able to compare them directly, but this is showing you that you can inspect the weights and see that they are actually the same as the previous model's weights, if you had taken a look at those beforehand. We can also look at the optimizer, just to
show you that although we never set an optimizer explicitly for our new model, because we are
loading it from the saved model, it does indeed use the Adam optimizer that we set a while
back whenever we compiled our model for training. Alright, so that's it for the first saving
and loading option. And again, that is the most comprehensive
option to save and load everything about a particular model. The next option that we'll look at is using this function called to_json. We call model.to_json if we only need to save the architecture of the model; we don't want to save its weights or the training configuration, so we can just save the model architecture by saving it to a JSON string. I'm using this example here, creating a variable called json_string and setting it equal to model.to_json(), and remember, model is our original model that we've been working with up to this point. So we call to_json, and now if we print out this json_string,
then we can see we get this string of details about the model architecture. So it's a sequential model. And then it's got the layers organized, with
the individual dense layers and all the details about those specific layers from our original
model. Now, if at a later point, we want to create
a new model with our old model's architecture that we saved to a JSON string, then we can import the model_from_json function from tensorflow.keras.models. Now we're creating a new variable called model_architecture, and we are loading in the JSON string using the model_from_json function.
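In code, the JSON round trip looks roughly like this:

from tensorflow.keras.models import model_from_json

# Save only the architecture as a JSON string...
json_string = model.to_json()

# ...then later rebuild a new, untrained model with that same architecture.
model_architecture = model_from_json(json_string)
model_architecture.summary()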
So now we have this new model, which I'm just calling model_architecture, and if we look at its summary, then
again, we can see that this is identical to the summary of the original model. So we have a new model in place now, but we
only have the architecture in place. We would have to retrain it to update its weights, and we would need to compile it to get an optimizer, a loss, and everything like that defined; this only creates the model from an architecture standpoint. Before moving on to our third option, I just
wanted to mention a brief point that we can go through this same exact process, but using
a YAML string instead of a JSON string. The function to create a YAML string is just model.to_yaml instead of to_json, and the function to load a YAML string is model_from_yaml instead of model_from_json. Alright, so our next option to save a model
is actually just to save the weights of the model. So if you only need to save the weights, and
you don't need to save the architecture, nor any of the training configurations, like the
optimizer or loss, then we can save solely the model's weights by using the save_weights function. To do that, we just call model.save_weights, and this looks exactly the same as when we called model.save; we're just passing a path on disk to where to save our model, along with the file name ending with an .h5 extension. I'm calling this one my_model_weights.h5. Again, we have this condition here where I'm just checking if this h5 file has already been saved to disk; otherwise, I'm not going to keep saving it over and over again. Now, the thing with this is that when we save
only the weights, if we want to load them at a later time, then we don't have a model
already in place because we didn't save the model itself; we only saved the weights. So to be able to bring our weights into a new model, we would need to create a second model at that point with the same architecture, and then we could load the weights. That's what we're doing in this cell: I'm defining this model called model_2, and it is the exact same model, from an architecture standpoint, as the first model. If we run this, then at that point we have the option to load weights into this model. The shape of these weights is going to have to match the shape of this model architecture, essentially. So we couldn't have a model with five layers defined here, for example, and load these weights in, because there wouldn't be a direct mapping of where these particular weights should be loaded. That's why we have the same exact architecture here as our original model. To load our weights, we call load_weights and point to the place on disk where our weights are saved.
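A sketch of the weights-only round trip, again with a hypothetical path, and with model_2 rebuilt using the same architecture as the original model:

import os
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Save only the learned weights (no architecture or training config).
if not os.path.isfile('models/my_model_weights.h5'):
    model.save_weights('models/my_model_weights.h5')

# A second model with a matching architecture can then load those weights.
model_2 = Sequential([
    Dense(units=16, input_shape=(1,), activation='relu'),
    Dense(units=32, activation='relu'),
    Dense(units=2, activation='softmax'),
])
model_2.load_weights('models/my_model_weights.h5')
model_2.get_weights()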
Then we can call get_weights on our new model and see that it has been populated with the weights from our original model. Alright, so now you know all the ways that we can save various aspects of a Keras Sequential model. Be sure to check out the blog and other resources available for this episode on deeplizard.com, as well as the deeplizard hivemind, where you can gain access to exclusive perks and rewards. Thanks for contributing to collective intelligence. Now, let's move on to the next episode. Hey, I'm Mandy from deeplizard. In this episode,
we'll go through all the necessary image preparation and processing steps needed to train our first
convolutional neural network. Our goal over the next few episodes will be
to build and train a convolutional neural network that can classify images as cats or
dogs. The first thing that we need to do is get and prepare the data set on which we'll be training our model. We're going to work with the data set from the Kaggle Dogs vs. Cats competition, and you can find a link to download the dataset in the corresponding blog for this episode on deeplizard.com. We're mostly going to organize our data on disk programmatically, but there are a couple of manual steps that
we'll go through first. So after you've downloaded the data set from
kaggle, you will have this zip folder here. And if we look inside, then this is the contents
we have a zipped train folder, and a zipped test folder along with the sample submission
CSV. So we actually are not going to be working
with this, or this. So you can delete the test zip as well as the CSV file; we're going to only be working with this train zip. So we want to extract the top-level dogs-vs-cats zip first, and then extract the train zip. So because that takes a while, I went ahead
and did that here. So now I have just this train directory because
I moved the test directory elsewhere and deleted the CSV file. So now I have this extracted train folder. And I go in here and I have a nested train
folder. As you can see, I'm already in train once. Now I'm in train again. And in here we have all of these images of
cats, and dogs. Okay, so that is the way that the data will
come downloaded. The first step, or the next step that we need
to do now is to come into here and grab all of these images. And we're going to Ctrl x or cut all of these
and bring them up to the first train directory. So we don't want this nested directory structure. Instead, we're going to place them directly
within the first train directory. Alright, now all of our images have been copied
into our base train directory here. So this nested train directory that the images
previously belong to is now empty. So we can just go ahead and delete this one. Alright, so now our directory structure is
just this dogs-vs-cats top directory; within it, we have a train directory. And then we have all of the images within
our train directory. Both of cats and dogs. The last step is to just move the dogs versus
cats directory that has all the data in it to be in the place on disk where you're going
to be working. So for me, relative to my Jupyter Notebook, I am within a directory called data, and I have placed dogs-vs-cats here. Alright, so that's it for the manual labor; now everything else that we'll do to organize the data, and then later process the data, will be done programmatically through code. Alright, so we're now here within our Jupyter
Notebook. And first things first, we need to import all of the packages that we'll be making use of, and all of the packages here are not just for this specific episode on processing the image data. Actually, these are all of the packages that we'll be making use of over the next several episodes as we're working with CNNs. Alright, so we'll get that taken care of. And now, just this cell here is making sure
that, if we are using a GPU, TensorFlow is able to identify it correctly, and we are enabling memory growth on the GPU as well. If you're not using a GPU, then no worries; as mentioned earlier, you are completely fine to follow this course with a CPU only.
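Here's a sketch of what that GPU check cell typically looks like; it only has an effect if a GPU is visible to TensorFlow:

```python
import tensorflow as tf

# List any GPUs TensorFlow can see and enable memory growth on the first one,
# so TensorFlow allocates GPU memory gradually instead of grabbing it all at once.
physical_devices = tf.config.experimental.list_physical_devices('GPU')
print('Num GPUs Available:', len(physical_devices))
if physical_devices:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)
```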
Alright, so now we're going to pick back up with organizing our data on disk. So assuming that we've gone through the first
steps from the beginning of this episode, we're now going to organize the data into train, valid, and test directories, which correspond to our training, validation, and test sets. So this script here is going to first change directories into our dogs-vs-cats directory. And then it's going to check to make sure
that the directory structure that we're about to make does not already exist. And if it doesn't, it's going to proceed with
the rest of this script. So the first thing that it's doing as long
as the directory structure is not already in place, is making the following directories. So we already have a train directory. So it's going to make a nested dog and cat
directory within train. And then additionally is going to make a valid
directory that contains dog and cat directories, and a test directory, which also contains
dog and cat directories. Now, this particular data set contains 25,000
images of dogs and cats. And that is pretty much overkill for the tasks
that we will be using these images for. In the upcoming episodes, we're actually only
going to use a small subset of this data, you're free to work with all the data if you'd
like. But it would take a lot longer to train our
networks and work with the data in general if we were using the entire set of it. So we are going to be working with a subset
consisting of 1000 images in our training set 200 in our validation set, and 100 in
our test set, and each of those sets are going to be split evenly among cats and dogs. So that's exactly what this block of code
here is doing. It's going into the images that are in our dogs-vs-cats directory and randomly moving 500 cat images into our train/cat directory and 500 dog images into our train/dog directory. And then it's doing the same thing for our validation set, both for cats and dogs, and then our test set, both for cats and dogs, with just the quantities differing according to the amounts that I stated earlier for each of the sets. And we're able to understand which images
earlier for each of the sets. And we're able to understand which images
are cats and which are dogs based on the names of the files. So if you saw earlier, the cat images actually
had the word cat in the file names and then the dog images had the word dog in the file
names. So that's how we're able to select dog images and cat images here with this script.
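If you want a feel for what that organizing script looks like, here's a minimal sketch under the assumptions above; the directory names, glob patterns, and sample counts are taken from the description in this episode, so adjust the glob pattern to wherever your unsorted images actually live:

```python
import os
import glob
import random
import shutil

os.chdir('data/dogs-vs-cats')              # assumed location of the extracted dataset
if os.path.isdir('train/dog') is False:    # only build the structure once
    os.makedirs('train/dog'); os.makedirs('train/cat')
    os.makedirs('valid/dog'); os.makedirs('valid/cat')
    os.makedirs('test/dog');  os.makedirs('test/cat')

    # Move a random subset of images, using the file names to tell cats from dogs.
    for c in random.sample(glob.glob('train/cat*'), 500):
        shutil.move(c, 'train/cat')
    for c in random.sample(glob.glob('train/dog*'), 500):
        shutil.move(c, 'train/dog')
    for c in random.sample(glob.glob('train/cat*'), 100):
        shutil.move(c, 'valid/cat')
    for c in random.sample(glob.glob('train/dog*'), 100):
        shutil.move(c, 'valid/dog')
    for c in random.sample(glob.glob('train/cat*'), 50):
        shutil.move(c, 'test/cat')
    for c in random.sample(glob.glob('train/dog*'), 50):
        shutil.move(c, 'test/dog')

os.chdir('../../')                          # change back to the notebook's directory
```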
Alright, so after this script runs, we can pull up our file explorer and look at the directory structure and make sure it is what
we expect. So we have our dogs vs cat directory within
our data directory here. So if we enter, then we have test train and
valid directories. Inside test, we have cat that has cat images,
and inside dog, we have dog images. If we back out and go to train, we can see
similarly. And if we go into valid we can see similarly. And you can select one of the folders and
look at the properties to see how many files exist within the directory, to make sure that it is the amount that we chose to put in from our script and that we didn't accidentally make any type of error. So if we go back to the dogs-vs-cats root
directory here, you can see we have all of these cat and dog images leftover. These were the remaining 23,000 or so that
were left over after we moved our subset into our train valid and test directories. So you're free to make use of these in any
way that you want or delete them or move them to another location. All of what we'll be working with are in these
three directories here. Alright, so at this point, we have obtained
the data and we have organized the data. Now it's time to move on to processing the
data. So if we scroll down, first, we are just creating
these variables here where we have assigned our train, valid, and test paths. So this is just pointing to the location on disk where our different data sets reside. Now recall earlier in the course, we talked about how, whenever we train a model, we need to put the data into a format that the model expects. And we know that when we train a Keras Sequential model, the model receives the data whenever we call the fit function. So we are going to put our images into the format of a Keras generator. And we're doing that in this cell here. We're creating these train, valid, and test batches and setting them equal to ImageDataGenerator().flow_from_directory(), which is going to return a DirectoryIterator. Basically, it's going to create batches of
data from the directories where our datasets reside. And these batches of data will be able to
be passed to the sequential model using the fit function. So now let's look exactly at how we are defining
these variables. So let's focus just on train_batches for now. So we're setting train_batches equal to ImageDataGenerator().flow_from_directory(). But first, to ImageDataGenerator, we are specifying this preprocessing function and setting it equal to tf.keras.applications.vgg16.preprocess_input. So I'm just going to tell you for now that this is a function that is going to apply some type of preprocessing to the images before they get passed to the network that we'll be using. We're processing them in the same way that images passed to a very popular model known as VGG16 are processed. And we're going to talk more about this
in a future episode. So don't let it confuse you now just know
that this is causing some type of processing to occur on our images. And we'll talk more about it in a future episode
and not stress on it now, because it's not necessarily very important for us right at
this moment, the technical details of that at least. So besides that, when we call flow from directory,
this is where we are passing in our actual data and specifying how we want this data
to be processed. So we are setting directory equal to train_path, which, up here, we defined as the location on disk where our training set is, under the train_path variable. And then we're setting target_size equal to
224 by 224. So this is the height and width that we want
the cat and dog images to be resized to. So if you're working with an image data set
that has images of varying sizes, or you just want to scale them up or scale them down,
this is how you can specify that to happen. And this will resize all images in your data
set to be of this height and width before passing them to our network. Now we are specifying our classes, which are
just the classes for the potential labels of our data set. So cat or dog, and we are setting our batch
size to 10. We do the exact same thing for the validation
set. And the test set. Everything is the exact same for both of them,
except for where each of these sets live on disk as being specified here under the directory
parameter. And then the only other difference is here
for our test batches. We are specifying this shuffle equals false
parameter. Now, this is because whenever we use our test
batches later for inference to get our model to predict on images of cats and dogs after
training and validation has been completed, we're going to want to look at our prediction
results in a confusion matrix like we did in a previous video for a separate data set. And in order to do that, we need to be able
to access the unshuffled labels for our test set. So that's why we set shuffle equal to false for only this set. For both the validation and training sets, we do want the data sets to be shuffled.
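Here's a compact sketch of how these generators are typically set up, assuming the train_path, valid_path, and test_path variables point to the directories we organized above:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import tensorflow as tf

train_batches = ImageDataGenerator(preprocessing_function=tf.keras.applications.vgg16.preprocess_input) \
    .flow_from_directory(directory=train_path, target_size=(224, 224), classes=['cat', 'dog'], batch_size=10)
valid_batches = ImageDataGenerator(preprocessing_function=tf.keras.applications.vgg16.preprocess_input) \
    .flow_from_directory(directory=valid_path, target_size=(224, 224), classes=['cat', 'dog'], batch_size=10)
# shuffle=False so the test labels stay in a known order for the confusion matrix later.
test_batches = ImageDataGenerator(preprocessing_function=tf.keras.applications.vgg16.preprocess_input) \
    .flow_from_directory(directory=test_path, target_size=(224, 224), classes=['cat', 'dog'], batch_size=10, shuffle=False)
```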
Alright, so we run this and we get the output of "found 1000 images belonging to two classes," and that corresponds to our train batches; "found 200 images belonging to two classes," which corresponds to our valid batches; and then the 100 belonging to two classes corresponding to our test batches. So that is the output that you want to see
for yourself. That's letting you know that it found the
images on disk that belong to both the cat and dog classes that you have specified here. So if you are not getting this at this point,
if you get "found 0 images," then perhaps you're pointing to the wrong place on disk. You just need to make sure that it's able to find all the images that you set up previously. Right, and here we are just verifying that that is indeed the case. Next, we are going to just grab a single
batch of images and the corresponding labels from our train batches. And remember, our batch size is 10. So this should be 10 images along with the
10 corresponding labels. Next, we're introducing this function plot
images that we're going to use to plot the images from our train batches that we just
obtained above. And this function is directly from TensorFlow's website. So check the link in the corresponding blog for this episode on deeplizard.com to be able to get to the TensorFlow page where exactly I pulled this from. So we will define this function here.
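For reference, the plotting helper resembles the one from TensorFlow's tutorials; a version like the following (which assumes the batch size of 10 used here) is enough to reproduce the figures:

```python
import matplotlib.pyplot as plt

def plotImages(images_arr):
    # Plot a row of 10 images side by side with the axes hidden.
    fig, axes = plt.subplots(1, 10, figsize=(20, 20))
    axes = axes.flatten()
    for img, ax in zip(images_arr, axes):
        ax.imshow(img)
        ax.axis('off')
    plt.tight_layout()
    plt.show()

imgs, labels = next(train_batches)   # grab a single batch of 10 images and their labels
plotImages(imgs)
print(labels)
```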
Alright, so now we're just going to use this function to plot our images from our train batches here. And we're going to print the corresponding
labels for those images. So if we scroll down, we can see this is what
a batch of training data looks like. So this might be a little bit different than
what you expected, given the fact that it looks like the color data has been a little
bit distorted. And that's due to the preprocessing function that we called to preprocess the images in the same type of way that images get preprocessed for the famous VGG16 model. So like I said, we're going to discuss in
detail what exactly that pre processing function is doing technically, as well as why we're
using it in a later video. But for now, just know that it's skewing the
RGB data in some way. So we can still make out the fact that this
is a cat. And this looks like a cat. This is the dog, dog, dog, dog cat. Yeah, so we can still kind of generally make
out what these images are, but the color data is skewed. But don't worry too much about the technical
details behind that. For right now, just know that this is what
the data looks like before we pass it to the model. And here are the corresponding labels for
the data. So we have these one hot encoded vectors that
represent either cat or dog. So a [1, 0] represents a cat, and a [0, 1] represents a dog. Okay, so I guess I was wrong earlier with thinking that this one was a dog. This one is a cat, because as we can see, it maps to the [1, 0] one-hot encoding. And if you don't know what I mean by one-hot encoding, then check out the corresponding video for that in the Deep Learning Fundamentals course on deeplizard.com. But yeah, we can see that [0, 1] is the vector used to represent the label of a dog. So this one is a dog. And the next two are dogs as well, this one
and this one. Now, just a quick note about everything that
we've discussed up to this point, sometimes we do not have the corresponding labels for
our test set. So in the examples that we've done so far,
in this course, we've always had the corresponding labels for our test set. But in practice, a lot of times you may not
have those labels. And in fact, if we were to have used the downloaded
test directory that came from the kaggle download, then we would see that that test directory
does not have the images labeled with the cat or dog. So in this case, we do have the test labels
for the cat and dog images since we pulled them from the original training set from kaggle
that did have the corresponding labels. But if you don't have access to the test labels,
and you are wondering how to process your test data accordingly, then check the blog
for this episode on deeplizard.com. I have a section there that demonstrates what you need
to do differently from what we showed in this video if you do not have access to the labels
for your test set. Alright, so now we have obtained our image
data, organized it on disk and processed it accordingly for our convolutional neural network. So now in the next episode, we are going to
get set up to start building and training our first CNN. Be sure to check out the blog and other resources
available for this episode on deeplizard.com, as well as the deep lizard hive mind, where you can gain access to exclusive perks and rewards. Thanks for contributing to collective intelligence. Now let's move on to the next episode. Hey, I'm Mandy from deeplizard. In this episode, we'll demonstrate how to build a convolutional neural network and then train it on images of cats and dogs using TensorFlow's integrated Keras API. We'll be continuing to work with the cat and
dog image data that we created in the last episode. So make sure you still have all of that in
place, as well as the imports that we brought in in the last episode, as we'll be making use of those imports for the next several videos of working with CNNs. So to create our first CNN, we will be making use of the Keras Sequential model. And recall, we introduced this model in an
earlier episode when we were working with just plain simple numerical data. But we'll continue to work with this model
here with our first CNN. So the first layer to this model that we pass
is a Conv2D layer. So this is just our standard convolutional
layer that will accept image data. And to this layer, we're arbitrarily setting
the filter value equal to 32. So this first convolutional layer will have
32 filters with a kernel size of three by three. So the choice of 32 is pretty arbitrary, but
the kernel size of three by three is a very common choice for image data. Now, this first Conv2D layer will be followed by the popular ReLU activation function. And we are specifying padding equal to 'same'
here. And this just means that our images will have
zero padding, padding the outside so that the dimensionality of the images aren't reduced
after the convolution operations. So lastly, for our first layer only, we specify
the input shape of the data. And recall we touched on this parameter previously,
you can think of this as kind of creating an implicit input layer for our model. This Conv2D layer is actually our first hidden
layer, the input layer is made up of the input data itself. And so we just need to tell the model the
shape of the input data, which in our case is going to be 224 by 224. Recall, we saw that we were setting that target_size parameter to 224 by 224 whenever we created our valid, test, and train
directory iterators, we said that we wanted our images to be of this height and width. And then this three here is regarding the
color channels. Since these images are in RGB format, we have
three color channels, so we specified this input shape. So then we follow our first convolutional
layer with a max pooling layer, where we're setting our pool size to two by two and our strides to two. And if you are familiar with max pooling,
then you know this is going to cut our image dimensions in half. So if you need to know more about max pooling,
or you just need a refresher, same thing with padding here for zero padding, activation
functions, anything like that, then be sure to check those episodes out in the corresponding Deep Learning Fundamentals course on deeplizard.com. So after this max pooling layer, we're then
adding another convolutional layer that looks pretty much exactly the same as the first one, except we're not including the input_shape parameter, since we only specify
that for our first hidden layer, and we are specifying the filters to be 64 here instead
of 32. So 64 is again, an arbitrary choice, I just
chose that number here. But the general rule of increasing filters as you go into later layers of the network is common practice. We then follow this second
convolutional layer with another max pooling layer identical to the first one, then we
flatten all of this into a one dimensional tensor before passing it to our dense output
layer, which only has two nodes corresponding to cat and dog. And our output layer is being followed by the softmax activation function, which, as you know, is going to give us probabilities for each corresponding output from the model.
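Putting all of that together, the model described here looks roughly like this; the filter counts, kernel sizes, and input shape are the ones stated above:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense

model = Sequential([
    Conv2D(filters=32, kernel_size=(3, 3), activation='relu', padding='same',
           input_shape=(224, 224, 3)),          # first hidden layer; input shape matches our resized RGB images
    MaxPool2D(pool_size=(2, 2), strides=2),     # halves the spatial dimensions
    Conv2D(filters=64, kernel_size=(3, 3), activation='relu', padding='same'),
    MaxPool2D(pool_size=(2, 2), strides=2),
    Flatten(),                                  # flatten to 1D before the output layer
    Dense(units=2, activation='softmax'),       # two outputs: cat and dog
])
model.summary()
```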
Alright, so if we run this, we can then check out the summary of our model. And this is what we have: exactly what we built,
along with some additional details about the learnable parameters and the output shape
of the network. So these are things that are also covered
in the Deep Learning Fundamentals course as well, if you want to check out more information
about learnable parameters and such. So now that our model is built, we can prepare
it for training by calling model dot compile, which again, we have already used in a previous
episode, when we were just training on numerical data. But as you will see here, it looks pretty
much the same as before: we are setting our optimizer equal to the Adam optimizer with a learning rate of 0.0001, we are using categorical cross-entropy as our loss, and we are looking at the accuracy as our
metrics to be able to judge the model performance. Just a quick note before moving on. We are using categorical cross entropy. But since we have just two outputs from our
model, either cat or dog, then it is possible to instead use binary cross entropy. And if we did that, then we would need to
just have one single output node from our model instead of two. And then rather than
following our output layer with the softmax activation function, we would need to follow
it with sigmoid. And both of these approaches using categorical
cross entropy loss with the setup that we have here, or using binary cross entropy loss
with the setup that I just described, work equally well, they're totally equivalent,
and will yield the same results. Categorical cross-entropy, using softmax as the activation function for the output layer, is just a common approach for when
you have more than two classes. So I like to continue using that approach,
even when I only have two classes just because it's general. And it's the case that we're going to use
whenever there's more than two classes anyway, so I just like to stick with using the same
type of setup, even when we only have two outputs. Alright, so after compiling our model, we
can now train it using model dot fit, which we should be very familiar with up to this
point. So to the fit function, we are first specifying
our training data, which is stored in train batches. And then we specify our validation data, which
is stored in valid batches. Recall, this is a different way of creating
validation data, as we spoke about in an earlier episode, we're not using validation split
here, because we've actually created a validation set separately ourselves before fitting the
model. So we are specifying that separated set here
as the validation data parameter. Then we are setting our epochs equal to 10, so we're only going to train for 10 epochs this time, and setting verbose equal to 2 so that we can see the most verbose output
during training. And one other thing to mention is that you
will see here that we are specifying x just as we have in the past, but we are not specifying
y, which is our target data usually. And that's because when data is stored as
a generator, as we have here, the generator itself actually contains the corresponding
labels. So we do not need to specify them separately,
whenever we call fit, because they're actually contained within the generator itself. So let's go ahead and run this now.
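The compile-and-fit step described above looks roughly like this; the Adam learning rate, batch generators, and epoch count are the ones stated in this episode:

```python
from tensorflow.keras.optimizers import Adam

model.compile(optimizer=Adam(learning_rate=0.0001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# The generators already yield (images, labels) pairs, so no separate y is passed.
model.fit(x=train_batches,
          validation_data=valid_batches,
          epochs=10,
          verbose=2)
```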
Alright, so the model just finished training, so let's check out the results. Before we do, just a note: if you get this warning, it appears from the research I've done to be a bug within TensorFlow that's supposed to be fixed in the next release, from what I read. So you can just safely ignore this warning; it has no impact on our training. But if we scroll down and look at these results,
then we can see that by our 10th epoch, our accuracy on our training set has reached 100%. So that is great. But our validation accuracy is here at 69%. So not so great, we definitely see that we
have some overfitting going on here. So if this was a model that we really cared
about, and that we really wanted to use and be able to deploy to production, then in this
scenario, we will need to stop what we're doing and combat this overfitting problem
before going further. But what we are going to do is that we'll
be seeing in an upcoming episode, how we can use a pre trained model to perform really
well on this data. And that'll get us exposed to the concept
of fine tuning. Before we do that, though, we are going to
see in the next episode how this model holds up to inference at predicting on images in
our test set. Be sure to check out the blog and other resources
available for this episode on deeplizard.com, as well as the deep lizard hive mind, where you can gain access to exclusive perks and rewards. Thanks for contributing to collective intelligence. Now let's move on to the next episode. Hey, I'm Mandy from deeplizard. In this episode, we'll demonstrate how to use a convolutional neural network for inference to predict on image data using TensorFlow's integrated Keras API. Last time, we built and trained our first CNN on cat and dog image data, and we saw that the training results were great, with
the model achieving 100% accuracy on the training set. However, it lagged behind by quite a good
bit at only 70% accuracy on our validation set. So that tells us that the model wasn't generalizing
as well as we hoped. But nonetheless, we are going to use our model
now for inference to predict on cat and dog images in our test set. Given the less than decent results that we
saw from the validation performance, our expectation is that the model is not going to do so well on the test set either; it's probably going to perform at around the same 70% rate. But this is still going to give us exposure to how we can use a CNN for inference using the Keras Sequential API. Alright, so we are back in our Jupyter Notebook
and we need to make sure that we have all the code in place from the last couple of
episodes as we will be continuing to make use of both our model that we built last time,
as well as our test data from whenever we prepared the data sets. So the first thing we're going to do is get
a batch of test data from our test batches. And then we're going to plot that batch, we're
going to plot the images specifically, and then we're going to print out the corresponding
labels for those images. And just as a reminder, before we do that, we're using this plotImages function that we introduced in the last couple of episodes. Alright, so if we scroll down, we have our
test batches. Recall, we had the discussion about why the
image data looks the way that it does in terms of the color being skewed last time, but we
can see that just by looking even though we have kind of distorted color we have, these
are all actually cats here. And by looking at the corresponding label
for these images, we can see that they are all labeled with the one hot encoded vector
of one zero, which we know is the label for cat. So if you're wondering why we have all cats
as our first 10 images in our first batch here, that is because, recall, whenever we created the test set, we specified that we did not want it to be shuffled. And that was so that we could do the following. If we come to our next cell and we run test_batches.classes, then we can get an array that has all of the corresponding labels for each image in the test set. And given that we have access to the unshuffled labels for the test set, that's why we don't want to shuffle the test set directly, because we want to be able to have this one-to-one direct mapping from the unshuffled labels to the test data set. And if we were to shuffle the test data set,
every time we generated a batch, then we wouldn't be able to have the correct mapping between
labels and samples. So we care about having the correct mapping,
because later after we get our predictions from the model, we're going to want to plot
our predictions to a confusion matrix. And so we want the corresponding labels that
belong to the samples in the test set. Alright, so next, we're actually going to
go ahead and obtain our predictions by calling model.predict, just as we have in earlier episodes for other data sets. And to x, we are specifying our test batches, so all of our test data set, and we are choosing verbose to be zero to get no output whenever we run our predictions.
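As a sketch, the prediction step described here amounts to the following, assuming the model and test_batches from the previous episodes:

```python
import numpy as np

predictions = model.predict(x=test_batches, verbose=0)
print(np.round(predictions))   # each row rounds to a one-hot-looking vector: the highest-probability class per sample
```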
So here, we are just printing out the rounded predictions from the model. So the way that we can read this is: first,
each one of these arrays is a prediction for a single sample. So if we just look at the first one, this
is the prediction for the first sample in the test set. For each prediction, wherever there is a one, that is the index of the output class that had the highest probability from the model. So in this case, we see that the zeroth index
had the highest probability. So we can just say that the label that the
model predicted for this first sample was a zero, because we see that there is a one
here in the zeroeth index. And if we look at the first element, or if
we look at the first label here, up here, for our first test sample, it is indeed zero. So we can eyeball that and see that the model
did accurately predict all the way down to here, because the first one, two, three, four, five, six predictions match the first one, two, three, four, five, six labels. Okay, but then, whenever we see that the model predicted the first index to be the highest probability, that means that the model predicted an output label of one, and so that corresponds to dog. So it's hard for us to kind of draw an overall
conclusion about the prediction accuracy for this test set, just eyeballing the results
like this. But if we scroll down, then we know that we
have the tool of a confusion matrix that we can use to make visualizing these results
much easier, like we've seen in previous episodes of this course already. So we are going to do that. Now we're going to create a confusion matrix
using this confusion_matrix function from scikit-learn, which we've already been introduced to, and we are passing in our true labels using test_batches.classes. Recall that we just touched on that a few minutes ago. And for our predicted labels, we are passing
in the predictions from our model. We're actually using argmax to pass in the index of where the most probable prediction was from our predictions list. So this is something that we've already covered
in previous episodes for why we do that. So if we run that, we're now going to bring
in this plot_confusion_matrix function, which, as we've discussed, is directly from scikit-learn's website; a link to it is in the corresponding blog for this episode on deeplizard.com. This is just going to allow us to plot our
confusion matrix in a moment. And now if we look at the class indices, we
see that cat is first and dog is second. So we just need to look at that so that we
understand in which order, we should put our plot labels for our confusion matrix. And next we call plot confusion matrix and
pass in the confusion matrix itself, as well as the labels for the confusion matrix and a title for the entire matrix. So let's check that out.
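In code, the confusion-matrix step described here looks roughly like this; plot_confusion_matrix is the helper copied from scikit-learn's website, so it's assumed to be defined earlier in the notebook:

```python
from sklearn.metrics import confusion_matrix
import numpy as np

cm = confusion_matrix(y_true=test_batches.classes,
                      y_pred=np.argmax(predictions, axis=-1))

print(test_batches.class_indices)   # e.g. {'cat': 0, 'dog': 1}, which tells us the label order
cm_plot_labels = ['cat', 'dog']
plot_confusion_matrix(cm=cm, classes=cm_plot_labels, title='Confusion Matrix')
```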
Alright, so from what we learned about how we can easily interpret a confusion matrix, we know that we can just look at this diagonal
here running from top left to bottom right, to see what the model predicted correctly. So not that great, the model is definitely
overfitting at this point. So like I said, if this was a model that we
were really concerned about, then we would definitely want to combat that overfitting
problem. But for now, we are going to move on to a
new model using a pre trained state of the art model called VGG 16. In the next episode, so that we can see how
well that model does on classifying images of cats and dogs. Be sure to check out the blog and other resources
available for this episode on deeplizard.com, as well as the deep lizard hive mind, where you can gain access to exclusive perks and rewards. Thanks for contributing to collective intelligence. Now let's move on to the next episode. Hey, I'm Mandy from deeplizard. In this episode, we'll demonstrate how we can fine-tune a pre-trained model to classify images using TensorFlow's Keras API. The pre-trained model that we'll be working
with is called VGG16, and this is the model that won the 2014 ImageNet competition. In the ImageNet competition, multiple teams compete against each other to build a model that best classifies images within the ImageNet library. And the ImageNet library is made up of thousands of images that belong to 1000 different classes. Using Keras, we'll import this VGG16 model and then fine-tune it to not classify on one of the 1000 categories for which it was originally
trained, but instead only on two categories, cat and dog. Note, however, that cats and dogs were included
in the original ImageNet library on which VGG16 was trained. And because of this, we won't have to do much
tuning to change the model from classifying from 1000 classes to just the two cat and
dog classes. So the overall fine tuning that we'll do will
be very minimal. In later episodes, though, we'll do more involved
fine tuning and use transfer learning to transfer what a model has learned on an original data
set to completely new data and a new custom data set that we'll be using later. To understand fine tuning and transfer learning
at a fundamental level, check out the corresponding episode on fine-tuning in the Deep Learning Fundamentals course on deeplizard.com. Before we actually start building our model, let's quickly talk about the VGG16 preprocessing that is done. Now recall, we are here in our Jupyter Notebook. This is from last time, when we plotted a batch
from our test data set. And a few episodes ago, we discussed the fact
that this color data was skewed as a result of the VGG 16 pre processing function that
we were calling. Well, now we're going to discuss what exactly
this pre processing does. So as we can see, we can still make out the
images as these here being all cats, but it's just the data the color data itself that appears
to be distorted. So if we look at the paper that the authors
of VGG16 detailed the model in, under the architecture section, let's blow this up a bit. We can see that they state that the only preprocessing they do is subtract the mean RGB value, computed on the training set, from each pixel. So what does that mean? That means that they computed the mean red pixel value over all of the training data. And then, once they had that mean value across the training set, for each image they subtracted that mean value from the red value of each pixel in the image. And then they did the same thing for the green and blue pixel values as well. So they found the mean green pixel value among the entire training set, and then for each sample in the training set, for every green pixel, they subtracted that green value from it. And the same thing for the blue, of course. So that's what they did for the preprocessing.
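Just to make the arithmetic concrete, here's an illustration of that idea in NumPy. The specific mean values below are the per-channel ImageNet means commonly cited for VGG-style preprocessing; treat them, and the function name, as assumptions for this sketch rather than exactly what Keras does internally (the built-in preprocess_input also reorders the color channels):

```python
import numpy as np

imagenet_channel_means = np.array([123.68, 116.779, 103.939])  # assumed mean R, G, B over the training set

def subtract_channel_means(img):
    # img: float array of shape (height, width, 3) in RGB order.
    return img - imagenet_channel_means   # subtract each channel's mean from every pixel
```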
So given that that's how VGG16 was originally trained, that means now, whenever new data is passed to the model, it needs to be processed in the same exact way as the original training set. So Keras already has functions built in for popular models, like VGG16, where they have that preprocessing in place that matches the corresponding model. So that's why we were calling that function whenever we processed our cat and dog images earlier, so that we could go ahead and get those images processed in such a way that matched how the original training set was processed when VGG16 was originally trained. Alright, so that is what that color distortion
is all about. So now let's jump back into the code. Now that we know what that's about, we have
an understanding of the preprocessing. Let's now get back to actually building the fine-tuned model. So the first thing that we need to do is download
the model. And when you call this for the first time,
you will need an internet connection because it's going to be downloading the model from
the internet. But after this, subsequent calls to this function
will just be grabbing the model from the downloaded copy on your machine. Alright, so the model has been downloaded; now we are just running this summary. And we can see that this model is much more
complex than what we have worked with up to this point. So total, there are almost 140 million parameters
in this model. And on disk, it is over 500 megabytes. So it is quite large. Now recall I said that this VGG 16 model originally
was predicting for 1000 different ImageNet classes, so here we can see the output layer of the VGG16 model has 1000 different outputs. So our objective is going to simply be to
change this last output layer to predict only two output classes corresponding to cat and
dog, along with a couple other details regarding the fine tuning process that we'll get to
in just a moment. So now, we're actually going to just skip
these two cells here. These are just for me to be able to make sure
that I've imported the model correctly, but it is not relevant for any code here. It's just checking that the trainable parameters and non-trainable parameters are what I expect them to be after importing. But this is not of concern at the moment for
us. So if we scroll down here, we're now going
to build a new Sequential model. Alright, now before we run this code, we're actually just going to look at the type of model the original VGG16 model is. So this is returning a model of type Model, which is actually a model from the Keras functional API. We have been previously working with Sequential
models. So we are in a later episode going to discuss
the functional API in depth, it's a bit more complicated and more sophisticated than the
sequential model. For now, since we're not ready to bring in
the functional model yet, we are going to convert the original VGG 16 model into a sequential
model. And we're going to do that by creating a new
variable called model here and setting this equal to an instance of sequential object. And we are going to then loop through every
layer in VGG16, except for the last output layer (we're leaving that out), and add each layer into our new Sequential model. So now we'll look at a summary of our new
model. And by looking at this summary, if you take
the time to compare the previous summary to this summary, what you will notice is that
they are exactly the same, except the last layer has not been included in this new model. So this last layer here, this fully-connected fc2 layer, has an output shape of 4096; here, if we scroll back up, we can see that this is this layer here. So the predictions layer has not been included,
because when we were iterating over our for loop, we went all the way up to the second
last layer, we did not include the last layer of VGG 16. Alright, so now let's scroll back down. And we're now going to iterate over all of
the layers within our new sequential model. And we are going to set each of these layers
to not be trainable by setting layer dot trainable to false. And what this is going to do is going to freeze
the trainable parameters, or the weights and biases from all the layers in the model so
that they're not going to be retrained. Whenever we go through the training process
for cats and dogs. Because VGG16 has already learned the features of cats and dogs in its original training, we don't want it to have to go through more training again, since it's already learned those features. So that's why we are freezing the weights
here, we're now going to add our own output layer to this model. So remember, we removed the previous output
layer that had 1000 output classes. And now we are going to add our own output
layer that has only two output classes for cat and dog. So we add that now since we have set all of
the previous layers to not be trainable, we can see that actually only our last output
layer that's going to be the only trainable layer in the entire model. And like I said before, that's because we
already know that VGG 16 has already learned the features of cats and dogs during its original
training. So we only need to retrain this output layer to classify two output classes.
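Putting the whole conversion together, the fine-tuning setup described in this episode looks roughly like this, assuming vgg16_model has already been downloaded with tf.keras.applications.vgg16.VGG16():

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Copy every VGG16 layer except the original 1000-class output layer into a Sequential model.
model = Sequential()
for layer in vgg16_model.layers[:-1]:
    model.add(layer)

# Freeze the copied layers so their weights aren't retrained.
for layer in model.layers:
    layer.trainable = False

# Add a new 2-class output layer for cat vs. dog.
model.add(Dense(units=2, activation='softmax'))
model.summary()
```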
So now, if we look at a new summary of our model, then we'll see that everything is the same, except now we have this new dense
layer as our output layer, which only has two classes instead of the 1000 from the original VGG16 model. We can also see that our model now only has about 8,000 trainable parameters, and those are all within our output layer, since, as I said, our output layer is our only
trainable layer. So before actually, all of our layers were
trainable. If we go take a look at our original VGG 16
model, we see that we have 138 million total parameters, all of which are trainable, none
of which are non trainable. So if we didn't freeze those layers, then
they would be getting retrained during the training process for our cat and dog images. So just to scroll back down again and check
this out. We can see that now we still have quite a lot of learnable parameters, 134 million in total, but only about 8,000 of which are trainable. The rest are non-trainable. In the next episode, we'll see how we can
train this modified model on our images of cats and dogs. Be sure to check out the blog and other resources
available for this episode on deeplizard.com, as well as the deep lizard hive mind, where you can gain access to exclusive perks and rewards. Thanks for contributing to collective intelligence. Now let's move on to the next episode. Hey, I'm Mandy from deeplizard. In this episode, we'll demonstrate how to
train the fine tuned VGG 16 model that we built last time on our own data set of cats
and dogs. Alright, so we're jumping straight into the
code to train our model. But of course, be sure that you already have
the code in place from last time as we will be building on the code that we have already
run previously. So, we are using our model here to first compile
it to get it ready for training. This model is the sequential model that we
built last time that is our fine tuned VGG 16 model containing all the same layers with
frozen weights except for our last layer which we have modified to output only two possible
outputs. So, we are compiling this model using the
Adam optimizer, as we have previously used, with a learning rate of 0.0001. For the loss, we are using again categorical cross-entropy, just like we have before, and we are using accuracy as our only metric to judge model performance. So, there is nothing new here with this call
to compile. It is pretty much exactly the same as what
we have seen for our previous models. Now we are going to train the model using
model.fit, and we are passing in our training data set, which we have stored in train_batches. We are passing in our validation set, which
we have stored as valid batches. And we are only going to run this model for
five epochs. And we are setting the verbosity level to
two so that we can see the most comprehensive output from the model during training. So let's see what happens. Alright, so training has just finished, and
we have some pretty outstanding results. So just after five epochs on our training
data and validation data, we have an accuracy on our training data of 99%, and validation accuracy right on par at 98%. So that's just after five epochs. And if we look at the first epoch, even our first epoch gives us a training accuracy of 85%, just starting out, and a
validation accuracy of 93%. So this isn't totally surprising, because
remember, earlier in the course, we discussed how VGG 16 had already been trained on images
of cats and dogs from the ImageNet library. So it had already learned those features. Now, the slight training that we're doing on the output layer is just to train VGG16 to output only the cat or dog classes. And so it's really not surprising that it's doing such a good job right off the bat in its first epoch, and even considerably better in its fifth epoch at 99% training accuracy. Now recall the previous CNN that we
built from scratch ourselves the really simple convolutional neural network, that model actually
did really well on the training data, reaching 100% accuracy after a small amount of epochs
as well. Where we saw it lagging though was with the
validation accuracy. So it had a validation accuracy of around
70%. Here we see that we are at 98%. So the main recognizable difference between
our very simple CNN and this VGG 16 fine tuned model is how well this model generalizes to
our cat and dog data in the validation set, whereas the model we built from scratch did
not generalize so well on data that was not included in the training set. In the next episode, we're going to use this
VGG 16 model for inference to predict on the cat and dog images in our test set. And given the accuracy that we are seeing
on the validation set here, we should expect to see some really good results on our test
set as well. Be sure to check out the blog and other resources
available for this episode on deeplizard.com, as well as the deep lizard hive mind, where you can gain access to exclusive perks and rewards. Thanks for contributing to collective intelligence. Now let's move on to the next episode. Hey, I'm Mandy from deeplizard. In this episode, we'll use our fine-tuned
VGG 16 model for inference to predict on images in our test set. All right, we are jumping right back into
our Jupyter Notebook. Again, making sure that all the code is in
place and has been run from the previous episodes, as we will be building on the code that has
been run there. So first things first is that we are going
to be using the model to get predictions from our test set. So to do that we call model dot predict which
we have been exposed to in the past. Recall that this model here is our fine tuned
VGG 16 model. And to predict we are passing our test set
which we have stored in test batches. And we are setting our verbosity level to
zero as we do not want to see any output from our predictions. Now recall, previously we talked about how we did not shuffle the test set for our cat and dog image data set. And that is because we want to be able to access the classes here in unshuffled order, so that we can then pass in the unshuffled labels that correspond to the unshuffled test data. We want to be able to have those in a one-to-one mapping, where the labels actually are the correct, unshuffled labels for the unshuffled data samples, and we want to be able to pass those to our confusion matrix. So this is the same story as we saw whenever
we were using our CNN that we built from scratch a few episodes back, we did the same process,
we're using the same dataset recall. And so now we are plotting this confusion
matrix in the exact same manner as before. So this is actually the exact same line that
we used to plot the confusion matrix a few episodes back, when we plotted it on this same exact test set for the CNN that we built from scratch. And just a reminder from last time, recall
that we looked at the class indices of the test batches. So that we could get the correct order of
our cat and dog classes to use for our labels for our confusion matrix. So we are again doing the same thing here. And now we are calling the scikit-learn plot_confusion_matrix function, which you should have defined earlier in your notebook from a previous episode. And to plot_confusion_matrix we are passing
in our confusion matrix, and our labels defined just above as well as a general title for
the confusion matrix itself. So now, let's check out the plot so that we
can see how well our model did on these predictions. So this is about the third or fourth time that we've used a confusion matrix in this course so far, so you should be pretty used to reading this data. So recall, the quick and easy way is to look
from the top left to the bottom right along this diagonal, and we can get a quick overview
of how well the model did. So the model correctly predicted a dog 49 times for images that were truly dogs. And it correctly predicted a cat 47 times
for images that truly were cats. So we can see that one time it predicted a
cat when it was actually a dog. And three times it predicted a dog when images
were actually cats. So overall, the model incorrectly predicted
four samples, so that gives us 96 out of 100. Correct. Or let's see 96 correct predictions out of
100 total predictions. So that gives us an accuracy rate on our test
set of 96%. Not surprising given what we saw in the last
episode for the high level of accuracy that our model had on the validation set. So overall, this fine tuned VGG 16 model does
really well at generalizing on data that it had not seen during training, a lot better than our original model, which we built from scratch. Now recall that we previously discussed that
the overall fine tuning approach that we took to this model was pretty minimal since cat
and dog data was already included in the original training set for the original VGG 16 model. But in upcoming episodes, we are going to
be doing more fine tuning, more fine tuning than what we saw here for VGG 16. As we will be fine tuning another well known
pre trained model, but this time for a completely new data set that was not included in the
original data set that it was trained on. So stay tuned for that. Be sure to check out the blog and other resources
available for this episode on deeplizard.com, as well as the deep lizard hive mind, where you can gain access to exclusive perks and rewards. Thanks for contributing to collective intelligence. Now let's move on to the next episode. Hey, I'm Mandy from deeplizard. In this episode, we'll introduce MobileNets, a class of lightweight deep convolutional neural networks that are much smaller in size and faster than many of the mainstream, really well-known popular models. MobileNets are a class of small, low-power,
low latency models that can be used for things like classification, detection, and other
things that CNNs are typically good for. And because of their small size, these models are considered great for mobile devices, hence the name MobileNets. So I have some stats taken down here. So just to give a quick comparison, in regards
to the size, the size of the full VGG 16 network that we've worked with in the past few episodes
is about 553 megabytes on disk, so pretty large, generally speaking. The size of one of the currently largest MobileNets is only about 17 megabytes. So that's a pretty huge difference, especially
when you think about deploying a model to run on a mobile app, for example, this vast
size difference is due to the number of parameters, or weights and biases, contained in the model. So for example, VGG16, as we saw previously, has about 138 million total parameters, so a lot. And the 17-megabyte MobileNet that we talked about, which is currently the largest MobileNet, has only about 4.2 million parameters. So that is much, much smaller on a relative
scale than VGG16 with 138 million. Aside from the size on disk being a consideration when it comes to comparing MobileNets to other larger models, we also need to consider the memory as well. So the more parameters that a model has, the more space in memory it will be taking up also. So while MobileNets are faster and smaller than these big, hefty competitor models like VGG16, there is a catch, or a trade-off,
and that trade-off is accuracy. So, MobileNets are not as accurate as some of these big players like VGG16, for example, but don't let that discourage you. While it is true that MobileNets aren't as accurate as these resource-heavy models like VGG16, the trade-off is actually
pretty small, with only a relatively small reduction in accuracy. And in the corresponding blog for this episode,
I have a link to a paper that goes more in depth on this relatively small accuracy difference, if you'd like to check that out further. Let's now see how we can work with MobileNets in code with Keras. Alright, so we are in our Jupyter Notebook.
all of the packages that we will be making use of which these are not only for this video,
but for the next several videos where we will be covering mobilenet. And as mentioned earlier in this course, a
GPU is not required. But if you're running a GPU, then you want
to run this cell. This is the same cell that we've seen a couple
of times already in this course earlier, where we are just making sure that TensorFlow can
identify our GPU if we're running one, and it is setting the memory growth to true if
we do have a GPU. So again, don't worry, if you don't have a
GPU; but if you do, then run this cell. Similar to how we downloaded the VGG16 model when we were working with it in previous episodes, we take that same approach to download MobileNet here. So we call tf.keras.applications.mobilenet.MobileNet(), and the first time we call it, it is going to download MobileNet from the internet, so you need an internet connection. But subsequent calls to this are just going
to be getting the model from a saved model on disk and loading it into memory here. So we are going to do that now and assign
that to this mobile variable.
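That download step is a one-liner; here's what it looks like (the variable name mobile is the one used in the rest of this episode):

```python
import tensorflow as tf

# Downloads MobileNet the first time it's called; later calls just load the cached copy.
mobile = tf.keras.applications.mobilenet.MobileNet()
```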
Now, MobileNet was originally trained on the ImageNet library, just like VGG16. So in a few minutes, we will be passing MobileNet some images that I've saved on disk. They are not images from the ImageNet library,
but they are images of some general things. And we're just going to get an idea about
how MobileNet performs on these random images. But first, in order to be able to pass these images to MobileNet, we're going to have to do a little bit of processing first. So I've created this function called prepare_image. And what it does is it takes a file name, and then inside the function, we have an image path, which is pointing to the location
on disk, where I have these saved image files that we're going to use to get predictions
from mobile net. So we have this image path defined to where
these images are saved. We then load the image by using the image path and appending the file name that we pass in. So say I pass in the image 1.PNG here; then we're going to take that file path, append 1.PNG to it, pass that to load_img, and pass a target size of 224 by 224. Now, this load_img function is from the Keras API. So what we are doing here is we are just taking
the image file, resizing it to be of size 224 by 224, because that is the size of images
that MobileNet expects, and then we just take this image and transform it into the format of an array. Then we expand the dimensions of this image, because that's going to put the image in the shape that MobileNet expects. And then finally, we pass this new processed image to our last function, which is tf.keras.applications.mobilenet.preprocess_input. So this is a similar function to what we saw
a few episodes back when we were working with VGG16; it had its own preprocess_input. Now, MobileNet has its own preprocess_input function, which is processing images in a way that MobileNet expects, so it's not the same way as VGG16. Actually, it's just scaling all the RGB pixel values, instead of being on a scale from zero to 255, to be on a scale from minus one to one. So that's what this function is doing. So overall, this entire function is just resizing
the image and putting it into an array format with expanded dimensions, then MobileNet-preprocessing it, and then returning this processed image. Okay, so that's kind of a mouthful, but that's what we've got to do to images before we pass them to MobileNet. Alright, so we will just define that function.
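Here's a sketch of that prepare_image helper under the assumptions above; the samples directory path is hypothetical, so point it at wherever your sample images live:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing import image

def prepare_image(file):
    img_path = 'data/MobileNet-samples/'                            # assumed folder of sample images
    img = image.load_img(img_path + file, target_size=(224, 224))   # load and resize to 224x224
    img_array = image.img_to_array(img)                             # convert the image to an array
    img_array_expanded = np.expand_dims(img_array, axis=0)          # add a batch dimension
    return tf.keras.applications.mobilenet.preprocess_input(img_array_expanded)  # scale pixels to [-1, 1]
```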
And now we are going to display our first image, called 1.PNG, from our MobileNet samples directory that I told you I just
set up with a few random images. And we're going to plot that to our Jupyter
Notebook here. And what do you know, it's a lizard. So that is our first image. Now, we are going to pass this image to our
prepare_image function that we defined right above, that we just finished talking about. So we're going to pass that to the function to preprocess the image accordingly. Then we are going to pass the preprocessed image returned by the function to our MobileNet model; we're going to do that by calling predict on the model, just like we've done in previous videos when we've called predict on models to use them for inference. Then, after we get the prediction for this particular image, we are going to give this prediction to the imagenet_utils.decode_predictions function. So this is a function from Keras that is just going to return the top five predictions from the 1000 possible ImageNet classes, and it's going to tell us the top five that MobileNet is predicting for this image. So let's run that and then print out those results, and maybe you'll have a better idea of what I mean once you see the printed output.
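As a sketch, that prediction flow looks like this, assuming the mobile model and prepare_image helper from above; the file name 1.PNG is just the sample used here:

```python
from tensorflow.keras.applications import imagenet_utils

preprocessed_image = prepare_image('1.PNG')
predictions = mobile.predict(preprocessed_image)
results = imagenet_utils.decode_predictions(predictions)  # top-5 ImageNet classes by default
print(results)
```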
So we run this, and we have our output. So these are the top five, in order, results
So we run this, and we have our output. These are the top five results, in order, from the ImageNet classes that MobileNet is predicting for this image. It's assigning a 58% probability to this image being an American chameleon, a 28% probability to green lizard, and 13% to an agama. And then we have some small percentages here, under 1%, for these other two types of lizards. So it turns out, if you're not aware, and I don't know how you could not be aware of this because everyone should know this, but this is an American chameleon. I've always called these things green anoles, but I looked it up, and they're also known as American chameleons. So MobileNet got it right. It assigned a 58% probability to that, the highest, most probable class. Next was green lizard, and I'd say that's still really good for second place. I don't know if green lizard is supposed to be a more general class. And then an agama, which is also a similar-looking lizard, if you didn't know. So between these top three classes that it predicted, we're at almost 100%, and between the three of these, I would say MobileNet did a pretty good job on this prediction. So let's move on to number two.
Alright, so now we are going to plot our second image, and this is a cup of, well, I originally thought it was espresso, and then someone called it a cup of cappuccino. So let's just say I'm not sure. I'm not a coffee connoisseur, although I do like both espresso and cappuccinos. This looks like it has some cream in it. So now we're going to go through the same process we just did for the lizard image, where we pass this new image to our prepare_image function so that it undergoes all of the preprocessing. Then we pass the preprocessed image to the predict function for our MobileNet model, and finally we get the top five results from the predictions for this image with respect to the ImageNet classes. So let's see. Alright, so according to MobileNet, this is an espresso, not a cappuccino. So I don't know, but it predicts a 99% probability of espresso as being the most probable class for this particular image, and I'd say that is pretty reasonable. So let me know in the comments, what do you think, is this espresso or cappuccino? I don't know if ImageNet even had a cappuccino class, so if it didn't, then I'd say this is pretty spot on. You can see that the other four predictions are all less than 1%, but they are reasonable. I mean, the second one is cup, third eggnog, fourth coffee mug. Fifth, wooden spoon, gets a little bit weird, but there is wood, and there is a circular shape going on here. But these are all under 1%, so they're pretty negligible. I would say MobileNet did a pretty great job at giving a 99% probability to espresso for this image. Alright, we have one more sample image, so let's bring that in.
And this is a strawberry, or multiple strawberries if you consider the background. So, same thing: we preprocess the strawberry image, then we get a prediction from MobileNet for this image, and then we get the top five results for the most probable predictions among the 1000 ImageNet classes. And we see that MobileNet, with 99.999% probability, correctly classifies this image as a strawberry, so very well done, and the rest are well, well under 1%, but they are all fruits. So, interesting. Another really good prediction from MobileNet. So even with the small reduction in accuracy that we talked about at the beginning of this episode, you can probably tell from just these three random samples that the reduction is maybe not even noticeable when you're just doing tests like the ones we just ran through.
So in upcoming episodes, we're actually going to be fine-tuning this MobileNet model to work on a custom dataset. This custom dataset is not one that was included in the original ImageNet library; it's going to be a brand new dataset, and we're going to do more fine-tuning than what we've done in the past. So stay tuned for that. Be sure to check out the blog and other resources available for this episode on deeplizard.com, as well as the deeplizard hivemind, where you can gain access to exclusive perks and rewards. Thanks for contributing to collective intelligence. Now let's move on to the next episode. Hey, I'm Mandy from deeplizard. In this episode, we'll be preparing and processing a custom dataset that we'll use to fine-tune MobileNet using TensorFlow's Keras API.
Previously, we saw how well VGG16 was able to classify and generalize on images of cats and dogs. But we noted that VGG16 was actually already trained on cats and dogs, so it didn't take a lot of fine-tuning at all to get it to perform well on our cat and dog dataset. Now, with MobileNet, we'll be going through some fine-tuning steps as well, but this time we'll be working with a dataset that is completely different from the original ImageNet library on which MobileNet was originally trained. This new dataset we'll be working with is a dataset of images of sign language digits. There are 10 classes total for this dataset, ranging from zero to nine, where each class contains images of the particular sign for that digit. This dataset is available on Kaggle as grayscale images, but it's also available on GitHub as RGB images, and for our particular task we'll be using the RGB images. So check out the corresponding blog for this episode on deeplizard.com to get the link for where you can download the dataset yourself.
So after downloading the dataset, the next step is to organize the data on disk, and this will be a very similar procedure to what we saw for the cat and dog dataset earlier in the course. Once we have the download, this is what it looks like: it is a zipped folder called Sign-Language-Digits-Dataset-master. The first thing we want to do is extract all the contents of this directory. When we do that, we can navigate inside until we get to the Dataset directory, and inside here we have all of the classes, where each of these directories has the corresponding images for that particular class. What we want to do is grab zero through nine, and we are going to Ctrl+X, or cut, these directories. Then we navigate back to the root directory, which is here, and place all of the directories zero through nine in this root. Then we get rid of everything else by deleting it. Now, one last thing: I'm just going to delete this "-master" from the name, since I don't necessarily like that. So we have the Sign-Language-Digits-Dataset directory, and directly within it we have our nested directories consisting of zero through nine, each of which has our training data inside. Then the last step is to move the Sign-Language-Digits-Dataset directory to wherever you're going to be working. For me, that is relative to my Jupyter Notebook, which lives inside this deep learning with Keras directory: I have a data directory, and I have the Sign-Language-Digits-Dataset located there. Now, everything else that we'll do to organize and process the data will be done programmatically in code.
So we are in our Jupyter Notebook. Make sure that you still have the imports we brought in last time in place, because we'll be making use of those. Now, this is just the class breakdown to let you know how many images are in each class. Across the classes zero through nine, there are anywhere from 204 to 208 samples in each class. And then here I have just an explanation of how your data should be structured up to this point. Now, the rest of the organization we will do programmatically with this script here. This script is going to organize the data into train, valid, and test directories. Recall, right now we just have all the data located in the corresponding class directories zero through nine, but the data is not yet broken up into the separate datasets of train, validation, and test. To do that, we first change directory into our Sign-Language-Digits-Dataset directory, and then we check to make sure that the directory structure we're about to set up is not already in place on disk. If it's not, then we make a train, valid, and test directory right within Sign-Language-Digits-Dataset. Next, we iterate over all of the directories within our Sign-Language-Digits-Dataset directory. Recall, those are the directories labeled zero through nine, so that's what we're doing in this for loop with range zero to ten: going from directory zero to nine and moving each of these directories into our train directory. After that, we make two new directories, one inside of valid and one inside of test, named for whatever place we're at in the loop. So if we are on run number zero, then we are making a directory called zero within valid and a directory called zero within test, and if we are on run number one, then we create a directory called one within valid and one within test. So we do this whole process of moving each class into train and then creating each empty class directory within valid and test. At this point, let's suppose we are on the first run in this for loop, so in this range here we are on the number zero. On this line, what we are doing is sampling 30 random samples from our train/0 directory, because up here we moved the class directory zero into train. We're calling these valid_samples, because these are going to be the samples that we move into our validation set, and we do that next: for each of the 30 samples that we randomly collected from the training set on this line, we move them from train/0 into the validation set's class zero directory. Then we do the same thing for the test samples: we randomly select five samples from the train/0 directory, and then we move those five samples from train/0 into the test/0 directory. So we just ran through that loop using class zero as an example, but that's going to happen for each class, zero through nine.
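Here's a minimal sketch of that organization script, assuming the dataset lives at data/Sign-Language-Digits-Dataset relative to the notebook:

```python
import os
import random
import shutil

# Organize the data into train/valid/test directories, mirroring the steps described above.
os.chdir('data/Sign-Language-Digits-Dataset')
if os.path.isdir('train/0/') is False:
    os.mkdir('train')
    os.mkdir('valid')
    os.mkdir('test')

    for i in range(0, 10):
        # Move each class directory 0-9 into train/.
        shutil.move(f'{i}', 'train')
        # Create empty class directories inside valid/ and test/.
        os.mkdir(f'valid/{i}')
        os.mkdir(f'test/{i}')

        # Randomly sample 30 images per class for the validation set...
        valid_samples = random.sample(os.listdir(f'train/{i}'), 30)
        for j in valid_samples:
            shutil.move(f'train/{i}/{j}', f'valid/{i}')

        # ...and 5 images per class for the test set.
        test_samples = random.sample(os.listdir(f'train/{i}'), 5)
        for k in test_samples:
            shutil.move(f'train/{i}/{k}', f'test/{i}')

# Change back to the original working directory.
os.chdir('../..')
```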
And just in case you have any issue visualizing what that script does, let's go into Sign-Language-Digits-Dataset and check out how we have organized the data. Recall we previously had classes zero through nine all listed directly here within this root folder. Now we have train, valid, and test directories. Within train, we moved all of the original zero through nine directories. Once that was done, we sampled 30 images from each of these classes and moved them into the corresponding valid directory classes, and then we did the same thing for the test directory. If we look in here, we can see that the test directory has five samples for zero, and five samples for one. If we go check out the valid directories and look at zero, it should have 30 zeros, and we see that here: 30. So every valid class directory has 30 samples, and the training directory classes don't necessarily have uniform sample counts, because remember, we saw that the number of samples in each class ranged anywhere from 204 to 208 or so. So the number of images within each class directory for the training set will differ slightly, by maybe one or two images, but the number of images in the classes for the validation and test directories will be uniform, since we did that programmatically with our script here. Checking this dataset out on disk, we can see that this is exactly the same format in which we structured our cat and dog image dataset earlier in the course; now we're just dealing with 10 classes instead of two. And when we downloaded our data, it was in a slightly different organizational structure than the cat and dog data that we previously downloaded.
Alright, so we have obtained the data and organized the data. Now the last step is to preprocess the data. We do that by first defining where our train, valid, and test directories live on disk, so we supply those paths here. Then we set up our directory iterators, which we should be familiar with at this point, given this is the same format in which we processed our cat and dog images when we built our CNN from scratch and when we used the fine-tuned VGG16 model. Let's focus on this first variable, train_batches. We are calling ImageDataGenerator().flow_from_directory(), which we can't quite see here, but we'll scroll over in a minute. To ImageDataGenerator, we are passing this preprocessing function, which in this case is the MobileNet preprocessing function. Recall, we saw that already in the last episode, and there we discussed how it preprocesses images in such a way that it scales the pixel data from a scale of 0 to 255 to instead be on a scale from -1 to 1. Then, on the ImageDataGenerator, we call flow_from_directory(), which is running off the screen here. We set the directory equal to train_path, which we defined just above to say where our training data resides on disk; we set the target size to 224 by 224, which, recall, just resizes any training data to have a height of 224 and a width of 224, since that is the image size MobileNet expects; and we set our batch size equal to 10. We've got the same deal for valid_batches and for test_batches as well, everything exactly the same except for the paths differing to show where the validation and test sets live on disk. And we are familiar now with specifying shuffle=False only for our test set, so that we can later appropriately plot our prediction results to a confusion matrix.
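A sketch of these cells might look like this, with the paths assumed to match how we organized the data above:

```python
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Assumed paths, relative to the notebook, from the organization step above.
train_path = 'data/Sign-Language-Digits-Dataset/train'
valid_path = 'data/Sign-Language-Digits-Dataset/valid'
test_path = 'data/Sign-Language-Digits-Dataset/test'

# Each generator applies MobileNet's preprocessing function, resizes images
# to 224x224, and yields batches of 10. Only the test set is left unshuffled.
train_batches = ImageDataGenerator(
    preprocessing_function=tf.keras.applications.mobilenet.preprocess_input).flow_from_directory(
    directory=train_path, target_size=(224, 224), batch_size=10)
valid_batches = ImageDataGenerator(
    preprocessing_function=tf.keras.applications.mobilenet.preprocess_input).flow_from_directory(
    directory=valid_path, target_size=(224, 224), batch_size=10)
test_batches = ImageDataGenerator(
    preprocessing_function=tf.keras.applications.mobilenet.preprocess_input).flow_from_directory(
    directory=test_path, target_size=(224, 224), batch_size=10, shuffle=False)
```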
So we run this cell, and we have output telling us that it found 1712 images belonging to 10 classes, which corresponds to our training set, 300 images belonging to 10 classes for our validation set, and 50 images belonging to 10 classes for our test set. Then I have this cell with several assertions, which just assert that the output we have right here is what we expect. So if you are not getting this output, it's perhaps because you are pointing to the wrong location on disk. A lot of times, if you're pointing to the wrong location, you'll probably get "Found 0 images belonging to 10 classes", so you just need to check the path to where your dataset resides if you get that. Alright, so now our dataset has been processed, and we're ready to move on to building and fine-tuning our MobileNet model for this dataset. Be sure to check out the blog and other resources available for this episode on deeplizard.com, as well as the deeplizard hivemind, where you can gain access to exclusive perks and rewards. Thanks for contributing to collective intelligence. Now let's move on to the next episode. Hey, I'm Mandy from deeplizard. In this episode, we'll go through the process of fine-tuning MobileNet for a custom dataset.
Alright, so we are jumping right back into our Jupyter Notebook from last time, so make sure your code from then is in place, since we will be building directly on it now. The first thing we're going to do is import MobileNet, just as we did in the first MobileNet episode, by calling tf.keras.applications.mobilenet.MobileNet(). Remember, if this is your first time running this line, then you will need an internet connection to download the model from the internet.
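That line looks like this; mobile is just the variable name we'll keep referring to below:

```python
import tensorflow as tf

# The pretrained ImageNet weights are downloaded the first time this line runs,
# so an internet connection is needed on that first call.
mobile = tf.keras.applications.mobilenet.MobileNet()
```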
Now, let's take a look at the model that we downloaded. By calling summary on the model, we have this output here showing us all of these lovely layers included in MobileNet. This is just to get a general idea of the model, because we will be fine-tuning it. The fine-tuning process is going to start with us getting all of the layers up to the sixth-to-last layer. So if we scroll up and look at our output and count up from the bottom: one, two, three, four, five, six. We are going to get all of the layers up to this layer, and everything after it is not going to be included. All of these layers are what we are going to keep and transfer into a new model, our new fine-tuned model, and we are not going to include the last five layers. This is just a choice I came to after doing a little experimenting and testing; the number of layers you choose to include versus not include whenever you're fine-tuning a model is going to come through experimentation and personal choice. So for us, we are getting everything from this layer and above, and we are going to keep that in our new fine-tuned model. Let's scroll down. We're doing that by calling mobile.layers, indexing the sixth-to-last layer, and grabbing its output. Then we create a variable called output, and we set it equal to a Dense layer with 10 units. This is going to be our output layer, that's why it's called output, and it has 10 units due to the nature of our classes ranging zero through nine. And this, as per usual, is going to be followed by a softmax activation function to give us a probability distribution among those 10 outputs. Now, this looks a little strange: we're calling the Dense layer and then putting this x variable next to it, so what's that about? Well, the MobileNet model is actually a functional model, so this is from the functional API of Keras, not the Sequential API. We touched on this a little bit earlier: when we fine-tuned VGG16, we saw that VGG16 was also indeed a functional model, but when we fine-tuned it, we iterated over each of the layers and added them to a Sequential model, because we weren't ready to introduce the functional model yet. Here, we are going to continue working with the functional model type. So whenever we create this output layer and then call it on the previous layers stored in x, that is the way the functional model works: we're basically saying to this output layer, take everything we have stored in x, which is all of the layers of MobileNet up to the sixth-to-last layer. Then we can create the model by calling Model, which is indeed a functional model when specified this way, and specifying inputs=mobile.input, which takes the input from the original MobileNet model, and outputs=output. So at this point, output is all of the MobileNet model up until the sixth-to-last layer, plus this Dense output layer.
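A sketch of those two cells, under the assumptions above, might look like this:

```python
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Model

# Take the output tensor of the sixth-to-last layer of the original MobileNet model...
x = mobile.layers[-6].output
# ...and attach a new Dense output layer with 10 units (one per sign-language digit),
# followed by softmax. This is the functional API: the layer is called on x,
# so it's connected to everything up to the sixth-to-last layer.
output = Dense(units=10, activation='softmax')(x)

# Build the new functional model from the original MobileNet input
# down to the new 10-unit output layer.
model = Model(inputs=mobile.input, outputs=output)
```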
So let's run these two cells to create our new model. Alright, so our new model has now been created. The next thing we're going to do is go through and freeze some layers. Through some experimentation on my own, I have found that if we freeze all except for the last 23 layers, this appears to yield some decent results. 23 is not a magic number here, so play with this yourself and let me know if you get better results. Basically, what we're doing here is going through all the layers in the model, which by default are all trainable, and saying that we want only the last 23 layers to be trainable: all the layers except for the last 23 are made not trainable. And just so you understand, relatively speaking, there are 88 total layers in the original MobileNet model, and we're saying that we only want to train the last 23 layers in the new model we built just above. Recall, this is much more than we trained earlier with our fine-tuned VGG16 model, where we only trained the output layer. So let's go ahead and run that now.
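That freezing step is just a loop over the model's layers, something like:

```python
# Freeze the weights of every layer except the last 23, so only those last
# 23 layers get updated during training. 23 came from experimentation, not from any rule.
for layer in model.layers[:-23]:
    layer.trainable = False
```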
And now let's look at a summary of our new fine-tuned model. At a glance, it looks basically the same as what we saw from the original summary, but we will see that our model now ends with this global_average_pooling2d layer, which, recall, was previously the sixth-to-last layer, where I said that we would include that layer and everything above it. So all the layers below the global average pooling layer that we previously saw in the original MobileNet summary are now gone, and instead of an output layer with 1000 classes, we now have an output layer with 10 units, corresponding to the 10 potential output classes we have for our new sign language digits dataset. If we compare the total parameters, and how they're split amongst trainable and non-trainable parameters, in this model versus the original MobileNet model, then we will see a difference there as well.
Alright, so now this model has been built, and we are ready to train it. The code here is nothing new: we are compiling the model in the exact same fashion, using the Adam optimizer with a 0.0001 learning rate, categorical cross-entropy loss, and accuracy as our metric. This we have probably seen two million times up to this point in the course, so that's exactly the same. Additionally, we have exactly the same fit function that we run to train the model. We're passing in our train batches as our dataset, we are passing in our validation batches as our validation data, and we are running this for 10 epochs. Actually, we're going to go ahead and run this for 30; I had 10 here just to save time earlier from testing, but we're going to run this for 30 epochs. And we are going to set verbose equal to 2 so we get output for each epoch. Now, let's see what happens.
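For reference, a sketch of the compile and fit cells:

```python
from tensorflow.keras.optimizers import Adam

# Compile with the Adam optimizer, a 0.0001 learning rate, categorical
# cross-entropy loss, and accuracy as the metric.
model.compile(optimizer=Adam(learning_rate=0.0001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train for 30 epochs on the training batches, validating on the validation
# batches. verbose=2 prints one summary line per epoch.
model.fit(x=train_batches,
          validation_data=valid_batches,
          epochs=30,
          verbose=2)
```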
Alright, so our model just finished training over 30 epochs, so let's check out the results. If you see this output and you're wondering why the first epoch took 90 seconds, and then we got it down to five seconds just a few epochs later, it's because I realized that I was running on battery and my laptop wasn't plugged in. Once we plugged the laptop in, it beefed up and started running much quicker. So let's scroll down and look at basically how we ended here. We are at 100% accuracy on our training set and 92% accuracy on our validation set. That is pretty freakin' great considering the fact that this is a completely new dataset, with no images that were included in the original ImageNet library. So these are pretty good results. There's a little bit of overfitting here, since our validation accuracy is lower than our training accuracy, so if we wanted to fix that, we could take some steps to combat that overfitting issue. But if we look at the earlier epochs to see what kind of story is being told here, on our first epoch our training accuracy actually starts out at 74% among 10 classes, so that is not bad for a starting point, and we quickly get to 100% training accuracy within just four epochs. So that's great, but you can see that at that point we're only at 81% accuracy on our validation set, so we have a decent amount of overfitting going on earlier on here. Then, as we progress through the training process, the overfitting becomes less and less of a problem. And you can see that, if we just look at the last eight epochs that have run here, we've not even stalled out yet: our validation loss has not stalled out in terms of decreasing, and our validation accuracy has not stalled out in terms of increasing, so perhaps just running more epochs on this data would eradicate the overfitting problem. Otherwise, you can do some tuning yourself, changing some hyperparameters around or doing a different structure of fine-tuning on the model, so freezing more or fewer than the last 23 layers during the fine-tuning process, or just experiment yourself. And if you come up with something that yields better results than this, then put it in the comments and let us know.
So we have one last thing we want to do with our fine-tuned MobileNet model, and that is use it on our test set. You know the drill with this procedure at this point; we have done it several times. We are now going to get predictions from the model on our test set, and then we are going to plot those predictions to a confusion matrix. We first get our true labels by calling test_batches.classes. We then get predictions from the model by calling model.predict and passing in our test set stored in test_batches, setting verbose equal to zero because we do not want to see any output from the predictions. Now we create our confusion matrix using scikit-learn's confusion_matrix that we imported earlier: we set our true labels equal to the test labels we defined just above, and we set our predicted labels to the argmax of our predictions across axis 1. Then we check out the class indices of the test batches, just to make sure they are what we think they are, and they are, of course, classes labeled zero through nine. So we define the labels for our confusion matrix here accordingly. Then we call our plot_confusion_matrix function that we brought in earlier in the notebook, and that we have used 17,000 times up to this point in the course, and we pass in our confusion matrix for what to plot, pass in the labels that we want to correspond to our confusion matrix, and give our confusion matrix the very general title of Confusion Matrix, because hey, that's what it is.
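Put together, those cells might look something like this; plot_confusion_matrix here is the helper defined earlier in the notebook:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# True labels come from the (unshuffled) test generator.
test_labels = test_batches.classes

# Get the model's predictions on the test set, with no progress output.
predictions = model.predict(x=test_batches, verbose=0)

# Build the confusion matrix from the true labels and the predicted classes
# (the argmax of each prediction across axis 1).
cm = confusion_matrix(y_true=test_labels, y_pred=np.argmax(predictions, axis=1))

# Check the class indices so the plot labels line up with the generator's classes.
print(test_batches.class_indices)

cm_plot_labels = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
# plot_confusion_matrix is the helper defined earlier in the notebook.
plot_confusion_matrix(cm=cm, classes=cm_plot_labels, title='Confusion Matrix')
```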
So let's plot this. Oh, no, plot_confusion_matrix is not defined? Well, it definitely is defined somewhere in this notebook; I must have skipped over it. Here we go, here we are. Alright, so here's where plot_confusion_matrix is defined. Let's bring that in. Now it is defined, and we run back here. Looking from the top left to the bottom right, we see that the model appears to have done pretty well. We have 10 classes total, with five samples per class, and we see that we have mostly fours and fives across this diagonal, meaning that most of the time the model predicted correctly. So for example, for a nine, five times out of five the model predicted an image was a nine when it actually was a nine. For an eight, however, only four out of five times did the model predict correctly; it looks like one of the times the model predicted a one when it should have been an eight. In total, we've got one, two, three, four, five incorrect predictions out of 50 total, so that gives us a 90% accuracy rate on our test set, which is not surprising for us, given the accuracy that we saw right above on our validation set. So hopefully this series on MobileNet has given you further insight into how we can fine-tune models for a custom dataset and use transfer learning to use the information that a model gained from its original training set on a completely new task in the future. Be sure to check out the blog and other resources available for this episode on deeplizard.com, as well as the deeplizard hivemind, where you can gain access to exclusive perks and rewards. Thanks for contributing to collective intelligence. Now let's move on to the next episode. Hey, I'm Mandy from deeplizard.
In this episode, we're going to learn how we can use data augmentation on images using TensorFlow's Keras API. Data augmentation occurs when we create new data by making modifications to some existing data. We're going to explain the idea a little further before we jump into the code, but if you want a more thorough explanation, be sure to check out the corresponding episode in the Deep Learning Fundamentals course on deeplizard.com. For the example we'll demonstrate in just a moment, the data we'll be working with is image data, and for image data specifically, data augmentation would include things like flipping the image, either horizontally or vertically. It could include rotating the image, changing the color of the image, and so on. One of the major reasons we want to use data augmentation is simply to get access to more data. A lot of times, not having access to enough data is an issue we can run into, and we can run into problems like overfitting if our training dataset is too small. So a major reason to use data augmentation is just to grow our training set, and adding augmented data to our training set can in turn reduce overfitting as well. Alright, so now let's jump into the code to see how we can augment image data using Keras.
Alright, so the first thing we need to do, of course, is import all of the packages that we'll be making use of for this data augmentation. Next, we have this plotImages function, which we've introduced earlier in the course; it's directly from TensorFlow's website, and it just allows us to plot images in our Jupyter Notebook. Check out the corresponding blog for this episode on deeplizard.com to get the link so that you can copy this function yourself. Next, we have this variable called gen, which is an ImageDataGenerator. Recall, we've actually worked with image data generators earlier in the course whenever we created our train, valid, and test batches that we were using for training, but with this one, we are using ImageDataGenerator in a different way. Here, we are creating this generator and specifying all of these parameters, like rotation range, width shift range, height shift range, shear range, zoom range, channel shift range, and horizontal flip. These are all options that allow us to augment our image data. You'll want to check the documentation on TensorFlow's website to get an idea of the units for these parameters, because they're not all the same. For example, rotation range here is measured in degrees, whereas width shift range is measured as a fraction of the width of the image. So these are all ways that we can augment image data: this is going to rotate the image by up to 10 degrees, this is going to shift the width of the image by up to 10%, the height by up to 10%, zoom, shift the color channels, flip the image, so all sorts of things, just all different ways we can augment image data. There are other options too, so be sure to check out the ImageDataGenerator documentation if you want to see them, but for now these are the ones we'll be working with. So we store that in our gen variable.
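A sketch of that generator; the specific values here are assumptions chosen to match the ranges described above:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Each parameter enables a different random transformation; check the
# ImageDataGenerator docs for the exact units of each one.
gen = ImageDataGenerator(
    rotation_range=10,        # rotate by up to 10 degrees
    width_shift_range=0.1,    # shift horizontally by up to 10% of the width
    height_shift_range=0.1,   # shift vertically by up to 10% of the height
    shear_range=0.15,         # apply a shear transformation
    zoom_range=0.1,           # zoom in or out by up to 10%
    channel_shift_range=10.,  # randomly shift color channel values
    horizontal_flip=True)     # randomly flip images horizontally
```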
Next, we are going to choose a random image from the dog directory that we set up earlier in the course under the dogs-versus-cats dataset. We're going to go into train, then into dog, and choose a random image from this directory, and then we're going to set this image path accordingly, so it points to wherever that chosen image is on disk. Then we have this assertion here, just to make sure that it is indeed a valid file before we proceed with the remaining code. Now we're just going to plot this image to the screen, and I'm not sure what it's going to be, since it is a random image from disk. So that is a cute-looking, I don't know, beagle, or beagle-basset hound mix? I don't know, what do you guys think? So this is the random dog that was selected from our dog train directory. Now we are creating this new variable called aug_iter, and on the image data generator that we created earlier, called gen, we're calling the flow function and passing our image in to flow. This is going to generate batches of augmented images from this single image. Next, we define this aug_images variable, which is going to give us 10 samples of the augmented images created by aug_iter here. Lastly, we are going to plot these images using the plotImages function that we defined just above.
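Those cells might look something like this, with the dogs-vs-cats path an assumption based on how the data was organized earlier in the course:

```python
import os
import random
import numpy as np
import matplotlib.pyplot as plt

# Pick a random dog image from the training set organized earlier (assumed path).
chosen_image = random.choice(os.listdir('data/dogs-vs-cats/train/dog'))
image_path = 'data/dogs-vs-cats/train/dog/' + chosen_image
assert os.path.isfile(image_path)

# Read the image and add a batch dimension so gen.flow() can consume it.
img = np.expand_dims(plt.imread(image_path), 0)
plt.imshow(img[0])

# flow() yields batches of randomly augmented versions of this single image.
aug_iter = gen.flow(img)

# Pull 10 augmented samples from the iterator and plot them with the
# plotImages helper defined earlier in the notebook.
aug_images = [next(aug_iter)[0].astype(np.uint8) for i in range(10)]
plotImages(aug_images)
```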
Alright, so let's zoom in a bit. First, let's take a look at our original dog. Alright, so that is the original image. Now we can see that, given those things like rotation and width shift and everything we specified earlier when we defined our image data generator, those transformations have now been applied to all of these images in one random way or another, so we can see kind of what's happening here. For example, this particular image looks like it has been shifted up some, because we can see that the head of the dog is being cut off a little bit. And this image, well, let's see which way the dog was originally facing: its head is facing to the right. So yeah, this image here has been flipped, the dog is now facing to the left. And this image appears to be shifted down some, and some of these, like this one, look like they've been rotated. So we can get an idea, just by looking at the images, of the types of data augmentation that have been done to them. And we can see how this could be very helpful for growing our dataset in general. For example, say we have a bunch of images of dogs, but for whatever reason they're all facing to the left, and we want to deploy our model as some general model that will classify different dogs, but the types of images that will be passed to this model later might have dogs facing to the right as well. Well, maybe our model totally implodes on itself whenever it receives a dog facing to the right. So through data augmentation, given the fact that it is very normal for dogs to face left or right, we could augment all of our dog images to have the dogs also face in the right direction as well as the left direction, to have a more dynamic dataset. Now, there is a note in the corresponding blog for this episode on deeplizard.com giving brief instructions for how to save these images to disk if you want to save them after you augment them and then add them back to your training set. So check that out if you're interested in doing that to actually grow your training set, rather than just plotting the images here in your Jupyter Notebook.
By the way, we are currently in Vietnam filming this episode. If you didn't know, we also have a vlog channel where we document our travels and share a little bit more about ourselves, so check that out at deeplizard vlog on YouTube. Also, be sure to check out the corresponding blog for this episode, along with other resources available on deeplizard.com, and check out the deeplizard hivemind, where you can gain exclusive access to perks and rewards. Thanks for contributing to collective intelligence. I'll see you next time.