Hello, everybody, and welcome to an absolutely
massive TensorFlow slash machine learning slash artificial intelligence course. Now,
please stick with me for this short introduction, as I am going to give you a lot of important
information regarding the course content, the resources for the course, and what you
can expect after going through this. Now, first, I will tell you who this course is
aimed for. So this course is aimed for people that are beginners in machine learning and
artificial intelligence, or maybe have a little bit of understanding but are trying to get
better, but do have a basic fundamental knowledge of programming and Python. So this is not
a course you're going to take if you haven't done any programming before, or if you don't know any Python syntax. In general, it's highly advised that you understand the basic syntax of Python, as I'm not going to be explaining that throughout this course.
Now, in terms of your instructor for this course, that is going to be me. My name is Tim; some of you may know me as Tech With Tim, from my YouTube channel, where I teach all kinds of different programming topics. And I've actually been working with freeCodeCamp and posted some of my series on their channel as well. Now let's get into the course
break down and talk about exactly what you're going to learn and what you can expect from
this course. So as this course is geared towards beginners, and people just getting started
in the machine learning and AI world, we're gonna start by breaking down exactly what
machine learning and artificial intelligence is. So talking about what the differences
are, between them, the different types of machine learning, reinforcement learning,
for example, versus neural networks versus simple machine learning, we're gonna go through
all those different differences. And then we're going to get into a general introduction
of TensorFlow. Now, for those of you that don't know, TensorFlow is a module developed
and maintained by Google, which can be used within Python to do a ton of different scientific
computing, machine learning and artificial intelligence applications. We're gonna be
working with that through the entire tutorial series. And after we do that general introduction
to TensorFlow, we're going to get into our core learning algorithms. Now, these are the
learning algorithms that you need to know before we can get further into machine learning,
they build a really strong foundation, they're pretty easy to understand and implement. And
they're extremely powerful. After we do that, we're going to get into neural networks discuss
all the different things that go into how neural networks work, how we can use them,
and then do a bunch of different examples. And then we're going to get into some more
complex aspects of machine learning and artificial intelligence and get to convolutional neural
networks, which can do things like image recognition and detection. And then we're going to get
into recurrent neural networks, which are going to do things like natural language processing,
chatbots, text processing, all those different kinds of things, and finally end off with
reinforcement learning. Now, in terms of resources for this course, there are a ton, and what
we're going to be doing to make this really easy for you, and for me, is doing everything
through Google Collaboratory. Now, if you haven't heard of Google Collaboratory, essentially,
it's a collaborative coding environment that runs an IPython notebook in the cloud, on
a Google machine where you can do all of your machine learning for free. So you don't need
to install any packages, you don't need to use Pip, you don't need to get your environment
set up, all you need to do is open a new Google Collaboratory window, and you can start writing
code. And that's what we're gonna be doing in this series. If you look in the description
right now, you will see links to all of the notebooks that I use throughout this guide.
So if there's anything that you want cleared up, if you want the code for yourself, if you want just text-based descriptions of the things that I'm saying, you can click those
links and gain access to them. So with that being said, I'm very excited to get started,
I hope you guys are as well. And let's go ahead and get into the content. So in this first section, I'm going to spend
a few minutes discussing the difference between artificial intelligence, neural networks and
machine learning. And the reason we need to go into this is because we're going to be
covering all of these topics throughout this course. So it's vital that you guys understand
what these actually mean. And you can kind of differentiate between them. So that's what
we're going to focus on now. Now, quick disclaimer here, just so everyone's aware, I'm using something called Windows Ink. This comes by default with Windows, and I have a drawing tablet down here. This is what I'm going to be using for some of the explanatory parts where there's no real coding, just to kind of illustrate some concepts and topics to you. Now, I have very horrible handwriting, I'm not artistic whatsoever; programming is definitely more my thing than, you know, drawing and doing diagrams and stuff. But I'm going to
try my best. And this is just the way that I find I can convey information the best to
you guys. So anyways, let's get started and discuss the first topic here, which is artificial
intelligence. Now, artificial intelligence is a huge hype nowadays. And it's funny because
a lot of people actually don't know what this means, or they try to tell people that what they've created is not artificial intelligence, when in reality, it actually is. Now the kind
of formal definition of AI and I'm just gonna read it off of my slide here to make sure
that I'm not messing this up, is the effort to automate intellectual tasks normally performed
by humans. Now, that's a fairly big definition, right? What is considered an intellectual
task and, you know, really, that doesn't help us too much. So what I'm going to do is bring
us back to when AI was first created to kind of explain to you how AI has evolved and what
it really started out being. So back in 1950, there was kind of a question being asked by scientists and researchers: can computers think, can we get them to figure things out? Can we get away from hard coding? Can we get a computer to think, can it do its own thing? So that was kind of the question that was asked. And that's
when the term artificial intelligence was kind of coined and created. Now back then
AI was simply a predefined set of rules. So if you're thinking about an AI for maybe like
tic tac toe, or an AI for chess, all they would have had back then, is predefined rules
that humans had come up with and typed into the computer in code. And the computer would
simply execute those set of rules and follow those instructions. So there was no deep learning
machine learning crazy algorithms happening, it was simply if you wanted the computer to
do something, you would have to tell it beforehand, say you're in this position. And this happens,
do this. And that's what AI was. And very good AI was simply just a very good set of rules, or a ton of different rules, that humans had implemented into some program. You could
have AI programs that are stretching, you know, half a million lines of code, just with
tons and tons and tons of different rules that have been created for that AI. So just
be aware that AI does not necessarily mean anything crazy, complex or super complicated.
But essentially, if you're trying to simulate some intellectual task, like playing a game
that a human would do with a computer, that is considered AI, so even a very basic artificial
intelligence for Tic Tac Toe game where it plays against you, that is still considered
AI. And if we think of something like Pac Man, right, where we have, you know, our little
ghost, and this will be my rough sketch of a ghost, we have our Pac Man guy who will
just be this. Well, when we consider this ghost's AI, what it does is it attempts to find and kind of figure out how it gets to Pac Man, right. And the way this works is just using
a very basic pathfinding algorithm. This has nothing to do with deep learning, or machine
learning or anything crazy. But this is still considered artificial intelligence, the computer
is figuring out how it can kind of play and do something by following an algorithm. So
we don't necessarily need to have anything crazy or stupidly complex to be considered AI; it simply needs to be simulating some intellectual human behavior. That's kind of the definition of artificial intelligence.
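Just to make that old-school, rule-based idea concrete, here is a minimal sketch of what a hard-coded tic-tac-toe "AI" might look like in Python. The board representation and the three rules are made up purely for illustration; this is not from the course notebooks.

    # A hypothetical rule-based tic-tac-toe move picker: every decision is a rule a human wrote down.
    WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6), (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

    def pick_move(board):
        # board is a list of 9 cells containing "X", "O", or "" for empty; the computer plays "O".
        for a, b, c in WIN_LINES:                      # rule 1: if we can win this turn, do it
            line = [board[a], board[b], board[c]]
            if line.count("O") == 2 and "" in line:
                return (a, b, c)[line.index("")]
        if board[4] == "":                             # rule 2: otherwise grab the centre
            return 4
        return board.index("")                         # rule 3: otherwise take the first empty square

    print(pick_move(["O", "O", "", "X", "X", "", "", "", ""]))  # prints 2, completing the top row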
Now, obviously, today, AI has evolved into a much more complex field where we now have machine learning and deep learning and all these other techniques, which is what we're going to talk about now. So what I want to
start by doing is just drawing a circle here. And I want to label this circle and say, AI. Like that. So this is going to define AI, because everything I'm about to put inside
of here is considered artificial intelligence. So now, let's get into machine learning. So
what I'm going to do is draw another circle inside of here. And we're going to label this
circle, ml for machine learning. Now notice I put this inside of the artificial intelligence
circle. This is because machine learning is a part of artificial intelligence. Now, what
is machine learning? Well, what we talked about previously, was kind of the idea that
AI used to just be a predefined set of rules, right? Where what would happen is, we would feed in some data, we would go through the rules and analyze the data with those rules, and then we'd spit out some output, which would be, you know, what we're going to do. So in the classic example of chess, say we're in check: we pass that board information to the computer, it looks at its set of rules, it determines we're in check, and then it
moves us somewhere else. Now, what is machine learning in contrast to that? Well, machine
learning is kind of the first field that actually figures out the rules for us. So rather than
us hard coding the rules into the computer, what machine learning attempts to do is take
the data and take what the output should be, and figure out the rules for us. So you'll often hear that, you know, machine learning requires
a lot of data. And you need a ton of examples and, you know, input data to really train
a good model. Well, the reason for that is because the way that machine learning works
is it generates the rules for us. We give it some input data, we give it what the output
data should be. And then it looks at that information and figures out what rules can
we generate, so that when we look at new data, we can have the best possible output for that.
Now, that's also why a lot of the times, machine learning models do not have 100% accuracy,
which means that they may not necessarily get the correct answer every single time.
And our goal when we create machine learning models is to raise our accuracy as high as
possible, which means it's going to make the fewest mistakes possible. Because just like
a human, you know, our machine learning models, which are trying to simulate, you know, human
behavior can make mistakes. But to summarize, essentially the difference between machine learning and, you know, basic rule-based artificial intelligence is the fact that rather than us, the programmer, giving it the rules, it figures out the rules for us. And we might not necessarily know explicitly what the rules are when we look
at machine learning and create machine learning models. But we know that we're giving some
input data, we're giving the expected output data, and then it looks at all of that information, runs some algorithms, which we'll talk about later on, and figures out the rules for us. So that later, when we give it some input data and we don't know the output data, it can use those rules that it's figured out from our examples and all that training data that we gave it to generate some output. Okay, so that's machine learning.
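To make that idea a bit more concrete before we ever touch TensorFlow, here's a tiny sketch of the same principle using plain NumPy: we hand over example inputs together with the outputs they should produce, and a simple line fit recovers the rule for us. The numbers and the hidden rule here are made up just for illustration.

    import numpy as np

    # Hypothetical examples: the inputs and the outputs we already know for them.
    inputs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    outputs = np.array([3.0, 5.0, 7.0, 9.0, 11.0])     # the hidden rule behind this data is output = 2 * input + 1

    slope, intercept = np.polyfit(inputs, outputs, 1)  # fitting a line "figures out the rule" from the data
    print(slope, intercept)                            # roughly 2.0 and 1.0
    print(slope * 10.0 + intercept)                    # apply the learned rule to new input: roughly 21.0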
Now we've covered AI and machine learning, and now it's time to cover neural networks, or deep learning.
Now, this circle gets to go right inside of the machine learning right here, I'm just
gonna label this one NN, which stands for neural networks. Now, neural networks get a big hype; they're usually the first thing, you know, when you get into machine learning, that you want to learn. Neural networks are cool, they're capable of a lot. But let's discuss what these really are. So the easiest way to define a neural network is as a form of machine learning that uses a layered representation
of data. Now, we're not going to really understand this completely right now. But as we get further
in, that should start to make more sense as a definition. But what I need to kind of illustrate
to you is that in the previous example, where we just talked about machine learning, essentially what we had is we had some input bubbles, which I'm going to draw like this, we had some set of rules that is going to be in between here, and then we had some output. And what
would happen is we feed this input to this set of rules, something happens in here, and
then we get some output. And then that is what you know, our program does, that's what
we get from the model, we pretty much just have two layers, we have kind of the input
layer, the output layer, and the rules are kind of just what connects those two layers
together. Now in neural networks, and what we call deep learning, we have more than two
layers. Now, I'm just going to erase all this quickly so I can show you that. So let's say, and I'll just use another color, because why not, if we're talking about neural networks, what we might have, and this will vary, and I'll talk about this in a second,
is the fact that we have an input layer, which will be our first layer of data, we could
have some layers in between this layer, that are all connected together. And then we could
have some output layer. So essentially, what happens is, our data is going to be transformed
through different layers, and different things are going to happen, there's gonna be different
connections between these layers. And then eventually, we will reach an output. Now it's
very difficult to explain neural networks without going completely in depth. So we'll
cover a few more notes that I have here. Essentially, in neural networks, we just have multiple
layers, that's kind of the way to think of them. And as we see machine learning, you
guys should start to understand this more. But just understand that we're dealing with
multiple layers. And a lot of people actually call this a multi stage information extraction
process. Now, I did not come up with that term. I think that's from a book or something.
But essentially, what ends up happening is we have our data at this first layer, which is that input information, which we're going to be passing to the model that we're going to do something with. It then goes to another layer, where it will be transformed, it will change into something else, using a predefined kind of set of rules and weights that we'll talk about later. Then it will pass through all of these different layers, where different kinds of features of the data, which again we'll discuss in a second, will be extracted, figured out, and found, until eventually we reach an output layer where we can kind of combine everything we've discovered about the data into some kind of output that's meaningful
to our program. So that's kind of the best that I can do to explain neural networks without
going on to a deeper level, I understand that a lot of you probably don't understand what
they are right now. And that's totally fine. But just know that they are a layered representation of data; we have multiple layers of information. Whereas in standard machine learning, we only have, you know, one or two layers, and in artificial intelligence in general, we don't necessarily have to have a predefined set of layers at all.
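If it helps to see what that layered idea looks like in code, here is a minimal sketch of a small model in Keras, TensorFlow's high-level API, which we'll build properly later in the course. The number of layers, the layer sizes, and the four input features are arbitrary choices made just for illustration.

    import tensorflow as tf

    # A layered representation of data: input layer -> hidden layer -> output layer.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(8, activation="relu", input_shape=(4,)),  # input layer: 4 features come in
        tf.keras.layers.Dense(8, activation="relu"),                    # hidden layer: transforms the data again
        tf.keras.layers.Dense(1)                                        # output layer: one meaningful value comes out
    ])
    model.summary()  # prints the layers and how they connect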
Okay, so that is pretty much it for neural networks. There's one last thing I will say about them, which is that they're actually not modeled after the brain. A lot of people seem to think that neural networks are modeled after the brain, because you have neurons firing in your brain and that can relate to neural networks. Now, there is a biological inspiration for the name neural networks and the way that they work from, you know, human biology, but they're not necessarily modeled after the way that our brain works. And in fact, we actually don't really know how a lot of the things in our brain operate and work. So it would be impossible for us to say that neural networks are modeled after the brain, because we actually don't know how information happens, occurs, and transfers through our brain, or at least we don't know enough to be able to say that this is exactly what a neural network is. So anyways, that was kind of the
last point there. Okay, so now we need to talk about data. Now data is the most important
part of machine learning and artificial intelligence, neural networks as well. And it's very important
that we understand how important data is and what the different kinds of parts of it are,
because they're going to be referenced a lot in any of the resources that we're using.
Now what I want to do is just create an example here, I'm going to make a data set that is
about students final grades in like a school system. So essentially, we're gonna make this
a very easy example. We're all we're gonna have for this data set is we're going to have
information about students. So we're gonna have their midterm one grade, their midterm two grade, and then we're gonna have their final grade. So I'm just gonna say midterm one, and again, excuse my handwriting here, it's not the easiest thing to write with this drawing tablet, and then I'll just do final. So this is going to be our data set. And we'll
actually see some similar data sets to this as we go through and do some examples later
on. So for student one, which we'll just put some students here, we're going to have their
midterm one grade, maybe that's a 70, their midterm two grade, maybe that was an 80. And then let's say their final was like their final term grade, not just the mark on the final exam; let's give them a 77. Now for the next student, for midterm one we can give them a 60, maybe we give them a 90 on midterm two, and then we determine that their final grade was, let's say, an 84. And then we can do another one with maybe some lower grades here, so a 40 and a 50, and then maybe they got a 38, or something, as the final grade. Now, obviously, we could have some other information here that we're omitting, like maybe there were some exams, some assignments, whatever other things they did that contributed to their grade. But the problem that I want to consider here is the fact that, given our midterm one grade and our midterm two grade and our final grade, how can I use this information to predict any one of these three columns? So if I were given a student's midterm one grade, and I were given a student's final grade, how could I predict their midterm two grade? So this is where we're going to talk
about features and labels. Now, whatever information we have that is the input information, which
is the information we will always have that we need to give to the model to get some output
is what we call our features. So in the example where we're trying to predict midterm two, and let's just do this and highlight this in red so we understand, what we would have as our features, our input information, are going to be midterm one and final, because this is the information we're going to use to predict something; it is the input, it is what we need to give the model. And if we're training a model to look at midterm one and final grade, whenever we want to make a new prediction, we need to have that information to do so. Now, what's highlighted in red, so this midterm two here, is what we would call the label, or the output. Now, the label is simply what we are trying to look for or
predict. So when we talk about features versus labels, features is our input information,
the information that we have that we need to use to make a prediction. And our label
is that output information that is just representing, you know, what we're looking for. So when we feed our features to a model, it will give us a label. And that is kind of the point that we need to understand. So those are the basics there.
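Here's that little data set from the drawing written out in plain Python, just to pin down the features versus labels idea; the grade numbers are the made-up ones from above.

    # Each student has a midterm one grade, a midterm two grade, and a final grade.
    students = [
        {"midterm1": 70, "midterm2": 80, "final": 77},
        {"midterm1": 60, "midterm2": 90, "final": 84},
        {"midterm1": 40, "midterm2": 50, "final": 38},
    ]

    # If midterm two is what we're predicting, then midterm one and final are the features (input)
    # and midterm two is the label (output).
    features = [[s["midterm1"], s["final"]] for s in students]
    labels = [s["midterm2"] for s in students]
    print(features)  # [[70, 77], [60, 84], [40, 38]]
    print(labels)    # [80, 90, 50]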
And now I'm just going to talk a little bit more about data, because we will get into this more as we continue going, and about the importance of it. So the reason why data is so important is this is kind of the
key thing that we use to create models. So whenever we're doing AI and machine learning,
we need data, pretty much unless you're doing a very specific type of machine learning and
artificial intelligence, which we'll talk about later. Now for most of these models,
we need tons of different data, we need tons of different examples. And that's because
we know how machine learning works now, which is essentially, we're trying to come up with
rules for a data set, we have some input information, we have some output information or some features
and some labels, we can give that to a model and tell it to start training. And what it
will do is come up with rules such that we can just give some features to the model in
the future. And then it should be able to give us a pretty good estimate of what the
output should be. So when we're training, we have a set of training data. And that is
data where we have all of the features and all of the labels. So we have all of this
information, then when we're going to test the model or use the model later on, we would
not have this midterm two information; we wouldn't pass this into the model, we would just pass our features, which are midterm one and final, and then we would get the output
of midterm two. So I hope that makes sense. That just means data is extremely important.
If we're feeding incorrect data or data that we shouldn't be using to the model that could
definitely result in a lot of mistakes. And if we have incorrect output information or
incorrect input information that is going to cause a lot of mistakes as well, because
that is essentially what the model is using to learn and to kind of develop and figure
out what it's going to do with new input information. So anyways, that is enough about data. Now let's talk about the different types of machine learning. Okay, so now that we've discussed the difference between artificial intelligence, machine learning, and neural networks, and we have a kind of decent idea about what data is and the difference between features and labels,
It's time to talk about the different types of machine learning specifically, which are
unsupervised learning, supervised learning and reinforcement learning. Now, these are
just the different types of learning, the different ways of figuring things out. Now, different
kinds of algorithms fit into these different categories from within artificial intelligence
within machine learning and within neural networks. So the first one we're going to
talk about is supervised learning, which is kind of what we've already discussed. So I'll
just write supervised up here. Again, excuse the handwriting. So, supervised learning. Now what is this? Well, supervised learning is kind of everything we've already learned, which is we have some features. So we'll write our features like this, right, we have some features. And those features correspond to some label, or potentially labels, as sometimes we might predict more than one piece of information. So when we have this information, we have
the features, and we have the labels, what we do is we pass this information to some
machine learning model, it figures out the rules for us. And then later on, all we need
is the features. And it will give us some labels using those rules. But essentially,
what supervised learning is, is when we have both of this information, the reason that's
called supervised is because what ends up happening when we train our machine learning
model is we pass the input information, it makes some arbitrary prediction using the
rules it already knows. And then it compares that prediction that it made to what the actual
prediction is, which is this label. So we supervise the model and we say, okay, so you
predicted that the color was red, but really, the color of whatever we passed in should
have been blue. So we need to tweak you just a little bit so that you get a little bit
better, and you move in the correct direction. And that's kind of the way that this works.
For example, say we're predicting, you know, a student's final grade. Well, if we predict that the final grade is 76, but the actual grade is 77, we were pretty close, but we're not quite there. So we supervise the model and we say, hey, we're gonna tweak you just a little bit, move you in the correct direction, and hopefully we get you to 77. And that is kind of the way to explain this, right: you have the features, and you have the labels; when you pass the features, the model has some rules that it's already built, it makes a prediction, and then it compares that prediction to the label, and then re-tweaks the model, and continues doing this with thousands upon thousands upon thousands of pieces of data until eventually
it gets so good that we can stop training. And that is what supervised learning is, it's
the most common type of learning, it's definitely the most applicable in a lot of instances.
And most machine learning algorithms that are actually used use a form of supervised
machine learning. A lot of people seem to think that this is, you know, a less complicated,
less advanced way of doing things. That is definitely not true. All of the different
methods, I'm going to tell you have different advantages and disadvantages. And this has
a massive advantage when you have a ton of information and you have the output of that
information as well. But sometimes we don't have the luxury of doing that. And that's
where we talk about unsupervised learning. So hopefully that made sense for supervised learning; I tried my best to explain that.
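Here's a minimal sketch of that supervised loop in TensorFlow, sticking with the grades example: we hand the model features and the labels they should map to, and training nudges it in the right direction over and over. The synthetic data, the single-layer model, and the training settings are all assumptions made just for illustration; they aren't the course's actual models.

    import numpy as np
    import tensorflow as tf

    # Fake training data: two features (midterm one, final) and a label (midterm two) for each student.
    rng = np.random.default_rng(0)
    features = rng.uniform(40, 100, size=(200, 2)).astype("float32")
    labels = (features.mean(axis=1) + rng.normal(0, 5, 200)).astype("float32")

    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(2,))])
    model.compile(optimizer="adam", loss="mse")        # the loss measures how far predictions are from the labels
    model.fit(features, labels, epochs=50, verbose=0)  # each pass compares predictions to labels and tweaks the model

    print(model.predict(np.array([[70.0, 77.0]], dtype="float32")))  # predict midterm two for a new student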
Now let's go into unsupervised learning. So if we know the definition of supervised learning, we should hopefully be able to come up with a definition of unsupervised learning,
which is when we only have features. So given a bunch of features like this, and absolutely
no labels, no output for these features. What we want to do is have the model come up with
those labels for us. Now, this is kind of weird. You're kind of like Wait, how does
that work? Why would we even want to do that? Well, let's take this for an example. We have
some axes of data, okay, and we have like a two-dimensional data point. So I'm just gonna call this, let's say, x, and let's say y, okay, and I'm gonna just put
a bunch of dots on the screen that kind of represents like maybe a scatterplot of some
of our different data. And I'm just going to put some dots specifically closer to other
ones, just so you guys kind of get the point of
what we're trying to do here. So let's do that. Okay, so let's say I have this data
set, this here is what we're working with. And we have these features, the features in
this instance, are going to be x and y, right? So X, and Y are my features. Now we don't
have any output specifically for these data points, what we actually want to do is we
want to create some kind of model that can cluster these data points, which means figure
out kind of, you know, unique groups of data and say, okay, so you're in group one, you're in group two, you're in group three, and you're in group four. We may not necessarily know
how many groups we have, although sometimes we do. But what we want to do is just group
them and kind of say, okay, we want to figure out which ones are similar. And we want to
combine those together. So hopefully, what we could do with an unsupervised machine learning
model is pass all of these features, and then have the model create kind of these groupings.
So like maybe this is a group, maybe this is a group, maybe this is a group if we were
having four groupings, and maybe if we had two groupings, we might get groupings that
look something like this, right. And then when we pass a new data point in, we could figure out what group it was a part of by determining, you know, which one it's closer to. Now this is kind of a rough example. It's hard, again, to explain all of these without going very in depth into the specific algorithms. But unsupervised machine learning, or unsupervised learning in general, is when you don't have some output information. You
actually want the model to figure out the output for you. And you don't really care
how it gets there. You just want it to get there. And again, a good example is clustering
data points, and we'll talk about some specific applications of when we might even want to use that later on. Just understand: you have the features, you don't have the labels, and you get the unsupervised model to kind of figure it out for you.
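Here's a small sketch of that clustering idea using scikit-learn, which also comes pre-installed in Collaboratory. The 2D points are made up so they sit in two obvious clumps, and K-means is just one clustering algorithm chosen for illustration, not necessarily the one we'll use later.

    import numpy as np
    from sklearn.cluster import KMeans

    # Features only, no labels: six made-up (x, y) points that roughly form two clumps.
    points = np.array([[1.0, 1.2], [0.8, 1.0], [1.1, 0.9],
                       [5.0, 5.2], [5.3, 4.8], [4.9, 5.1]])

    groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points)
    print(groups)  # e.g. [0 0 0 1 1 1]: the model grouped the points without ever seeing an answer key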
Okay, so now our last type, which is very different from the two types I just explained, is called reinforcement learning. Now, personally, and I don't even know if I want to spell this because I feel like I'm going to mess it up, reinforcement learning is the coolest type of machine learning, in my opinion. And this is when you actually don't have any data; you have what you call an agent, an environment, and a reward. I'm going to explain this very briefly with a very, very, very simple example, because it's hard to go too far in depth here. So let's say we have a very basic game,
you know, maybe we made this game ourselves. And essentially, the objective of the game
is to get to the flag. Okay, that's all it is: we have some ground, you can move left to right, and we want to get to this flag. Well, we want to train some artificial intelligence,
some machine learning model that can figure out how to do this. So what we do is we call
this our agent, we call this entire thing. So this whole thing here, the environment.
So I guess I could write that here, so: environment. I think I spelt that correctly.
And then we have something called a reward. And a reward is essentially what the agent
gets when it does something correctly. So let's say the agent takes one step over this way. So let's say his new position is here; I don't want to keep drawing him, so I'm just gonna use a dot. Well, he got closer to the flag. So what I'm actually going to do is give him a plus two reward. So let's say he moves again, closer to the flag; maybe I give him now plus one. This time, he got even closer, and as he gets closer, I'll give him more and more reward. Now what happens if he moves backwards? So let's erase this, and let's say that at some point in time, rather than moving closer to the flag, he moves
backwards, well, he might get a negative reward. Now, essentially, the objective of this agent is to maximize its reward. So if you give it a negative reward for moving
backwards, it's going to remember that it's going to say, Okay, at this position here,
where I was standing, when I moved backwards, I got a negative reward. So if I get to this
position, again, I don't want to go backwards anymore, I want to go forwards, because that
should give me a positive reward. And the whole point of this is we have this agent
that starts off with absolutely no idea, no kind of, you know, knowledge of the environment.
And what it does is it starts exploring, and it's a mixture of randomly exploring and exploring
using kind of some of the things that's figured out so far, to try to maximize its reward.
So eventually, when the agent gets to the flag, it will have the most the highest possible
reward that it can have. And the next time that we plug this agent into the environment,
it will know how to get to the flag immediately, because it's kind of figured that out, it's
determined that in all these different positions, if I move here, this is the best place to
move. So if I get in this position, move there. Now this is again, hard to explain without
more detailed examples, and going more mathematically and all that, but essentially, just understand
we have the agent, which is kind of the thing that's moving around in our environment; we have this environment, which is just what the agent can move around in. And then we have a reward, and the reward is what we, as the programmer, need to figure out a way to give the agent correctly so that it gets to the objective in the best possible way. But the agent simply maximizes that reward. So it just figures out where it needs to go
to maximize that reward, it starts at the beginning, kind of randomly exploring the
environment, because it doesn't know any of the rewards it gets at any of the positions.
And then as it explores more different areas, it kind of figures out the rules and the way that the environment works, and then it will determine how to reach the objective, whatever that is. This is a very simple example; you could train a reinforcement learning model to do this in, you know, like half a second, right. But there are way more advanced examples. And there have been examples of reinforcement learning, like AIs pretty much figuring out how to play games together; it's actually pretty cool, some of the stuff that reinforcement learning is doing. And it's a really awesome kind of advancement
in the field because it means we don't need all this data anymore. We can just get this
to kind of figure out how to do things for us and explore the environment and learn on
its own. Now this can take a really long time, this can take a very short amount of time
really depends on the environment. But a real application of this is training AIs to play games, as you might be able to tell by kind of what I was explaining here.
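Here's a tiny sketch of that flag example as code, using a simple tabular Q-learning loop. The environment (six positions on a line, flag at the far right), the rewards, and the learning settings are all made up just to show the idea of an agent maximizing reward; it's not the reinforcement learning setup we'll use later in the course.

    import random

    # Positions 0..5 on a line; the flag is at position 5. Actions: 0 = move left, 1 = move right.
    n_positions, moves = 6, [-1, +1]
    q = [[0.0, 0.0] for _ in range(n_positions)]   # the agent's estimate of reward for each (position, action)

    for episode in range(200):
        pos = 0
        while pos != n_positions - 1:
            # Mostly pick the best known action, but sometimes explore randomly.
            action = random.randrange(2) if random.random() < 0.1 else q[pos].index(max(q[pos]))
            new_pos = min(max(pos + moves[action], 0), n_positions - 1)
            reward = 10 if new_pos == n_positions - 1 else -1  # reaching the flag is good, every other step costs a little
            # Nudge the estimate towards the reward plus the best value expected afterwards.
            q[pos][action] += 0.5 * (reward + 0.9 * max(q[new_pos]) - q[pos][action])
            pos = new_pos

    print([row.index(max(row)) for row in q[:-1]])  # the learned policy: 1 (move right) at every non-flag position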
And yeah, so those are kind of the fundamental differences between supervised, unsupervised, and reinforcement learning. We're going to cover all three of these topics throughout this course, and it's
really interesting to see some of the applications we can actually do with this. So with that
being said, I'm going to kind of end what I'm going to call module one, which is just
a general overview of the different topics, some definitions and getting a fundamental
knowledge. And in the next one, what we're gonna be talking about is what TensorFlow is; we're going to get into coding a little bit, and we're going to discuss some different aspects of TensorFlow and things we need to know to be able to move forward and do some
more advanced things. So now in module two of this course, what
we're going to be doing is getting a general introduction to TensorFlow, understanding
what a tensor is, understanding shapes and data representation, and then how TensorFlow
actually works on a bit of a lower level, this is very important, because you can definitely
go through and learn how to do machine learning without kind of gaining this information and
knowledge. But it makes it a lot more difficult to tweak your models and really understand
what's going on, if you don't, you know, have that fundamental lower level knowledge of
how TensorFlow actually works and operates. So that's exactly what we're gonna cover here.
Now, for those of you that don't know what TensorFlow is, essentially, this is an open
source Machine Learning Library. It's one of the largest ones in the world, it's one
of the most well known and it's maintained and supported by Google. Now, TensorFlow,
essentially allows us to do and create machine learning models and neural networks, and all
of that without having to have a very complex math background. Now, as we get further in,
and we start discussing more in detail how neural networks work, and machine learning
algorithms actually function, you'll realize there's a lot of math that goes into this.
Now, it starts off being very kind of fundamental, like basic calculus and basic linear algebra.
And then it gets much more advanced into things like gradient descent, and some more regression
techniques and classification. And essentially, you know, a lot of us don't know that math, and we don't really need to know it; so long as we have a basic understanding of it, we can use the tools that TensorFlow provides for us to create models. And that's exactly what
TensorFlow does. Now, what I'm in right now is what I call Google Collaboratory. I'm going
to talk about this more in depth in a second. But what I've done for this whole course,
is I've transcribed in detail everything that I'm going to be covering through each module. So this is kind of the transcription of this module, which is the introduction to TensorFlow; you can see it's not crazy long. But I wanted to do this so that any of you
can follow along with kind of the text version, my lecture notes I almost want to call them, as I go through the different content. So in the description, there will
be links to all of these different notebooks. This is in something called Google Collaboratory,
which again, we're going to discuss in a second, but you can see here that I have a bunch of
text, and then it gets down to some different coding aspects. And what I'm going to be doing
to make sure that I stay on track is simply following along through this, I might deviate
slightly, I might go into some other examples. This will be kind of everything I'm going
to be covering through each module. So again, to follow along, click the link in the description.
Alright, so what can we do with TensorFlow? Well, these are some of the different things
I've listed them here. So I don't forget, we can do image classification, data clustering,
regression, reinforcement learning, natural language processing, and pretty much anything
that you can imagine with machine learning. Essentially, what TensorFlow does is give us a library of tools that allow us to avoid having to do these very complicated math operations.
It just does them for us. Now, there is a bit that we need to know about them, but nothing
too complex. Now let's talk about how TensorFlow actually works. So TensorFlow has two main
components that we need to understand, to figure out how operations and math are actually
performed. Now, we have something called graphs and sessions. Now, the way that TensorFlow works is it creates a graph of partial computations. Now, I know this is gonna sound a little bit complicated to some of you guys; just try to kind of forget about the complex vocabulary
and follow along. But essentially, what we do when we write code in TensorFlow is we
create a graph. So if I were to create some variable, that variable gets added to the
graph, and maybe that variable is the sum or the summation of two other variables. What
the graph will define now is say, you know, we have variable one, which is equal to the
sum of variable two and variable three. But what we need to understand is that it doesn't
actually evaluate that it simply states that that is the computation that we've defined.
So it's almost like writing down an equation without actually performing any math, we kind
of just, you know, have that equation there. We know that this is the value, but we haven't evaluated it. So we don't know that the value is, like, 7; we just know that it's the sum of, you know, vector one and vector two, or it's the sum of this, or it's the cross product, or the dot product. We just define all of the different partial computations, because we haven't evaluated those computations yet. And that is what is stored in the graph. And
the reason that's called a graph is because different computations can be related to each
other. For example, if I want to figure out the value of vector one, but vector one is equal to the value of vector three plus vector four, I need to determine the value of vector three and vector four before I can do that computation. So they're kind of linked together, and I hope that makes a little bit of sense. Now, what is a session? Well, a session is essentially a way to execute part of, or the entire, graph. So when we start a session,
what we do is we start executing different aspects of the graph. So we start at the lowest
level of the graph where nothing is dependent on anything else, we have maybe constant values,
or something like that. And then we move our way through the graph, and start doing all
of the different partial computations that we've defined. Now, I hope that this isn't
too confusing. I know this is kind of a lot of lingo; you guys will understand this as we go through. And again, you can read through some of these components here that
I have in Collaboratory, if I'm kind of skipping through anything you don't truly understand.
But that is the way that graphs and sessions work. We won't go too in depth with them, but we do need to understand that that is the way TensorFlow works. And there are some times
where we can't use a specific value in our code yet, because we haven't evaluated the
graph, we haven't created a session and gotten the values yet, which we might need to do
before we can actually, you know, use some specific value. So that's just something to consider.
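Here's a minimal sketch of that graph-and-session idea, written against the old TensorFlow 1.x style API (which TensorFlow 2.x still exposes under tensorflow.compat.v1). In TensorFlow 2.x, eager execution is on by default, so you'll rarely write a session yourself, but this shows what "define the computation first, evaluate it later" means.

    import tensorflow.compat.v1 as tf1
    tf1.disable_eager_execution()   # switch back to the old graph-building behaviour for this demo

    a = tf1.constant(2)
    b = tf1.constant(3)
    c = a + b                       # this only adds a node to the graph; nothing has been computed yet

    with tf1.Session() as sess:     # a session executes part of (or all of) the graph
        print(sess.run(c))          # the partial computation is finally evaluated here: prints 5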
Alright, so now we're actually going to get into coding, importing, and installing TensorFlow. Now, this is where I'm going to introduce you to Google
Collaboratory and explain how you guys can follow along without having to install anything
on your computer. And it doesn't matter if you have like a really crappy computer, or
even if you're on like an iPhone, per se, you can actually do this, which is amazing.
So all you need to do is Google Google Collaboratory, and create a new notebook. Now what Google
Collaboratory is, is essentially a free Jupyter notebook in the cloud for you. The way this works is you can open up this notebook, and you can see this is a .ipynb file, which I think just stands for IPython notebook. And what you can do in here is actually write code and write text as well. So this here is what's called, you know, a Google Collaboratory notebook. And essentially, why it's called
a notebook is because not only can you put code but you can also put notes, which is
what I've done here with these specific titles. So you can actually use markdown inside of
this. So if I open up one of these, you can see that I've used markdown text, to actually
kind of create these sections. And yeah, that is kind of how Collaboratory works. But what
you can do in Collaboratory is forget about having to install all of these modules. They're
already installed for you. So what you're actually going to do when you open a Collaboratory
window is Google is going to automatically connect you to one of their servers or one
of their machines that has all of this stuff done and set up for you. And you can start
writing code and executing it off their machine and seeing the result. So for example, if
I want to print hello, like this, and I'll zoom in a little bit so you guys can read this, all I do is create a new code block, which I can do by clicking code, like that. I can delete one like that as well. And I hit run. Now notice, give it a second, it
does take longer than typically on your own machine, and we get Hello popping up here.
So the great thing about Collaboratory is the fact that we can have multiple code blocks,
and we can run them in whatever sequence we want. So to create another code block, you can just, you know, add another code block from up here, or just by looking down here, you get code and you get text. And I can run these in whatever order I want. So I can do like print yes, for example, I can run yes, and we'll see the output of yes, and then
I can print hello one more time. And notice that it's showing me the number on this left
hand side here on which these kind of code blocks were run. Now all these code blocks
can kind of access each other. So for example, if I do define func, and we'll just take some parameter h, and all we'll do is just print h, well, if I create another code block down here, so let's go code, I can call func with, say, hello; make sure I run this block first, so we define the function. Now we'll run func, and notice we get the output hello. So we can access all of the variables, all the functions, anything we've defined in other code blocks, from code blocks that are below it, or code blocks that are executed after it.
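Written out, those two code blocks look roughly like this; in Collaboratory each part would live in its own cell, and you'd run the first one before the second.

    # --- first code block ---
    def func(h):
        print(h)

    # --- second code block (it can use anything defined in blocks that were run before it) ---
    func("Hello")   # prints: Hello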
Now another thing that's great about Collaboratory is the fact that we can import pretty much any
module we can imagine. And we don't need to install it. So I'm not actually going to be
going through how to install TensorFlow completely. There is a little bit on how to install TensorFlow
on your local machine inside of this notebook, which I'll refer you to. But essentially,
if you know how to use Pip, it's pretty straightforward. You can pip install TensorFlow or pip install
TensorFlow GPU if you have a compatible GPU, which you can check from the link that's in
this notebook. Now, if I want to import something, what I can do is literally just write the
import. So I can say import NumPy, like this. And usually NumPy is a module that you need
to install. But we don't need to do that here. It's already installed on the machine. So
again, we hook up to those Google servers, we can use their hardware to perform machine
learning. And this is awesome. This is amazing. And it gives you performance benefits when
you're running on like a lower kind of crappier machine, right. So we can have a look at the
RAM and the disk space of our computer, we can see we have 12 gigs of RAM, and we're dealing with 107 gigabytes of disk space. And we can obviously, you know, look at that if we want, we can connect to our local runtime, which I believe connects to
your local machine, but I'm not going to go through all that. I just want to show you
guys some basic components of Collaboratory. Now, some other things that are important
to understand is this runtime tab, which you might see me use. So restart runtime essentially
clears all of your output, and just restarts whatever's happened. Because the great thing
with Collaboratory is since I can run specific code blocks, I don't need to execute the entire
thing of code every time I want to run something. If I've just made a minor change in one code block, I can just run that code block; I don't need to run everything before it, or even everything after it, right. But sometimes you want to
restart everything and just rerun everything. So to do that, you click Restart runtime,
that's just going to clear everything you have. And then restart and run all will restart the runtime, as well as run every single block of code you have, in the sequential order in which it shows up in the notebook. So I recommend you guys open up one of these windows; you can obviously follow along with this notebook if you want, but if you want to type it out on your own and kind of mess with it, open up a notebook and save it, it's very easy. And these are, again, extremely similar to Jupyter notebooks; they're pretty much the same. Okay, so that is kind of the Google Collaboratory aspect and how to use it.
Let's get into importing TensorFlow. Now, this is going to be kind of specific to Google
Collaboratory. So you can see here, these are kind of the steps we need to follow to
import TensorFlow. So since we're working in Google Collaboratory, they have multiple
versions of TensorFlow, they have the original version of TensorFlow, which is 1.0, and the
2.0 version. Now, to define the fact that we want to use TensorFlow 2.0, just because we're in this notebook, we need to write this line of code at the very beginning of all of our notebooks: %tensorflow_version 2.x. Now this is simply just saying we need to use TensorFlow 2.x, so whatever version that is, and this is only required in a notebook; if you're doing this on your local machine in a text editor, you're not going to need to write this. Now once we do that, we typically import TensorFlow under an alias name of tf.
Now to do that, we simply import the TensorFlow module and then we write as TF. If you're
on your local machine, again, you're going to need to install TensorFlow first, to make
sure that you're able to do this, but since we're in Collaboratory, we don't need to do
that. Now, since we've defined the fact we're using version 2.x, when we print the TensorFlow version, we can see here that it says version two, which is exactly what we're looking for; in this case it's TensorFlow 2.1.0. So make sure that you print your version and that you're using version 2.x, because there is a lot of what I'm using in this series that, if you're in TensorFlow 1.0, is not going to work: it's either new in TensorFlow 2.0, or it's been refactored and the names have been changed. Okay, so now that we've
done that, we've imported TensorFlow, we've got this here. And I'm actually going to go
to my fresh notebook and just do this. So we'll just copy these lines over, just so we have some fresh code and I don't have all this text that we have to deal with. So let's do this: %tensorflow_version 2.x, let's import TensorFlow as tf, and then we can print tf.version and have a look at that. Okay, so let's run our code here. We can see TensorFlow
is already loaded. Oh, it says 1.0. So if you get this error, it's actually good that I ran into this: when TensorFlow has already been loaded, all you need to do is just restart your runtime. So I'm going to restart and run all, and just click yes. And now we should see that we get that version 2.0. Once this starts running, give it a second: TensorFlow 2.x selected, we're going to import that module, and there we go, we get version 2.0.
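For reference, that setup cell at the top of the notebook looks roughly like this; remember the %tensorflow_version magic only exists inside Collaboratory, not on your local machine.

    %tensorflow_version 2.x       # Colab-only: select the TensorFlow 2.x runtime before importing
    import tensorflow as tf
    print(tf.version.VERSION)     # should print something like 2.1.0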
Okay, so now it's time to talk about tensors. Now, what is a tensor? Tensor just immediately seems kind of like a complicated name; you're like, alright, tensor, this is confusing. But what is it? Well, obviously, this is going to be a primary aspect of TensorFlow, considering the name similarity. And essentially, all it is, is a vector generalized to higher dimensions.
Now, what is a vector? Well, if you've ever done any linear algebra, or even some basic
kind of vector calculus, you should hopefully know what that is. But essentially, it is
kind of a data point, is kind of the way that I like to describe it. And the reason we call it a vector is because it doesn't necessarily have a set number of coordinates. So like, if you're talking about a two-dimensional data point, you have, you know, maybe an x and a y value, or like an x one value and an x two value. Now, a vector can have any amount of dimensions in it: it could have one dimension, which simply means it's just one number; it could have two dimensions, which means we're having two numbers, so like an x and a y value if we're thinking about a two-dimensional graph; we'd have three dimensions if we're thinking about a three-dimensional graph, so that would be three numbers; we could get to four dimensions if we're talking about, sometimes, some image data and some video data; five dimensions; and we can keep going and going with vectors. So essentially, what a tensor is, and I'll
just read this formal definition to make sure I haven't butchered anything that's from the
actual TensorFlow website. A tensor is a generalization of vectors and matrices to potentially higher
dimensions. Internally, TensorFlow represents tensors as n-dimensional arrays of base datatypes.
Now we'll understand what that means in a second. But hopefully, that makes sense. Now,
since tensors are so important to TensorFlow, they're kind of the main object that we're
going to be working with manipulating and viewing. And it's the main object that's passed
around through our program. Now, what we can see here is each tensor represents a partially
defined computation that will eventually produce a value. So just like we talked about in the
graphs and sessions, what we're going to do is when we create our program, we're going
to be creating a bunch of tensors. And TensorFlow is going to be creating them as well. And
those are going to store partially defined computations in the graph. Later, when we
actually build the graph and have the session running, we will run different parts of the
graph, which means we'll execute different tensors and be able to get different results
from our tensors. Now each tensor has what we call a data type and a shape. And that's
what we're going to get into now. So a data type is simply what kind of information is
stored in the tensor. Now, it's very rare that we see any data types different than
numbers, although there is the data type of strings and a few others as well. But I haven't
included all of them here, because they're not that important. But some examples we can
see are float32, int32, string, and others. Now, the shape is simply the representation
of the tensor in terms of what dimension it is. And we'll get to some examples, because
I don't want to explain the shape until we can see some examples to really dial in. But
here is some examples of how we would create different tensors. So what you can do is you
can simply do tf.Variable, and then you can give the value and the data type that your tensor is. So in this case, we've created a string tensor, which stores one string, and it is tf.string; we define the data type second. We have a number tensor, which stores some integer value, and that is of type tf.int16. And we have a floating point tensor, which stores a simple floating point number. Now these tensors have a scalar shape, which simply means they each hold a single value. Now, a scalar value, and you might hear me say this a lot, simply means just one value; that's all it means. When we talk about, like, vector values, that typically means more than one value, and when we talk about matrices, we're having even more; it just goes up. But scalar simply means one number. So yeah, that is what we get for the different data types and creating tensors; we're not really going to do this very much in our program, but just for some examples here, that's how we do it. So, since we've imported TensorFlow, I can actually run these.
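Those creation lines look roughly like this; I'm writing the data type with the explicit dtype keyword here just to keep it unambiguous.

    string = tf.Variable("this is a string", dtype=tf.string)
    number = tf.Variable(324, dtype=tf.int16)
    floating = tf.Variable(3.567, dtype=tf.float32)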
And I mean, we're not going to really get any output by running this code, because, well, there's nothing to see. But now we're going to talk about the rank, slash, degree of tensors. So another word for rank is degree; these terms are used interchangeably. And again, this simply means the
number of dimensions involved in the tensor. So when we create a tensor of rank zero, which
is what we've done up here, we call that a scalar. Now, the reason this has rank zero is because it's simply one thing: we don't have any dimension to this, there's like zero dimensionality, if that's even a word; it's just one value. Whereas here, we have an array. Now, when we have an array or a list, we immediately have at least rank one. Now, the reason for that is because this array can store more than one value in one dimension, right? So I can put something like test, I can put okay, I could put Tim, which is my name, and we can run this, and we're not going to get any output here, obviously. But this is what we would call a rank one tensor, because it is simply one list, one array, which means one dimension, and again, you know, that's also like a vector. Now, this, what we're looking
at here is a rank two tensor. The reason this is a rank two tensor is because we have a
list inside of a list or in this case, multiple lists inside of a list. So the way that you
can actually determine the rank of a tensor is the deepest level of nested lists, at least
in Python with our representation. That's what that is. So here, we can see we have
a list inside of a list, and then another list inside of this upper list. So this would
give us rank two. And this is what we typically call a matrix. And this, again, is going to be of tf.string; so that's the data type for this tensor variable. So all of these
we've created are tensors. They have a data type, and they have some rank and some shape.
And we're going to talk about the shape in a second. So to determine the rank of a tensor, we can simply use the method tf.rank. So notice, when I run this on the rank two tensor, we get shape=(), which is fine (tf.rank itself just returns a scalar tensor), and then we get numpy=2, which simply means that this is of rank two. Now, if I go to that rank one tensor and print it out, so let's have a look at it, we get numpy=1 here, which is telling us that this is simply of rank one. Now, if I want to use one of those ones up here and see what it is, let's try it. We can take number, so tf.rank(number), we'll print that here, and we get numpy=0 because that's rank zero, right? So we'll go back to what we had, which was the rank two tensor. But again, those are the kinds of examples we want to look at.
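As a quick sketch of checking ranks, reusing the scalar number variable from above plus a couple of illustrative list-based tensors:

    rank1_tensor = tf.Variable(["test", "okay", "Tim"], dtype=tf.string)
    rank2_tensor = tf.Variable([["test", "okay"], ["test", "yes"]], dtype=tf.string)

    print(tf.rank(number))        # numpy=0, a scalar
    print(tf.rank(rank1_tensor))  # numpy=1, one level of nesting
    print(tf.rank(rank2_tensor))  # numpy=2, a list of lists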
Okay, so shapes of a tensor. So this is a little bit different. What a shape simply tells us is how many items we have in each dimension. So in this case, when we're looking at rank2_tensor.shape, so we have .shape here, which is an attribute of all of our tensors, we get (2, 2). Now let's look up here: what we have is, look at this, two and two, so we have two elements in the first dimension and two elements in the second dimension. That's pretty much what this is telling us. Now let's look at the shape of the rank one tensor: we get (3,). Because it's only rank one, notice we only get one number, whereas when we had rank two, we got two numbers, and it told us how many elements were in each of those lists, right? So if I go and add another element to just one of the lists, like that, and we have a look now at the shape, oops, I've got to run this first, we get an error: it can't convert a non-rectangular sequence to a tensor. Sorry, so I need to have a uniform number of elements in each list here; I can't just do what I did there. So we'll add a third element to the other list as well. Now we can run this, we shouldn't get any issues, let's have a look at the shape. And notice we now get (2, 3). So we have two lists, and each of those lists has three elements inside of it. So that's how the shape works.
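Here's roughly what those shape checks look like with the same example variables; note that the nested lists have to be rectangular, or TensorFlow will complain:

    print(rank2_tensor.shape)   # (2, 2) -- two lists with two elements each
    print(rank1_tensor.shape)   # (3,)  -- one dimension holding three elements

    # Uneven (ragged) nested lists like [["a", "b", "c"], ["d", "e"]] raise a
    # ValueError about a non-rectangular sequence, so keep every inner list the
    # same length.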
Now, I could go ahead and add another list in here if I wanted to, with its own three elements, something like 'okay', 'okay', 'okay'. So let's run this, hopefully no errors, looks like we're good. Now let's look at the shape again, and now we get a shape of (3, 3), because we have three interior lists, and in each of those lists we have three elements. And that is pretty much how that works. Now, again, we could go even further here: we could put another list inside of each of these lists, and that would give us a rank three tensor. We'd have to do that inside of all of these lists, and then what that would give us is three numbers representing how many elements we have in each of those different dimensions. Okay, so changing shape. Alright, so this is what we need to do a lot
of times when we're dealing with tensors in TensorFlow. So essentially, there are many different shapes that can represent the same number of elements. So up here, we have three elements in a rank one tensor, and then here, we have nine elements in a rank two tensor. Now, there are ways that we can reshape this data so that we have the same number of elements but in a different shape. For example, I could flatten this, right, take all of these elements and throw them into a rank one tensor that is simply nine elements long. So how do we do that? Well, let me just run this code for us here and have a look at this.
So what we've done is we've created tensor1, which is tf.ones of shape [1, 2, 3]; what that means is we're going to create a tensor that is populated completely with ones, in that shape. So shape [1, 2, 3], which means, you know, that's the shape we're going to get. So let's print this out and look at tensor1, just so I can better illustrate this. So tensor1, look at the shape that we have, (1, 2, 3), right: we have one interior list, which we're looking at here, and then we have two lists inside of that list, and in each of those lists we have three elements. So that's the shape we just defined. Now, we have six elements inside of here in total, so there must be a way that we can reshape this data to keep those six elements but in a different shape. In fact, what we can do is reshape this into a [2, 3, 1] shape: we're going to have two lists, we're going to have three lists inside each of those, and then inside of each of those, we're going to have one element. So let's have a look at that one. So let's have a look at tensor2; actually, what am I doing, we can print all of them here. So let's just print them and have a look at them. So when we look at tensor1, we saw what its shape was, and now we look at tensor2, and we can see that we have two lists, right? Inside of each of those lists, we have three lists, and inside of each of those lists,
we have one element. Now finally, our tensor3 has a shape of [3, -1]. What is the negative one? When we put negative one here, what it does is infer what that number actually needs to be. So if we define an initial shape of 3, what this does is say, okay, we're going to have three lists, that's our first level, and then, based on how many elements we have, the reshape (which is the method we're using, and which I'll get to in a second) figures out what this next dimension should be. In this case that needs to be two, since we have three lists and six elements in total, and three times two gives us six; that is essentially how you can determine how many elements are actually in a tensor, by multiplying together the numbers in its shape. Now, this is the reshape method, where all we need to do is call tf.reshape, give it the tensor, and give it the shape we want to change it to. As long as that's a valid shape, meaning that when we multiply all the numbers in it, it's equal to the number of elements in the tensor, it will reshape it for us and give us that new shaped data.
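A small sketch of those reshape cells, with the shapes you'd expect noted in comments:

    tensor1 = tf.ones([1, 2, 3])               # six ones, shape (1, 2, 3)
    tensor2 = tf.reshape(tensor1, [2, 3, 1])   # same six elements, shape (2, 3, 1)
    tensor3 = tf.reshape(tensor2, [3, -1])     # -1 tells TensorFlow to infer the 2

    print(tensor1)
    print(tensor2)
    print(tensor3)   # shape (3, 2)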
This is very useful. We'll use this actually a lot as we go through TensorFlow. So make
sure you're kind of familiar with how that works. Alright, so now we're moving on to
types of tensors. So there are a bunch of different types of tensors that we can use. So far, the only one we've looked at is Variable: we've created tf.Variable tensors and just hard coded our own values, which we're not really going to do very much, but just for that example. So we have these different types: Constant, Placeholder, SparseTensor, and Variable, and there are actually a few other ones as well. Now, we're not going to really talk about most of these that much, although Constant and Variable are important to understand the difference between, so we can read this: with the exception of Variable, all of these tensors are immutable, meaning their value may not change during execution. So essentially, all of the others, when we create them, mean we have some constant value; whatever we've defined is not going to change, whereas a Variable tensor could change. So that's just something to keep in mind: we use Variable when we think we might need to change the value of that tensor later on, whereas if we're using a constant tensor, we cannot change it; we can obviously copy it, but we can't change it.
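A tiny illustration of that difference:

    v = tf.Variable(3)    # mutable: its value can be updated later
    c = tf.constant(3)    # immutable: fixed for the life of the tensor

    v.assign(5)           # fine, Variables support assignment
    # c has no assign method; to "change" a constant you'd have to build a new one.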
Okay, so evaluating tensors. We're almost at the end of this section, I know, and then we'll get into some deeper code. So there will be some times throughout this guide where we need to evaluate a tensor, and what we need to do to evaluate a tensor is create a session. Now, we're not going to do this that much, but I just figured I'd mention it to make sure that you guys are aware of what I'm doing if I start typing this later on. Essentially, sometimes we have some tensor object, and somewhere in our code we actually need to evaluate it to be able to do something else. To do that, all we need to do is use this kind of default template block of code, where we say with tf.Session() as some session (it doesn't really matter what we name it), and then we can call whatever the tensor name is, dot eval. Calling that will have TensorFlow figure out what it needs to do to find the value of this tensor; it will evaluate it, and then it will allow us to actually use that value.
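Roughly, that template looks like the commented lines below; it's TensorFlow 1.x style, so in a TensorFlow 2.x notebook with eager execution you can usually skip the session entirely and just ask the tensor for its value:

    # TensorFlow 1.x style template (what the notebook text is describing):
    # with tf.Session() as sess:
    #     my_tensor.eval()
    #
    # TensorFlow 2.x with eager execution: pull the value out directly.
    value = tensor1.numpy()
    print(value)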
I've put this in the notebook, so you guys can obviously read through it if you want to understand in more depth how that works. The source for this is straight from the TensorFlow website; a lot of this is copied straight from there, and I've just added my own spin to it and made it a little bit easier to understand. Okay, so we've done all that. So let's just go in here and do a few examples of reshaping, just to make sure that everyone's on the same page, and then we'll move on to actually talking about some simple learning algorithms.
So I want to create a tensor that we can mess with and reshape, so what I'm going to do is just say t equals tf.ones. Now, what tf.ones does is create a tensor, in whatever shape we give it, where all the values are ones; we can also do tf.zeros, which just gives us a bunch of zeros instead. And let's create some crazy shape and just visualize it, let's say five by five by five by five. So obviously, if we want to figure out how many elements are going to be in here, we need to multiply these values, so I believe this is going to be 625, because that's five to the power of four, five times five times five times five. And let's actually print t and have a look at that and see what this is. So we run this now, and you can see
this is the output we're getting. So obviously, this is a pretty crazy looking tensor, but you get the point, right, and it tells us the shape is (5, 5, 5, 5). Now watch what happens when I reshape this tensor. If I want to take all of these elements and flatten them out, what I could do is simply say t equals tf.reshape, like that, and we'll reshape the tensor t to just the shape [625]. Now, if we do this, and we run here, oops, I've got to print t at the bottom after we've done that, if I could spell the print statement correctly, you can see that now we just get this massive list that has 625 ones in it. And again, if we wanted to reshape this to something like [125, -1], maybe because we weren't that good at math and couldn't figure out that the last value should be five, we could put a negative one, which means TensorFlow will infer what that part of the shape needs to be. And now when we look at it, we can see that what we're going to get is 125 little rows with five ones each, whatever you want to call them, and our shape is (125, 5). So that is essentially how that works. So that's how we reshape. That's how
we kind of deal with tensors, how we create variables, and how that works in terms of sessions and graphs. And hopefully that gives you enough of an understanding of tensors, shapes, ranks, and values, so that when we move into the next part of the tutorial, where we're actually writing code, and I promise we're going to be writing some more advanced code, you'll understand how that works. So with that being said, let's get into the next section. So welcome to Module Three of this course.
Now what we're going to be doing in this module is learning the core machine learning algorithms
that come with TensorFlow. Now, these algorithms are not specific to TensorFlow, but they are
used within there. And we'll use some tools from TensorFlow to kind of implement them.
But essentially, these are the building blocks. Before moving on to things like neural networks
and more advanced machine learning techniques, you really need to understand how these work
because they're used in a lot of different techniques and combined together, and one of them that I'm going to show you is actually very powerful if you use it in the right way. A lot of what machine learning actually is, a lot of machine learning algorithms, implementations, businesses, applications and so on, actually just use pretty basic models, because these models are capable of doing very powerful things. When you're not dealing with anything that's crazy complicated, and you just need some basic machine learning, some basic classification, you can use these kinds of fundamental core learning algorithms. Now, the first one we're going to go through is linear regression, but we will also cover classification, clustering, and hidden Markov models, and those are going to give us a good spread of the different core algorithms. Now, there are a ton, like thousands, of different machine learning algorithms; these are kind of the main categories that you'll cover, but within these categories there are more specific algorithms that you can get into. I just feel like I need to mention that because I know a lot of you will have maybe seen some different ways of doing things, and this course might show you, you know, a different perspective on that. So let me just quickly talk about how I'm going to go through
this, it's very similar to before I have this notebook, as I've kind of talked about, there
is a link in the description, I would recommend that you guys hit that and follow along with
what I'm doing and read through the notebook. But I will just be going through the notebook.
And then occasionally, what I will actually do, oops, I need to open this up here, is go to this kind of untitled tab I have here and write some code in it, because most of what
I'm going to do is just copy code over into here, so we can see it all in kind of one
block. And then we'll be good to go. And the last note before we really get into it, and
I'm sorry, I'm talking a lot. But it is important to make you guys aware of this, you're going
to see that we use a lot of complicated syntax throughout this kind of series and the rest
of the course in general. I just want to make it extremely clear that you should not have to memorize, or even feel obligated to memorize, any of the syntax that you see. Everything that you see here, I personally don't even have memorized; there's a lot of what's in here that I can't just come up with off the top of my head. When we're dealing with a library and module as big as TensorFlow, it's hard to memorize all those different components. So just make sure you understand what's happening, but you don't need to memorize it. If you're ever going to need to use any of these tools, you're going to look them up, you're going to see what it is, you're going to go, okay, I've used this before, you're going to understand it, and then you can go ahead and, you know, copy that code in and use it in whatever way you need to. You don't need to memorize anything that we do here.
Alright, so let's go ahead and get started with linear regression. So what is linear
regression? It's one of the most basic forms of machine learning, and essentially, what
we try to do is have a linear correspondence between data points. So I'm just going to
scroll down here to a good example. So what I've done is use matplotlib, just to plot
a little graph here. So we can see this one right here. And essentially, this is kind
of our data set, or this is what we'll call our data set, and what we want to do is use linear
regression to come up with a model that can give us some good predictions for our data
points. So in this instance, maybe what we want to do is given some x value for a data
point, we want to predict the y value. Now in this case, we can see there's kind of some
correspondence linearly for these data points. Now, what that means is we can draw something
called a line of best fit through these data points that can pretty accurately describe them, if that makes any sense. So I'm going to scroll down here and look at what our line of best fit for this data set actually is. We can see this blue line; it is, I mean, pretty much the perfect line of best fit for this data set. And using this line, we can
actually predict future values in our data set. So essentially, linear regression is
used when you have data points that correlate in kind of a linear fashion. Now, this is
a very basic example, because we're doing this in two dimensions with x and y. But oftentimes,
what you'll have is you'll have data points that have you know, eight or nine kind of
input values. So that gives us you know, a nine dimensional kind of data set. And what
we'll do is predict one of the different values. So in the instance, where we were talking
about students before, maybe we have a student's first midterm grade and their second
midterm grade, and then we want to predict their final grade, what we can do is use linear
regression to do that, where our kind of input values are going to be the two midterm grades
and the output value is going to be that final grade that we're looking to predict. So if
we were to plot that, we would plot that on a three dimensional graph, and we would draw
a three dimensional line that would represent the line of best fit for that data set. Now,
for any of you that don't know what a line of best fit is, this is just the definition I got from this website here: a line of best fit refers to a line through
a scatterplot of data points that best expresses the relationship between those points. So
exactly what I've kind of been trying to explain when we have data that correlates linearly
and I always butcher that word. What we can do is draw a line through it. And then we
can use that line to predict new data points. Because if that line is good, it's a good
line of best fit for the data set, then hopefully, we would assume that we can just, you know,
pick some point, find where it would be on that line. And that'll be kind of our predicted
value. So I'm going to go into an example now where I start drawing and going into a
little bit of math. So we understand how this works on a deeper level. But that should give
you a surface level understanding. So actually, I'll leave this up because I was messing with
this beforehand. This is kind of a data set that I've drawn on here. So we have our x
and we have our y, and we have our line of best fit. Now what I want to do is I want to use this line of
best fit to predict a new data point. So all these red data points are ones that we've
trained our model with the information that we gave to the model so that it could create
this line of best fit, because essentially, all linear regression really does is look
at all of these data points and create a line of best fit for them. That's all it does.
It's, I don't know the word for it, pretty easy to actually do this; this algorithm is not that complicated, it's not that advanced, and that's why we start with it here, because it just makes sense to explain. So, as I hope a lot of you know, in two dimensions,
a line can be defined as follows, with the equation y = mx + b. Now, b stands for the y intercept, which is where the line crosses the y axis, so essentially where the line starts. So in this instance, our b value is going to be right here; this is b, because that is the y intercept. So we can say, maybe, if we mark the axis 1, 2, 3, that b is something like 0.4, right? So I can just pencil that in, 0.4. And then what are m, x, and y? Well, x and y stand for the coordinates of a data point, so this point would have some x, y value; in this case, we might call it something like (2, 2.7), that might be the value of this data point. So that's our x and y. And then our m stands for the slope, which is probably the most important part. Now, slope simply defines the steepness of this line of best fit that we've drawn here. Now, the way we calculate slope is using rise over run. Rise over run essentially just means how much we went up versus how much we went across. So if you want to calculate the slope of a line, what you can actually
do is just draw a triangle, a right angled triangle, anywhere on the line, so just pick
two data points. And what you can do is calculate this distance and this distance, and then
you can simply divide the distance up by the distance across, and that gives you the slope.
Now, I'm not going to go too far into slope, because I feel like you guys probably understand
what that is. But let's just pick some values for this line. And I want to actually show
you some real examples of math and how we're going to do this. So let's say that our linear
regression algorithm, you know, comes up with this line. I'm not going to discuss exactly how it does that, but it pretty much looks at all of these data points and finds the line that splits them evenly. So essentially, you want the line to be as close to every data point as possible, and you want to have about the same number of data points on the left side and the right side of the line. So in this example, we have, you know, a data point on the left, another data point on the left, we have one or two that are pretty much on the line, and then we have two that are on the right. So this is a pretty good line of best fit, because all of the points are very close to the line, and it splits them evenly. So that's kind of how you come up with a line of best fit. So let's say that the equation for this line is something like y = 1.5x + 0.5, just to make it easy. This is going to be the equation of our line. Now, notice that x and y don't have values; that's because we need to supply one of them to come up with the other. So what we can do is say, if we have either the y value or the x value of some point, and we want to figure out where it is on the line, we can just feed one in, do the calculation, and that will actually give us the other value.
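To make that arithmetic concrete, here's a tiny sketch using the made-up slope and intercept from this example (1.5 and 0.5); these numbers aren't anything a model produced, they're just the ones drawn on the board:

    m, b = 1.5, 0.5           # slope and y intercept of our imaginary line of best fit

    def predict_y(x):
        return m * x + b      # y = mx + b

    def predict_x(y):
        return (y - b) / m    # the same equation rearranged to solve for x

    print(predict_y(2))       # 3.5, the prediction worked through below
    print(predict_x(2.7))     # roughly 1.47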
So in this instance, let's say that, you know, I'm trying to predict something and I'm given
the fact that x equals two, I know that x equals two, and I want to figure out what
y would be if x equals two, well, I can use this line to do so. So what I would do is
I'm going to say y equals 1.5 times two plus 0.5, which, as all of you quick math majors out there can tell me, gives the value 3.5. Which means that if x was at two, then I would have my data point as a prediction here on this line, and I would say, okay, so you're telling me x is two; my prediction is that y is going to be equal to 3.5, because given the line
of best fit for this data set, that's where this point will lie on that line. So I hope
that makes sense. You can actually do this the reverse way as well. So if I'm just given
some y value, say I know that you know my y value is at like 2.7 or something. I could
plug that in just rearrange the numbers in this equation and then solve for x. Now obviously,
this is a very basic example, because we're just doing all of this in two dimensions,
but you can do this in higher dimensions as well. So actually, most times, what's gonna
end up happening is you're gonna have, you know, like eight or nine input variables,
and then you're gonna have one output variable that you're predicting. Now, so long as our
data points are correlated linearly, in three dimensions, we can still do this. So I'm going
to attempt to show you this actually, in three dimensions, just to hopefully clear some things
up because it is important to kind of get a grasp and perspective of the different dimensions.
So let's say we have a bunch of data points that are kind of like this. And I'm trying
my best to kind of draw them in some linear fashion using like all the dimensions here.
But it is hard because drawing in three dimensions on a two dimensional screen is not easy. Okay.
So let's say this is kind of like what our data points look like. Now, I would say that
these correlate linearly, like pretty, pretty well; they kind of go up in one direction, and we don't know the scale of this, so this is probably fine. So the line of best fit for this data set, and I'll just turn my line thickness up, might be something like this, right? Now
notice that this line is in three dimensions, right? It's going to cross, I guess, our x, y, and z axes. So we have a three dimensional line. Now, the equation for this line is a little bit more complicated; I'm not going to talk about exactly what it is. But essentially, what we do is we make this line and then we say, okay, what value do I want to predict? Do I want to predict x, y, or z? Now, so long as I have two of the values, I can always predict the other one. So if I have, you know, the x and y of a data point, that will give me the z, and if I have the z and y, that will give me the
x. So, so long as you have all of the values for a data point except one, you can always find what that missing value is, based on the fact that we have this line and we're
using that to predict. So I think I'm going to leave it at that for the explanation. I
hope that makes sense. Again, just understand that we use linear regression when our data
points are correlated linearly. Now, some good examples of linear regression were, you
know, that kind of student predicting the grade kind of thing, you would assume that
if someone has, you know, a low grade, then they would finish with a lower grade, and
you would assume if they have a higher grade, they would finish with a higher grade. Now,
you could also do something like predicting, you know, future life expectancy. Now, this
is kind of a darker example, but essentially, what you could think of here is, if someone is older, they're expected to live, you know, not as long. Or you could look at health conditions: if someone is in critical condition, if they have a critical illness, then chances are their life expectancy is lower. So that's an example of something that
is correlated linearly, essentially, something goes up and something goes down or something
goes up, the other thing goes up. That's kind of what you need to think of when you think
of a linear correlation. Now the magnitude of that correlation, so you know, how much
does one go up versus how much one goes down, is exactly what our algorithm figures out
for us, we just need to know to pick linear regression when we think things are going
to be correlated in that sense. Okay, so that is enough of the explanation of linear regression.
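Just to make "line of best fit" concrete before we switch to TensorFlow, here's a quick sketch with NumPy on a tiny made-up data set; this isn't what TensorFlow's estimator does internally, it's just an easy way to see a fitted slope and intercept:

    import numpy as np

    x = np.array([1, 2, 3, 4, 5], dtype=float)
    y = np.array([1.9, 3.4, 5.1, 6.4, 8.2], dtype=float)

    m, b = np.polyfit(x, y, 1)   # fit a degree-1 polynomial: slope and intercept
    print(m, b)                  # the fitted line of best fit
    print(m * 6 + b)             # predict y for a new x value of 6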
Now, we're going to get into actually coding and creating a model. But we first need to
talk about the data set that we're going to use in the example we're going to kind of
illustrate linear regression with. Okay, so I'm here and I'm back in the notebook. Now,
these are the inputs, we need to start with to actually start programming and getting
some stuff done. Now, the first thing we need to do is actually install sklearn. Even if you're in a notebook, you actually need to do this, because for some reason it doesn't come by default with the notebook. So to do this, we just use an exclamation point: !pip install -q sklearn. Now, if you're going to be working on your own machine, again, you can use pip to install this, and I'm assuming that you know how to use pip if you're going to
be going along in that direction. Now, as before, since we're in the notebook, we need to say that we're going to use TensorFlow version 2.x, so to do that, we're going to just, you know, do that up here with the percent sign magic. And then we have all these imports, which we're going to be using throughout: from __future__ we import absolute_import, division, print_function, and unicode_literals, and then obviously the big ones, NumPy, pandas, matplotlib, the IPython display, and TensorFlow itself.
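For reference, that setup cell looks roughly like this; the first two lines are Colab notebook commands rather than Python:

    !pip install -q sklearn
    %tensorflow_version 2.x    # Colab magic to select TensorFlow 2

    from __future__ import absolute_import, division, print_function, unicode_literals

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from IPython.display import clear_output
    import tensorflow.compat.v2.feature_column as fc
    import tensorflow as tf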
So I'm actually just going to explain what some of these modules are, because I feel like
some of you may actually not know. NumPy is essentially a very optimized version of arrays
in Python. So what this allows us to do is lots of kind of multi dimensional calculations.
So essentially, if you have a multi dimensional array, which we've talked about before, right
when we had, you know, those crazy shapes like (5, 5, 5, 5), NumPy allows us to represent data in that form and then very quickly manipulate it and perform operations on it. So we can do things like cross product, dot product, matrix addition, matrix subtraction, element wise addition and subtraction, you know, vector operations; that's what this does for us. It's pretty complex, but we're going to be using it a fair amount. Pandas: what pandas does is it's kind of a data analytics tool, I almost want to say; I don't know the formal definition of what pandas is, but it allows us to very easily manipulate data, so, you know, loading data sets, viewing data sets, cutting out specific columns or rows from our data set, visualizing the data sets. That's what pandas does for us. Now, matplotlib is a visualization tool for graphs and charts, so we'll use that a little bit lower down when I actually graph some different aspects of our data set. The IPython display is just specific to this notebook; it's just to clear the output, nothing crazy with that. And then obviously we know what TensorFlow is, and there's this crazy import for TensorFlow here, tensorflow.compat.v2.feature_column as fc, which we'll talk about later, but we need something called a feature column when we create a linear regression algorithm or model in TensorFlow, so we're going to use that. Okay. So now that we've gone through all that, we need to start talking
about the data set that we're going to use for linear regression. And for this example,
because what we're going to do is, you know, actually create this model and start using
it to predict values. So the data set that we're going to use, actually, I need
to read this because I forget exactly what the name of it is, is the Titanic data set,
that's what it is. So essentially, what this aims to do is predict who's going to survive, or the likelihood that someone will survive, being on the Titanic, given a bunch of information about them. So what we need to do is load in this data set. Now, I know this seems like a bunch of gibberish, but this is how we need to load it: we're going to use pandas, so pd.read_csv from this URL. What this is going to do is take this CSV file, which stands for comma separated values, and we can actually look at this if we want, I think; it says Ctrl+click, so let's see if this pops up. So let's actually download this, and let's
open this up ourselves and have a look at what it is in Excel. So I'm going to bring
this up here, you can see that link. And this is what our data set is. So we have our columns,
which just stand for, you know, the different attributes in our data set, the different features and labels of our data set. We have survived, so this is what we're
actually going to be aiming to predict, so we're going to call this our label, right, or our output information. So here, a zero stands for the fact that someone did not survive,
and one stands for the fact that someone did survive. Now just thinking about it on your
own for a second, and looking at some of the categories we have up here, can you think
about why linear regression would be a good algorithm for something like this? Well, for
example, if someone is a female, we can kind of assume that they're going to have a higher
chance of surviving on the Titanic, just because of, you know, the way that our culture works, saving women and children first, right. And if we look through this data set, we'll see that when we see females, it's pretty rare that they don't survive, although as I go through, there are quite a few that didn't. But if we
look at it, compared to males, you know, there's definitely a strong correlation that being
a female results in a stronger survival rate. Now, if we look at age, right, can we think
of how age might affect this? Well, I would assume if someone's way younger, they probably
have a higher chance of surviving, because they would be you know, prioritized in terms
of lifeboats or whatever it was, I don't know much about the Titanic. So I can't talk about
that specifically. But I'm just trying to go through the categories and explain to you
why we pick this algorithm. Now, number of siblings, that one might not be as influential, in my opinion. Parch: I don't actually remember what parch stands for, I don't know exactly what that column represents, so unfortunately I can't tell you guys that one, but we'll talk about it some more in a second. Fare, again, I'm not exactly sure what fare refers to here; I'm going to look on the TensorFlow website after
this and get back to you guys. And we have a class. So class is what class they were
on the boat, right? So first class, second class, third class. So you might think someone
that's in a higher class might have a higher chance of surviving, we have deck. So this
is what deck they were on when it crashed. So unknown is pretty common, and then we have all these other decks; you know, if someone was standing on the deck that had the initial impact, we might assume they would have a lower chance of survival. Embark_town is where they boarded, and then alone: are they alone, yes or no? And this one, you know, is interesting; we're going to see whether this makes a difference, if someone is alone, is that a higher chance of survival? Is that a lower chance? So this
is kind of interesting. And this is what I want you guys to think about: when we have information and data like this, we don't necessarily know what correlations there might be, but we can kind of assume there's some linear thing, some kind of pattern, that we're looking for, right. Where, if something is true, then maybe it's more likely someone will survive, whereas if they're not alone, maybe it's less likely. And maybe there's no correlation whatsoever, but that's what we're going to find out as we build this model.
So let me actually look on the TensorFlow website and see if I can find what parch, and I guess what fare, stand for. So let's go up to the top here. Again, a lot of this stuff is just straight up copied from the TensorFlow website; I've just added my own stuff to it. You can see, like, I just copied all this, we're just bringing it in there. Let's see what it says about the different columns, if it gives us any exact explanations. Okay, so I couldn't find what parch or fare stands for; for some reason it's not on the TensorFlow website either, I couldn't really find any information about it, so if you know, leave a comment down below. But it's not that important. We just want to use this data to
do a test. So what I've done here is I've loaded in my data set, and notice that I've loaded a training data set and a testing data set. Now, we'll talk about this more later, but this is important: I have two different data sets, one to train the model with and one to test the model with. The basic reason we do this is because when we test our model for accuracy, to see how well it's doing, it doesn't make sense to test it on data it's already seen; it needs to see fresh data so we can make sure there's no bias and that it hasn't simply memorized the data, you know, that we gave it. Now, what I'm doing
here at the bottom with this y_train and this y_eval is essentially popping a column off of this data set. So if I print out the data set here, and actually, I'm going to show you a cool trick with pandas that we can use to look at this: I can say dftrain and just look at the head, and we'll print this out. I might need to import some stuff above; we'll see if this works or not yet, so I need to just do these imports. So let's install, let's do these imports, and wait for that to run. Okay, so I've just selected TensorFlow 2.0, we're just importing this now, should be done in one second. And now what we'll do is print out the data frame here. So essentially, what read_csv does is load this into a pandas DataFrame, which is a specific type of object.
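For reference, the loading cell being described looks roughly like this; the URLs are the ones used in the TensorFlow tutorial this notebook is based on:

    # Load the Titanic training and evaluation data into pandas DataFrames.
    dftrain = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/train.csv')
    dfeval = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/eval.csv')

    # Pop the label column off each DataFrame so the inputs and the value we're
    # predicting live in separate variables.
    y_train = dftrain.pop('survived')
    y_eval = dfeval.pop('survived')

    print(dftrain.head())   # first five rows of the training data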
Now, we're not going to go into this specifically. But a data frame allows us to view a lot of
different aspects about the data and kind of store it in a nice form, as opposed to
just loading it in and storing it in like a list or a NumPy array, which we might do
if we didn't know how to use pandas. This is a really nice way to do it: read_csv loads it into a DataFrame object, which actually means we can reference specific columns and specific
rows in the data frame. So let's run this and just have a look at it. Yep, I just need to print dftrain.head(). So let's do that. And there we go. So this is what our data
frame head looks like. Now head, what that does is show us the first five entries in
our data set, as well as show us a lot of the different columns that are in it. Now
since we have, you know, more than a few different columns, it's not showing us all of them, it's just giving us the dot dot dot. But we can see this is what the data
frame looks like, and this is kind of the representation internally. So we have the entries, survived zero, survived one, we have male, female, all that. Now notice that this
has the survived column, okay? Because what I'm going to do is I'm going to print the
data frame head again. So df train, dot head, after we run these two lines. Now what this
line does is take this entire survived column, so all these zeros and ones, remove it from this data frame, the dftrain data frame, and store it in the variable y_train. The reason we need to do that is because we need to separate the data we're going to be classifying from the data that is kind of our input information, our initial data set, right. So since we're
looking for the survived information, we're gonna put that in its own, you know, kind
of variable store here. Now we'll do the same thing for the evaluation data set, which is
dfeval, our evaluation or testing data. And notice that here, this one was train.csv and this one was eval.csv. These have the exact same form, they look completely identical; it's just that the entries have been kind of arbitrarily split, so we're going to have a lot of entries in the training set and a few in the testing set that we'll just use to do an evaluation on the model later on. So we pop the column off by calling pop, which removes and returns that column. So if I print out y_train, or actually,
let's look at this one, first, just to show you how it's been removed, we can see that
we have the survived column here, we popped, and now the survived column is removed from
that data set. So it's just important to understand. Now we can print out some other stuff too.
So we can look at the Y train, see what that is just to make sure we really understand
this data. So let's look at y train. And you can see that we have 626 or 627 entries, and
just you know, zeros or ones representing whether someone survived or whether they did
not. Now the corresponding indexes in this kind of list or data frame correspond to the
indexes in the testing and training data frame. What I mean by that is, you know, entry zero
in this specific data frame, corresponds to entry zero in our y train variable. So if
someone survived, you know, at entry zero, it would say one here, right, or in this case,
entry zero did not survive. Now, I hope that's clear. I hope I'm not confusing you with that.
But I just want to show one more example to make sure. So we'll say df train zero, I'm
going to print that and then we're going to print y train at index zero, oops, if I didn't
mess up my brackets, and we'll have a look at it. Okay, so I've just looked up the documentation, because I totally forgot that I couldn't do that. If I want to find one specific row in my data frame, what I can do is use .loc: I take my data frame, then .loc, and then whatever index I want. So in this case, I'm locating row zero, which is this, and then on y_train I'm doing the same thing, locating row zero. Now, what I had before, right, if I do dftrain and put square brackets after it, what I can actually do is reference a specific column. So if I wanted to look at, say, the column for age, right, we have a column for age, what I can do is dftrain['age'], and then I can print this out like this, and it gives me all of the different age values.
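As a small sketch of those two access patterns, rows by index with .loc and columns by name with square brackets:

    print(dftrain.loc[0])    # every attribute for row 0
    print(y_train.loc[0])    # whether the person in row 0 survived (0 or 1)
    print(dftrain['age'])    # the whole age column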
So that's kind of how we use a data frame; we'll see that as we go further on. Now, let's
go back to the other example I had, because I just erased it, where I wanted to show you row zero in the training data frame, and then in the y_train, you know, output,
whatever that is, so the survival. So you can see here that this is what we get from printing dftrain.loc[0]: row zero, this is all the information. And then here, this corresponds to the fact that they did not survive at row zero, because the output is simply the value zero. I know it looks weird that it's saying things like name: 0 and dtype: object; don't worry about that, it's just because it's printing the value with some extra information. But essentially, this just means this person, who was male, 22, and had one sibling, did not survive. Okay, so let's get out of this. Now we can close
this, and let's go to, oh, we've pretty much already done what I have down here. We can look at the data frame head; this is a little bit of a nicer output when we just have dftrain.head(), we can see that we get kind of a nice little output table,
we've already looked at this information. So we know kind of some of the attributes
of the data set. Now we want to describe the data set; what .describe() does is
just give us some overall information. So let's have a look at it here, we can see that
we have 627 entries, the mean of age is 29, the standard deviation is you know, 12, point,
whatever. And then we get the same information about all of these other different attributes.
So for example, it gives us, you know, the mean fare, the minimum fare, and just some statistics. If you understand this, great; if you don't, it doesn't really matter. The important thing to look at, typically, is just how many entries we have; sometimes we need that information.
And sometimes the mean can be helpful as well, because you can kind of get an average of
like what the average value is in the data set. So if there's any bias later on, you
can figure that out. But it's not crazy important. Okay, so let's have a look at the shape. So
just like NumPy arrays, and tensors have a shape attribute. So do data frames. So we
want to look at the shape, you know, we can just print a df train dot shape, we get 627
by nine, which essentially means we have 627 rows, and nine columns or nine attributes.
So yeah, this is what it says here, you know, 627 entries, nine features, we can interchange
attributes and features. And we can look at the head information for y. So we can see
that here, which we've already looked at before. And that gives us the name, which was survived.
Okay, so now what we can actually do is make some kind of graphs about this data. Now I've
just stolen this code, you know, straight up from the TensorFlow website, I wouldn't
expect you guys to do any of this, you know, like output any of these values, what we're
going to do is create a few histograms and some plots just to look at kind of some correlations
in the data so that when we start creating this model, we have some intuition on what
we might expect. So let's look at age. So this gives us a histogram of the age. So we
can see that there's about 25, people that are kind of between zero and five, there is,
you know, maybe like five people that are in between five and 10. And then the most
amount of people are kind of in between their 20s, and 30s. So in the mid 20s, this is good
information to know, because that's going to introduce a little bit of bias into kind
of our linear correlation graph, right. So it's just understanding, you know, that we have a large subset there, and there are some outliers, like there's one person that's 80 right over here and a few people that are 70; these are important things to understand before we move on to the algorithm. So let's look at the sex values now, so how many female and how many male. We can see that there are many, many more males on here than females. We can have a look at the class, so we can see if they're in first, second or third class;
most people are in third, followed by first and then second. And then lastly, we can look at, what is this that we're doing, oh, the percentage survival by sex. So we can see how likely a specific sex is to survive just by plotting this: males have about a 20% survival rate, whereas females are all the way up to about 78%. So that's important to understand; it kind of confirms what we were seeing before in the data set when we were exploring it. You don't need to do this every time you're looking at a data set, but it is good to get some intuition about it. So this is what we've learned so far: the majority of passengers are in their 20s or 30s, the majority of passengers are male and in third class, and females have a much higher chance of survival, which we kind of already knew.
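The little exploration cells being described look roughly like this (in the notebook each line lives in its own cell so each plot renders on its own):

    dftrain.age.hist(bins=20)                           # histogram of passenger ages
    dftrain.sex.value_counts().plot(kind='barh')        # male vs female counts
    dftrain['class'].value_counts().plot(kind='barh')   # first / second / third class

    # Survival rate broken down by sex.
    pd.concat([dftrain, y_train], axis=1).groupby('sex').survived.mean().plot(kind='barh')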
Alright, so, training and testing data sets. Now, we already kind of went through this, so I'll skim through it quickly. Something
that we did above is load in two different data sets. The first data set was that training
data set, which had the shape of 627 by nine, what I'm actually going to do is create a
code block here, and just have a look at, what was it, dfeval.shape, to show you how
many entries we have in here. So here in our testing data set, you can see we have significantly
less at 264 entries, or rows, whatever you want to call them. So that's how many things
we have to actually test our model. So what we do is we use that training data to create
the model and then the testing data to evaluate it and make sure that it's working properly.
So these things are important. Whenever we're doing machine learning models, we typically
have testing and training data. And yeah, that is pretty much it. Now I'm just gonna
take one second to copy over a lot of this code into the kind of other notebook I have
just so we can see all of it at once. And then we'll be back and we'll get into actually
making the model. Okay, so I've copied in some code here. I know this seems like a lot of gibberish right now, but I'm going to break down line by line what all of this is doing and why we have it here. We first need to discuss something called feature columns, and the difference between categorical and numeric data. So, categorical data is actually fairly common. Now, looking at our data set, and actually I don't have it open in Excel anymore, let's open it from my downloads. So, downloads, where is this, train, okay, awesome. So we have this Excel data sheet here, and we can see what categorical data is: it's something that's not numeric. So, for example, unknown, C, First, Third, the city names, and y or n, right; anything that has different categories. There's going to be a specific set of different categories a value could be. For age, the set of values we could have is numeric, so that's different, but for the categorical columns we can have male or female, and I suppose we could have other, but in this data set we just have male and we just have female. For class, we're going to have First, Second, Third. For deck, we can have unknown, A, B, C, and I'm sure so on through the letters of the alphabet, but that is still considered categorical.
Now, what do we do with categorical data? Well, we always need to transform this data
into numbers somehow. So what we actually end up doing is we encode this data using
integer values. So for the example of male and female, what we might say, and this is
what we're gonna do in a second is that female is represented by zero and male is represented
by one, we do this because although it's interesting to know what the actual class is, the model
doesn't care, right female and male, it doesn't make a difference to it, it just needs to
know that those values are different, or that those values are the same. So rather than using strings and trying to find some way to pass those in and do math with them, we need to turn them into integers; we turn them into zeros and ones. Now, for class,
right, so First, Second, Third, you guys can probably assume that we're going to encode it with 0, 1, 2. Now, again, this doesn't necessarily
need to be an order. So a third could be represented by one and first can be represented by two,
right? It doesn't need to be an order, it doesn't matter. So long as every third has
the same number, every first has the same number, and every second has the same number.
And then same thing with deck, same thing with embark_town, and same thing with alone. Now,
we could have an instance where we've encoded every single one of these values with a different number. So in the, you know, rare occasion where there's one categorical column and every single value in that column is different, we would have, in this instance, 627 different encoding labels that are going to be numbers; that's fine, we can do that. And actually, we don't really need to do it by hand, because you're going to see how TensorFlow can handle that for us. So that's categorical data. Numeric columns are pretty straightforward: they're anything that just has integer or float values already, so in this case, age and fare. And yeah, so that's what we've done: we've just
defined our categorical columns here, and our numeric columns here. This is important,
because we're going to loop through them, which we're doing here to create something
called feature columns. Feature columns are nothing special; they're just what we need to feed to our linear estimator, our linear model, to actually make predictions. So the steps we've gone through so far are: import, load the data set, explore the data set and make sure we understand it, and create our categorical columns and our numeric columns. I've just hard coded these in, right, like sex, parch, class, deck, alone, all
these ones. And then same thing with numeric columns. And then for a linear estimator,
we need to create these as feature columns using some kind of advanced syntax, which
we're going to look at here. So we create a blank list, which is our feature columns,
which will just store our different feature columns, we loop through each feature name
in the categorical columns. And what we do is we define a vocabulary, which is equal
to the data frame at that feature name. So first we would start with sex, then n_siblings_spouses, then parch, then class, and we get all of the different unique values. That's actually what this does: .unique() gets a list of all the unique values from that feature column. And I can print this out; I'll put this on a different line, we'll just take this value and have a look at what it actually is, right. So if I run, I'll just have to run all of these in order, and then we'll create a new code block while we wait for that to happen. Let's see if we can get this installing fast enough. Run, run, run. Okay, now we print it for dftrain, and we can see
this is what this looks like. So these are all the different unique values that we had in that specific feature name. Now, what was that feature name? Oh, sorry, what I printed is just whichever one the loop was on. Rather than feature_name, let's put 'sex' in there and have a look at what this is. So we can see that the two unique values are male and female. Now I actually want to do, what is it, embark_town, and see what that one is, so how many different values it has. So we'll copy that in, and we can see we have Southampton, the one I can't pronounce, the other cities, and unknown, and that is kind of how we get the unique values. So that's what that method is doing there. Let's actually delete this code block, because we don't need it anymore. Alright, so that's what we do. And then what we do down here
is we say feature_columns.append, so just add to this list, tf.feature_column.categorical_column_with_vocabulary_list. Now, I know this is a mouthful, but this is again something you're just going to look up when you need to use it, right.
So understand that you need to make feature columns for linear regression, you don't really
need to completely understand how, but you just need to know that that's something you
need to do. And then you can look up the syntax and understand. So this is what this does,
this is actually going to create for us a column, it's going to be in the form of a
like NumPy array, kind of that has the feature name, so whatever one we've looped through,
and then all of the different vocabulary associated with it. Now we need this because we just
need to create this column, so that we can create our model using those different columns,
if that makes any sense. So our linear model needs to have you know, all of the different
columns we're going to use, it needs to know all of the different entries that could be
in that column, and needs to know whether this is a categorical column or a numeric
column. In previous examples, what we might have done is actually change the data set manually, so encode it manually; TensorFlow can just do this for us now in TensorFlow 2.0, so we'll use that instead. Okay, so that's what we did with these feature columns. Now, for the numeric columns it's a little bit different, and actually easier: all we need to do is
give the feature name and whatever the data type is, and create a column with that. So
notice, we can omit the unique values here because, when it's numeric, there could be an infinite amount of values. And then I've just printed out the feature columns so you can see what this looks like. So the categorical column with vocabulary list gives us, for example, the number of siblings, and then the vocabulary list shows all the different values it created encodings for; then same thing, you know, we go down here to parch, and these are its different encodings, so they're not necessarily in order, like what I was talking about before. Let's go to a numeric one. What do we have here? Yeah, so for a numeric column, it's just the key, the shape we're expecting, and the data type. So that is pretty much it; that's how we're actually loading these in. So now it's almost time to create the model.
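For reference, the feature column cell being walked through looks roughly like this; it's essentially the code from the TensorFlow linear estimator tutorial:

    CATEGORICAL_COLUMNS = ['sex', 'n_siblings_spouses', 'parch', 'class', 'deck',
                           'embark_town', 'alone']
    NUMERIC_COLUMNS = ['age', 'fare']

    feature_columns = []
    for feature_name in CATEGORICAL_COLUMNS:
        # Every unique value in the column becomes part of its "vocabulary".
        vocabulary = dftrain[feature_name].unique()
        feature_columns.append(
            tf.feature_column.categorical_column_with_vocabulary_list(feature_name, vocabulary))

    for feature_name in NUMERIC_COLUMNS:
        feature_columns.append(tf.feature_column.numeric_column(feature_name, dtype=tf.float32))

    print(feature_columns)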
So what we're going to do to create the model now is talk about first the training process
and training some kind of, you know, machine learning model. Okay, so the training process,
now, the training process of our model is actually fairly simple, at least for linear
model. Now, the way that we train the model is we feed it information, right? So we feed
it that those data points from our data set. But how do we do that? Right? Like, how do
we feed that to the model? Do we just give it all at once? Well, in our case, we only
have 627 rows, which isn't really that much data, like we can fit that in RAM in our computer,
right? But what if we're training a crazy machine learning model, and we have, you know,
25 terabytes of data that we need to pass it, we can't load that into RAM, at least
I don't know, any ram that's that large. So we need to find a way that we can kind of
load it in what's called batches. So the way that we actually feed data to this model is we load it in batches. Now, we don't really need to understand how this process works under the hood or exactly how batching occurs; what we do is give 32 entries at once to the model. Now
the reason we don't just feed one at a time is because that's a lot slower, we can load
you know, a small batch size of 32, that can increase our speed dramatically. And that's
kind of a lower level understanding. So I'm not going to go too far into that. Now that
we understand, we kind of load it in batches, right? So we don't load it entirely all at
once we just load a specific set of kind of elements as we go. What we have is called
epochs. Now, what are epochs? Well, epochs are essentially how many times the model is
going to see the same data. So what might be the case, right, is that when we pass the data to our model the first time, it's pretty bad: it looks at the data, creates a line of best fit, but it's not great, it's not working perfectly. So we need to use something called an epoch, which means we're just going to feed the model the data again, but
in a different order. So we do this multiple times, so that the model will look at the data in a different way, in kind of a different form, and see the same data a few different times to pick up on patterns, because the first time it sees a new data point, it's probably not going to have a good idea of how to make a prediction for it. So if we can feed it more and more, you know, we can get a better
prediction. Now this is where we talk about something called overfitting, though: sometimes the model can see the data too many times, or we pass too much data to our model, to the point where it just straight up memorizes those data points. And it's really good at classifying
those data points, but we pass it some new data points, like our testing data, for example.
It's horrible at kind of, you know, classifying those. So what we do to kind of prevent this
from happening is we just make sure that we start with like a lower amount of epochs and
then we can work our way up and kind of incrementally change that if we need to, you know, go higher,
right? If we need more epochs. So yeah, that's kind of it for epochs. Now, I will say that this training process kind of applies to all of the different machine learning models that we're going to look at: we have epochs, we have batches, we have a batch size,
and now we have something called an input function. Now, this is pretty complicated.
This is the code for the input function. I don't love that we need to do this, but it's necessary. So essentially, an input function is the way that we define how our data is going to be broken into epochs and into batches to feed to our model. Now, these you probably aren't ever going to really need to code from scratch by yourself, but this is one I've just taken from the TensorFlow website, pretty much like everything else that's in this series. And what this does is take our data and encode it in a tf.data.Dataset object. Now this is because our model needs this specific object to be able to work; it needs to see a Dataset object to be able to use that data to create the model. So what we need to do is take this pandas DataFrame and turn it into that object. And the way we do that is with the input function. So we can see what this is doing here.
So this is make_input_fn; we actually have a function defined inside of another function. I know this is kind of complicated for some of you guys, but what I'm actually gonna do, sorry, is just copy this into the other page, because I think it's easier to explain without all the text around it. So let's create a new code block, let's paste this in, and let's have a look at what this does. So actually, let me just tab down. Okay,
so make_input_fn. We have our parameters: data_df, which is our pandas DataFrame; label_df, which stands for those labels, so that's y_train, or y_eval, right? We have num_epochs, which is how many epochs we're going to do, with a default of 10; shuffle, which means are we going to shuffle our data and mix it up before we pass it to the model; and batch_size, which is how many elements we're going to give to the model at once while it's training. Now, what this does is we have an input function defined inside of this function. And we say ds equals tf.data.Dataset.from_tensor_slices, passing dict(data_df) and label_df. Now, what this does, and we can read the comment, is create a tf.data.Dataset object with the data and its label. Now, I can't explain to you exactly how this works on a lower level, but essentially we pass a dictionary representation of our data frame, which is whatever we passed in here, and then we pass the label data frame, which is going to be all of those y values, and we create this object. And that's what this line of code does. So tf.data.Dataset.from_tensor_slices is just what you're going to use; I mean, we can read this
documentation, create a data set whose elements are slices of the given tensors. The given
tensors are sliced along the first dimension, this operation preserves the structure of
the input tensors, removing the first dimension of each tensor and using it as the data set
dimension. So I mean, you guys can look at that, like read through the documentation,
if you want. But essentially, what it does is create the dataset object for us. Now, if shuffle, ds equals ds.shuffle(1000); what this does is just shuffle the data set, and you don't really need to understand more than that. And then what we do is we say ds equals ds.batch with the batch size, which is going to be 32, and then repeat for the number of epochs. So what this is going to do is essentially take our data set and split it into a number of, let's call them blocks, that are going to be passed to our model. It can do this by knowing the batch size; it obviously knows how many elements there are, because that's the data set object itself; and then it repeats for the number of epochs. So it can figure out how many blocks it needs to split the data into to feed it to my model. Then we return the data set from this inner function here, we return that data set object, and then on the outside, we actually return the inner function itself. So what the exterior function does, and I'm really just trying to break this down so you guys understand, is make an input function: it literally makes a function and returns the function object to wherever we call it from. So that's
how that works. Now we have a train input function and an eval input function. And what
we need to do to create these is use the function that we've defined above. So we say make_input_fn with dftrain and y_train, so our data frame for training and the labels for that. So we can see the comment, you know, here we will call the input function, right. And then the eval input function: it's going to be the same thing, except for the evaluation. We don't need to shuffle the data because we're not training on it, and we only need one epoch, because again, we're just evaluating. And we'll pass the evaluation data set and the evaluation y values. Okay, so that's it for making the input functions.
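Here's a sketch of that input function and the two calls just described, adapted from the TensorFlow tutorial; dftrain, dfeval, y_train, and y_eval are assumed to be the DataFrames and label Series from the earlier cells:

```python
def make_input_fn(data_df, label_df, num_epochs=10, shuffle=True, batch_size=32):
    def input_function():  # the inner function the estimator will actually call
        # create a tf.data.Dataset object from the DataFrame and its labels
        ds = tf.data.Dataset.from_tensor_slices((dict(data_df), label_df))
        if shuffle:
            ds = ds.shuffle(1000)  # randomize the order of the rows
        # split into batches of batch_size and repeat for num_epochs
        ds = ds.batch(batch_size).repeat(num_epochs)
        return ds
    return input_function  # return the function object itself, not its result

train_input_fn = make_input_fn(dftrain, y_train)  # shuffled, 10 epochs
eval_input_fn = make_input_fn(dfeval, y_eval, num_epochs=1, shuffle=False)  # one pass, no shuffle
```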
I know this is complicated, but that's the way we have to do it. And unfortunately, if
you don't understand after that, there's not much more I can do; you might just have to read through some of the documentation. Alright, creating the model. We're finally here. I know this has been a while, but I need to get through everything. So, the linear estimator: we're gonna copy this and I'm just gonna put it in here, and we'll talk about what this does. So linear_est equals tf.estimator.LinearClassifier, and we're giving it the feature columns that we created up here. So that work was not for nothing. We have these feature columns, which define what we should expect for our input data; we pass that to a LinearClassifier object from the estimator module in TensorFlow, and that creates the model for us. Now, this, again, is syntax, so you don't need to memorize it, you just need to understand how it works. What we're doing is creating an estimator; all of these kind of core learning algorithms use what are called estimators, which are just basic implementations of algorithms in TensorFlow. And again, we pass the feature columns. That's how that works. Alright, so now let's go to training the model. Okay,
so I'm just going to copy this again, I know you guys think I'm just copying the code back
and forth. But I'm not going to memorize the syntax, I just want to explain to you how
all this works. And again, you guys will have all this code, you can mess with it, play
with it, and learn on your own that way. So to train is really easy. All we need to do is say linear_est.train, and then just give it that input function. So that input function that we created up here, which was returned from make_input_fn, like this train_input_fn here, is actually equal to a function, it's equal to a function object itself. If I were to call train_input_fn with parentheses, that would actually call the function. That's how this works in Python; it's a little bit of complicated syntax, but it's how it works. We pass that function here, and then this will use the function to grab all of the input that we need and train the model. Now, for the result, rather than train, we're going to evaluate, and notice that we didn't store the training call in a variable, but we are storing the evaluation result in a variable, so that we can look at it. Now, clear_output, which we imported above, is just gonna clear the console output, because there will be some output while we're training; then we can print the accuracy of this model. So let's actually run this and see how this works.
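Roughly, the cell being run looks like this (again a sketch following the TensorFlow tutorial, with clear_output imported from IPython as it is near the top of the notebook):

```python
from IPython.display import clear_output

# create the pre-made linear estimator from the feature columns built earlier
linear_est = tf.estimator.LinearClassifier(feature_columns=feature_columns)

linear_est.train(train_input_fn)             # train on the training input function
result = linear_est.evaluate(eval_input_fn)  # get metrics on the evaluation data

clear_output()               # hide the training log output
print(result['accuracy'])    # result is just a dict of metrics
```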
This will take a second. So I'll be back once this is done. Okay, so we're back, we've got
a 73.8% accuracy. So essentially, what we've done right is we've trained the model, you
might have seen a bunch of output while you were doing this on your screen. And then we
printed out the accuracy after evaluating the model. This accuracy isn't very good, but for our first shot this is okay, and we're gonna talk about how to improve this in a
second. Okay, so we've evaluated the data set, we stored that in result, I want to actually
look at what result is because obviously, you can see we've referenced the accuracy
part, like, you know, as if this was a Python dictionary. So let's run this one more time.
I'm just going to take a second again. So okay, so we printed out results here. And
we can see that we actually have a bunch of different values: we have accuracy, accuracy baseline, AUC, and all these different kinds of statistical values. Now, these aren't really
going to mean much to you guys. But I just want to show you that we do have those statistics.
And to access any specific one, this is really just a dictionary object. So we can just reference
the key that we want, which is what we did with accuracy. Now notice, our accuracy actually
changed here, we went to 76. And the reason for this is, like I said, you know, our data
is getting shuffled, it's getting put in in a different order. And based on the order
in which we see data, our model will, you know, make different predictions and be trained
differently. So if we had, you know, another epoch, right, if I change epochs to say, 11,
or 15, our accuracy will change. Now it might go up, it might go down, that's something
we have to play with, as you know, our machine learning developer, right, that's what your
goal is, is to get the most accurate model. Okay, so now it's time to actually use the
model to make predictions. So up until this point, we've just been doing a lot of work
to understand how to create the model, you know what the model is how we make an input
function, training, testing data, I know a lot, a lot, a lot of stuff. Now to actually
use this model and like make accurate predictions with it is somewhat difficult, but I'm going
to show you how. So essentially, TensorFlow models are built to make predictions on a lot of things at once; they're not great at making predictions on, like, one piece of data, where you just want one passenger to make a prediction for. They're much better at working on large batches of data. You can definitely do it with one, but I'm going to
show you how we can make a prediction for every single point that's in that evaluation
data set. So right now we looked at the accuracy. And the way we determine the accuracy was
by essentially comparing the results that the predictions gave from our model versus
what the actual results were, for every single one of those passengers. And that's how we
came up with an accuracy of 76%. Now, if we want to actually check and get predictions
from the model and see what those actual predictions are, what we can do is use a method called
.predict. So what I'm going to do is say result, like this, equals, and in this case, we're going to do the model name, which is linear_est.predict. And
then inside here, what we're going to pass is that input function we use for the evaluation.
So just like you know, we need to pass an input function to actually train the model,
we also need to pass an input function to make a prediction. Now this input function
could be a little bit different. We can modify this a bit if we wanted to. But to keep things
simple, we'll use the same one for now. So what I'm going to do is just use this eval input function, the one we've already created, where we did, you know, one epoch, and we don't need to shuffle because it's just the evaluation set. So inside here, we'll do eval_input_fn.
Now what we need to do though, is convert this to a list, just because we're going to
loop through it. And I'm actually going to print out this value. So we can see what it
is, before we get to the next step. So let's run this and have a look at what we get. Okay, so we get a logits array, we can see all these different values. So we have, you know, this array with this value, we have probabilities with this value. And this is kind of what we're getting. We're getting logistic, all_classes, like there's all this stuff in there. What you
hopefully should notice, and I know I'm just like whizzing through is that we have a dictionary
that represents the predictions. And I'll see if I can find the end of the dictionary
here. For every single, what is it prediction. So since we passed, you know, 267 input data
from this, you know, eval input function, what was returned to us is a list of all of
these different dictionaries that represent each prediction. So what we need to do is
look at each dictionary so that we can determine what the actual prediction was. So what I'm going to do is actually just print result at index zero, because this is a list, which means we can index it, so we can look at one prediction. Okay, so this is the dictionary of one prediction. I know this seems like a lot, but this is what we have. This is our prediction. So logits, we get some array, we have logistic in here in this dictionary, and then we have probabilities. So what I actually want is
probability. Now, since what we ended up having was a prediction over two classes, right, either zero or one, we're predicting either someone survived or they didn't survive, and what their percentage chance is. We can see that the percentage of survival here is seemingly 96%, and the percentage that it thinks they won't survive is, you know, 3.3%. So if we want to access this, what we need to do is index result at some index, so whatever one we want. So we're gonna say result, and then here we're going to put probabilities, so I'm just going to print that like that. And then we can see the probabilities. So let's run this. And now we see our probabilities are 96 and 3.3. Now, if we want the probability
of survival, so I think I actually might have messed this up, I'm pretty sure the survival
probability is actually the last one. Whereas like the non survival is the first one because
zero means you didn't survive, and one means you did survive. So that's my bad, I messed
that up. So I actually want their chance of survival, or index one. So if I index one,
you see, we get 3.3%. But if I wanted their chance of not surviving, I would index zero.
And that makes sense, because zero is, you know, what we're looking at like zero represents,
they didn't survive, whereas one represents they did survive. So that's kind of how we
do that. So that's how we get it. Now, if we wanted to loop through all of these, we could: we could loop through every dictionary, we could print every single probability for each person, and we could also look at that person's stats and then look at their probability. So let's see, the probability of surviving in this case is, you know, 3%, or whatever it was, 3.3%. But let's look at the person that we were actually predicting for and see if that makes sense. So if I go, what was it, dfeval.loc[0], we print that, and then we print
the result, what we can see is that for the person who was male, and 35, that had no siblings,
their fare was this, they're in third class, we don't know what deck they were on. And
they were alone, they have a 3.3% chance of survival. Now, if we change this, we could go to, like, index two; let's have a look at this second person and see what their chance of survival is. Okay, so they have a higher percent chance, a 38% chance; they're female, they're a little bit older. So that might be a reason why their survival rate's a bit higher. I mean, we can
keep doing this and look through and see what it is, right? If we want to get the actual
value, like if this person survived, or if they didn't survive. And what I can do is
I can print dfeval, actually, it's not going to be dfeval, it's going to be y_eval, and that's going to be .loc[3]. Now, this will tell us if they survived or not. So actually, in this case, that person did survive, but we're only predicting a 32% chance. So you can see that that's, you know, reflected in the fact that we only have about a 76% accuracy, because this model is not perfect. And in this instance, it was pretty bad: it's saying they have a 32% chance of surviving, but they actually did survive. So maybe that should be higher, right? So we could change this number and go to four. I'm just messing around and showing you guys, you know, how we
use this. So in this one, you know, same thing, this person survived, although they were only given a 14% chance of survival. So anyways, that is how that works. This is how we actually make predictions and look at the predictions. Understand that what's happening is I've converted this to a list just because predict actually returns a generator object, which means it's meant to be looped through rather than indexed, but that's fine. We'll use a list, and then we can just print out result at whatever index, then probabilities, and then index one to represent their chance of survival.
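Put together, the prediction cells just walked through look roughly like this; the passenger indexes are just the ones used above, and reusing the name result for the list of prediction dicts mirrors what was done on screen:

```python
# predict() returns a generator of one dict per passenger, so wrap it in a list
result = list(linear_est.predict(eval_input_fn))

print(result[0])                      # the full prediction dict for passenger 0
print(result[0]['probabilities'])     # [P(did not survive), P(survived)]
print(result[0]['probabilities'][1])  # chance of survival for passenger 0

print(dfeval.loc[3])                  # the stats of passenger 3
print(y_eval.loc[3])                  # whether passenger 3 actually survived
print(result[3]['probabilities'][1])  # the survival chance the model predicted
```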
Okay, so that has been it for linear regression. Now, let's get into classification. And now
we are on to classification. So essentially, classification is differentiating between,
you know, data points and separating them into classes. So rather than predicting a
numeric value, which we did with regression earlier, so linear regression, and you know,
the percentage survival chance, which is a numeric value, we actually want to predict
classes. So what we're going to end up doing is predicting the probability that a specific
data point or a specific entry, or whatever we're going to call it is within all of the
different classes it could be. So for example, here, we're gonna use flowers. So it's called
the iris, I think it's the iris flower data set, or something like that. And we're gonna
use some different properties of flowers to predict what species of flower it is. So that's
a difference between classification and regression. Now, I'm not going to talk about the specific
algorithm we're going to use here for classification, because there's just so many different ones
you can use. But yeah, I mean, if you really care about how they work on a lower mathematical
level, I'm not going to be explaining that because it doesn't make sense to explain it
for one algorithm when there's like hundreds, and they all work a little bit differently.
So you guys can kind of look that up. And I'll tell you some resources and where you
can find that. I'm also going to go faster through this example, just because I've already
covered kind of a lot of the fundamental stuff in linear regression. So hopefully, we should
get this one done a little bit quicker, and move on to the next kind of aspects in this
series. Alright, so first steps, load TensorFlow, import TensorFlow, we've done that already.
data set, we need to talk about this. So the data set we're using is that Iris flowers
data set, like I talked about, and this specific data set separates flowers into three different
species. So we have these different species. This is the information we have: sepal length, sepal width, petal length, petal width. We're going to use that information, obviously,
to make the predictions. So given this information, you know, in our final model, can it tell
us which one of these flowers it's most likely to be? Okay. So what we're going to do now
is define the CSV column names and the species. So the column names just define what we're going to have in our data set, like the headers for the columns; species, obviously, is just the list of species, and we'll throw them in there. Alright, so now we're going
to load in our data sets. So this is going to be different every time you're kind of
working with models, depending on where you're getting your data from. In our example, we're going to get it from Keras, which is kind of a submodule of TensorFlow that has a lot of useful data sets and tools that we'll be using throughout the series. So keras.utils.get_file, again, don't really focus on this, just understand that what this is going to do is save this file onto our computer as iris_training.csv, grabbing it from this
link. And then what we're gonna do down here is load the train and test (and again, notice this is training and this is testing) into two separate data frames. So here, we're going
to use the names of the columns as the CSV column names, we're going to use the path
as whatever we loaded here, header equals zero, which just means row zero is the header.
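The loading cell being described looks roughly like this; the URLs are the ones used in the TensorFlow premade-estimator tutorial that this section follows, so double-check them against the notebook:

```python
import pandas as pd

CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']
SPECIES = ['Setosa', 'Versicolor', 'Virginica']

train_path = tf.keras.utils.get_file(
    "iris_training.csv",
    "https://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv")
test_path = tf.keras.utils.get_file(
    "iris_test.csv",
    "https://storage.googleapis.com/download.tensorflow.org/data/iris_test.csv")

# header=0 means row 0 of the CSV file is the header row
train = pd.read_csv(train_path, names=CSV_COLUMN_NAMES, header=0)
test = pd.read_csv(test_path, names=CSV_COLUMN_NAMES, header=0)
```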
Alright, so now, we will move down and we'll have a look at our data set. So like we've
done before, oh, I've got to run this code first, the CSV column names. Okay, so we've just, we're just running things in the wrong order here, apparently. Okay, so let's look at the
head. So we can see this is kind of what our data frame looks like. And notice that our
species here are actually defined numerically. So rather than before, when we had to do that
thing, where, you know, we made those feature columns, and we converted the categorical
data into numeric data with those kind of weird tensor flow tools. This is actually
already encoded for us. So zero stands for Setosa, and then one and two obviously stand for these other ones, respectively. And that's how that works. Now, these I believe are in centimeters,
the sepal length, petal length, petal width, that's not super important. But sometimes
you do want to know that information. Okay, so now we're going to pop up those columns
for the species like we did before, and separate that into train y, test y, and then have a
look at the head again. So let's do that. And run this. Notice that is gone. Again,
we've talked about how that works. And then these if we want to have a look at them. And
actually, let's do this, we'll just add a new block and say train_y.head, if I could spell head correctly. Okay, so we run head, and we can see this is what it looks like, nothing special. That's what we're getting. Alright,
so let's delete that. Let's look at the shape
of our training data. I mean, we can probably guess what it is already, but we're gonna have a shape of four because we have four features. And then how many entries do we have? Well, I'm sure this will tell us. So 120 entries with shape four, awesome. That's our shape. Okay,
input function. So we're moving fast here already, we're getting into a lot of the coding.
So what I'm actually going to do is, again, copy this over into a separate document. And
I'll be back in a second with all that. Okay, so input function time, we already know what
the input function does, because we used it previously. Now, this input function is a
little bit different than before, just because we're kind of changing things slightly. So
here, we don't actually have any, what do you call it, we don't have an epochs parameter, and our batch size is different. So what we've done here is, rather than defining something like make_input_fn, we just have input_fn like this. And what we're going to do when we pass this input function is a little bit different; I'll show you that in a second, it's a little bit more complex. But you can see that we've cleaned this up a little bit. So essentially, we're doing what we did before: we're converting this data, which is our features that we're passing in here, into a data set, and we're passing those labels as well. And then if we're training, so if training is true, what we're going to do is say dataset is equal to dataset.shuffle, so we're going to shuffle the information, and then repeat that. And that is all we really need to do; we can do dataset.batch at the batch size of 256, return that, and we're good to go. So this is our input function.
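As a sketch, the simpler input function just described looks like this:

```python
def input_fn(features, labels, training=True, batch_size=256):
    """Convert the inputs to a tf.data.Dataset; shuffle and repeat only while training."""
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    if training:
        dataset = dataset.shuffle(1000).repeat()
    return dataset.batch(batch_size)
```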
Again, these are kind of complicated, you kind of have to just get experience seeing
a bunch of different ones to understand how to actually make one on your own. From now
on, don't worry about it too much, you can pretty much just copy the input functions
you've created before and modify them very slightly if you're going to be doing your
own models. But by the end of this, you should have a good idea of how these input functions
work, we will have seen like four or five different ones. And then you know, we can
kind of mess with them and tweak them as we go on. But don't focus on it too much. Okay,
so input function, this is our input function, I'm not really going to go into much more
detail with that. And now our feature columns. So this is, again, pretty straightforward.
For the feature columns, all we need to do, since they're all numeric feature columns, is, rather than having two for loops where we separate the numeric and categorical feature columns like before, we can just loop through all of the keys in our training data set. And then we can append to the my_feature_columns blank list a tf.feature_column.numeric_column where the key is equal to whatever key we've looped through here. Let me show you what this means in case anyone's confused. Again, you can see when I print my feature columns, we get key equals SepalLength, we get our shape, and we get all of that other
nice information. So let's copy this into the other one, have a look at our output after
this. Okay, so my_feature_columns, for key in train.keys(). So notice train.keys() here: what that does is actually give us all the columns. So this was a really quick and easy way to loop through all the different columns, although I could have looped through CSV_COLUMN_NAMES and just removed the species column to do that, but again, we don't really need to. So for key in train.keys(), my_feature_columns.append tf.feature_column.numeric_column with key equals key; this is just going to create those feature columns. We don't need to do that vocabulary thing and the .unique() call, because again, these are all already encoded for us.
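That loop, sketched out, is just:

```python
my_feature_columns = []
for key in train.keys():  # every remaining column is a numeric feature
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))
print(my_feature_columns)
```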
Okay, awesome. So that was the next step. Now let's go back to building the model. So this is where we need to talk a bit more in depth
of what we're actually going to build. So the model for this is a classification model.
Now there are, like, hundreds of different classification models we can use that are pre-made in TensorFlow.
And so far, what we've done with that linear classifier is that's a pre made model that
we kind of just feed a little bit of information to, and it just works for us. Now here, we
have two kind of main choices that we can use for this kind of classification tasks
that are pre built in TensorFlow, we have a dnn classifier, which stands for a deep
neural network, which we've talked about very vaguely, very briefly, and we have a linear
classifier. Now, a linear classifier works very similarly to linear regression, except it does classification. Rather than regression, where we actually get a numeric value, we get, you know, the probability of it being a specific label, rather than a numeric value.
But in this instance, we're actually going to go with the deep neural network. Now, that's simply because, on their website (all of this is kind of building off of the TensorFlow website, because all the code is very similar, and I've just added my own spin and explained things very in depth), TensorFlow recommends the deep neural network as the better choice for this. But typically, when you're creating machine learning apps,
you'll mess around with different models and kind of tweak them. And you'll notice that
it's not that difficult to change models, because most of the work comes from loading
and kind of pre-processing our data. Okay, so what we need to do is build a deep neural network with two hidden layers, with 30 and 10 hidden nodes respectively. Now, I'm going to draw out the architecture of this neural network in just one second, but I want to show you what we've done here. So we said classifier equals tf.estimator.DNNClassifier. This estimator module just stores a bunch of pre-made models from TensorFlow, and in this case, DNNClassifier is one of those. What we need to do is pass our feature columns, just like we did to our linear classifier, and now we need to define the hidden units. Now, hidden units is essentially us building the architecture of the neural network. So
like you saw before, we had an input layer, we had some like middle layers, called our
hidden layers in a neural network. And then we had our output layer, I'm going to explain
neural networks in the next module. So this will all kind of click and make sense for
now. We've arbitrarily decided on 30 nodes in the first hidden layer and 10 in the second, and the number of classes is going to be three. Now, that's something that we need to decide; we know there are three classes for the flowers, so that's what we've defined. Okay, so let's copy this and go back to the other page here. And that is now our model.
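The model cell being copied looks roughly like this:

```python
# a deep neural network with two hidden layers of 30 and 10 nodes,
# choosing between the 3 flower species
classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    hidden_units=[30, 10],
    n_classes=3)
```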
And now it is time to talk about how we can actually train the model, which is coming down here. Okay, so
I'm going to copy this. I'm going to paste it over here. And let's just dig through this
because this is a bit more of a complicated piece of code than we usually use to work
with. I'm also going to remove these comments just to clean things up in here. So we've
defined the classifier, which is a deep neural network classifier; we have our feature columns, hidden units, and classes. Now, to train the classifier, we have this input function here. This input function is different than the one we created previously; remember, what we had previously was make_input_fn, where we defined another function inside of it, and it actually returned that inner function from the outer function. I know, complicated. If you're not a Python kind of pro, I don't expect that to make perfect sense. But here, we just have a function, right, we're not returning a function from another function, it's just one function. So when we want to use this to train our model, what we do is create
something called a lambda. Now, a lambda is an anonymous function that can be defined
in one line, when you write lambda, what that means is, essentially, this is a function.
So this is a function. And whatever's after the colon is what this function does. Now,
this is a one line function. So like if I create a lambda here, right? And I say lambda,
print, hi, and I said, x equals lambda. And I called x like that. This works. This is
a valid line of syntax. Actually, I want to make sure that I'm not just like messing with
you when I say that, and that this is actually correct. Okay, so sorry, I just accidentally
trained the model. So I just commented that out. You can see we're printing Hi, right
at the bottom of the screen. I know, it's kind of small, but it does say hi, that's
how this works. Okay, so this is a cool thing. If you haven't seen this in Python before,
that's what a lambda does, allows you to define a function in one line. Now, the thing that's
great about this is that we can say like, you know, x equals lambda, and here put another
function, which is exactly what we've done with this print function. And that means when
we call x, it will, you know, execute this function, which will just execute the other
function. So it's kind of like a chain where you call x, x is a function. And inside that
function, it calls another function, right? It's just like calling a function from inside a function. So what is the lambda doing here? Well, since the estimator needs an actual function object, what we do is define a lambda that, when called, calls our input function with the right arguments. It's very difficult to explain this if you don't really understand the concept of lambdas and the input functions, but just know we're doing this because, here, we didn't embed another function and return the function object. If we had used that input function we created before, where we had the interior function, then we wouldn't need to do this, because we would return the inner input function, which means when we passed it in here, the estimator could just call it directly; it wouldn't need a lambda. Whereas here, since we just have a plain function, we need to put a lambda to define what gets called. And then this works. There's no other way to
really explain this. So yeah, what we do is we create this input function. So we pass
in train, we have train_y, we have training equals true, and then we do steps equals 5000. So this is similar to an epoch, except this is just defining a set amount of steps we're going to go through. So rather than saying we'll go through the data set 10 times, we're just gonna say we'll go through the data set until we've hit 5000 steps, so 5000 batches of data have been looked at. So that's what this does with the train call.
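So the training call, sketched out, is just the lambda wrapping the input function:

```python
classifier.train(
    # the lambda gives the estimator a no-argument function it can call itself,
    # which in turn calls our input_fn with the training data
    input_fn=lambda: input_fn(train, train_y, training=True),
    steps=5000)  # stop after 5000 training steps instead of a fixed number of epochs
```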
Let's run this and just look at the training output. From our model, it gives us some like,
things here, we can kind of see how this is working. Notice, if I can stop here for a second, it tells us the current step, it tells us the loss (the lower this number, the better), and then it tells us global steps per second, so how many steps we're completing per second. Now at the end here, we get a final-step loss of 39, which is pretty high, which means this is pretty bad. But that's fine. This is kind of just our first
test at training a neural network. So this is just giving us output, while it's training
to kind of see what's happening. Now, in our case, we don't really care because this is
a very small model, when you're training models that are massive and take terabytes of data,
you kind of care about the progress of them. So that's when you would use kind of that
output, right? And you would actually look at that. Okay, so now that we've trained the
model, let's actually do an evaluation on the model. So we're just going to say classifier.evaluate. And what we're going to do is a very similar thing to what we've done here: just pass this input function, like here, with a lambda once again. And the reason we add the lambda is that when we don't have that double function going on, like a nested function, we need the lambda. And then in here, what we do is, rather than passing train and train_y, we're gonna pass test, I believe, and I think I just called it test_y. Okay, and then for training, obviously, this is false, so we can just set that to False like that. I'm just gonna look at the other screen to make sure I didn't mess this up, because again, I don't remember the syntax. Yeah, so classifier.evaluate, test, test_y, looks good to me. We'll take this print statement just so we get a nice output for our accuracy. Okay, so let's look at this. Again, we're gonna have to wait for this to
train. But I will show you a way that we don't need to wait for this to train every time, in one second, and I'll be right back. Okay, so what I'm actually going to do, and I've just kind of paused the execution of this code, is throw this in the next block under it. Because the nice thing
about Google Collaboratory is that I can run this block of code, right, I can train all
this stuff, which is what I'll run now while we're talking just so it happens. And then
I can have another code block kind of below it, which I have here. And it doesn't matter,
I don't need to rerun that block every time I change something here. So if I change something
in any lower blocks, I don't need to change the upper block, which means I don't need
to wait for this to train every time I want to do an evaluation on it. Anyways, so we've
done this, we have test and test_y; I just need to change this, and actually, I need to say eval_result equals classifier.evaluate, so that we can actually store this somewhere and get the answer. And then we'll print this, and notice this happens much, much faster: we get a test accuracy of 80%. So if I were to retrain the model, chances are this accuracy would change, again because of the order in which we're seeing the different flowers.
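That evaluation cell looks roughly like this (the format string is the one from the TensorFlow tutorial this follows):

```python
eval_result = classifier.evaluate(
    input_fn=lambda: input_fn(test, test_y, training=False))

print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**eval_result))
```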
But this is pretty decent, considering we don't have that much test data. And we don't
really know what we're doing, right? We're kind of just messing around and experimenting for right now. So to get 80% is pretty good. Okay, so actually, what am I doing? We need to go back now and do prediction. So how am I going to predict this for a specific flower?
So let's go back to our core learning algorithms. And let's go to predictions. Now, I've written
a script already, just to save a bit of time that allows us to do a prediction on any given
flower. So what I'm going to do is create a new block down here, code block and copy
this function. And now we're going to digest this and kind of go through this on our own
to make sure this makes sense. But what this little script does is allow the user to type
in some numbers, so the sepal, length, width, and I guess, petal length and width, and then
it will spit out to you what the predicted class of that flower is. So we couldn't do
a prediction on every single one of our data points, like we did previously. And we already
know how to do that. I showed you that with linear regression. But here, I just wanted
to do it on one entry. So what do we do? I start by creating an input function; it's very basic, we have a batch size of 256, and all we do is give it some features and create a data set from those features as a dict, and then .batch with the batch size. So notice what this is doing: we don't give any y values, right? We don't give any labels. The reason we don't do that is because when we're making a prediction, we don't know the label, right? We actually want the model to give us the answer. So here, I wrote down the features, I create a predict dictionary just because I'm going to add things to it, and then I prompt here with a print statement: please type numeric values as prompted. Then, for each feature, we loop with an input call until we get a valid value. So this just means what we're going to do is, for each feature, we're
going to wait to get some valid response. Once we get some valid response, what we're
going to do is add that to our dictionary. So we're gonna say predict[feature], so whatever that feature was (sepal length, sepal width, petal length, or petal width) is equal to a list that has, in this instance, whatever that value was. Now, the reason we
need to do this is because again, the predict method from TensorFlow works on predicting
for multiple things, not just one value. So even if we only have one value, we want to
predict for it, we need to put it inside of a list because it's expecting the fact that
we will probably have more than one value in which we would have multiple values in
the list, right, each representing a different row or a new flower to make a prediction for,
okay, now we say predictions equals classifier.predict, and in this case we have input_fn equals a lambda calling the input function with predict, which is this input function up here. And then we say, for pred_dict in predictions, because remember, every prediction comes back as a dictionary, we'll say the class_id is equal to whatever the prediction dictionary's class_ids are at index zero. And what exactly these are, I don't know how best to explain right now; we'll look at it in a second, I'll go through that. And then we have the probability, which is equal to the prediction dictionary's probabilities at that class_id. Okay? Then we're going to print "prediction is" with this weird format string, which I just stole from TensorFlow, and it's going to show the species at that class_id and then 100 times the probability, which gives us an actual percentage value. We're gonna digest this, but let's run it and have a look.
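Here's a tidied-up sketch of that script. The original notebook's version is slightly different (it uses a quirkier input-validation check), and the helper is renamed to predict_input_fn here so it doesn't clobber the input_fn defined earlier; treat those details as assumptions:

```python
def predict_input_fn(features, batch_size=256):
    # no labels this time: the model is being asked to predict them
    return tf.data.Dataset.from_tensor_slices(dict(features)).batch(batch_size)

feature_names = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth']
predict = {}

print("Please type numeric values as prompted.")
for feature in feature_names:
    while True:  # keep asking until we get a valid number
        val = input(feature + ": ")
        try:
            # predict() expects a list per feature, even for a single flower
            predict[feature] = [float(val)]
            break
        except ValueError:
            print("Please enter a number.")

predictions = classifier.predict(input_fn=lambda: predict_input_fn(predict))
for pred_dict in predictions:  # one dict per row we asked about (just one here)
    class_id = pred_dict['class_ids'][0]
    probability = pred_dict['probabilities'][class_id]
    print('Prediction is "{}" ({:.1f}%)'.format(SPECIES[class_id], 100 * probability))
```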
So: please type numeric values as prompted. Sepal length, let's type something like 2.4; sepal width, 2.6; petal length, let's just say that's like 6.5; and yeah, petal width, like 6.3. Okay, so then it calls this and it says the prediction is Virginica, I guess that's the class we're going with, and it says there's an 86.3% chance that
that is the prediction. So yeah, that is how that works. So that's what this does. I wanted
to give you a little script; I wrote most of this, I mean, I stole some of it from TensorFlow, but just to show you how you actually predict on one value. So let's look at these prediction
dictionaries, because I just want to show you what one of them actually is. So I'm going to say print pred_dict, and then this will allow me to actually walk through what class_ids and probabilities are and how I've kind of done this. So let's run this: for the lengths and widths, let's just go like 1.4, 2.3, and so on; I don't know what these values are going to end up being. And we get prediction is the same one, with 77.1%, which makes sense because these values are pretty similar to what I did before. Okay, so this is the
dictionary. So let's look for what we were looking for. So probabilities, notice we get
three probabilities, one for each of the different classes. So we can actually see, you know, the percentage for every single one of the predictions. Then what we have is class_ids. Now, class_ids tells us which class ID it predicts the flower actually is, right? So here, it says two, which means that this probability of 77% is at index two in this array, right? So that's why this value is two. It's saying that the class is two, it thinks it's class two, like whatever was encoded as two in our system. And that's how that works. So the reason I know which one to print out is because this tells me it's class two, and I know, from making that species list all the way back up here (if I could get rid of this output), that number two is Virginica, or however you say it. So that is what the classification is, that's what the prediction is. So that's how I do that. And that's how that works. Okay, so I think that
is pretty much it for actually classification. So it's pretty basic, I'm
going to go and see if there's anything else that I did for classification in here. Okay,
so here, I just put some examples, some example inputs and expected classes, so you guys can try these if you want. So for example, for sepal length, sepal width, petal length, and petal width of 5.1, 3.3, 1.7, and 0.5, the output should be Setosa. For 5.9, 3.0, 4.2, 1.5, it should be this one. And then obviously this one for this, just so you guys can mess with them if you want. But that's pretty much it for classification, and now on to clustering.
Okay, so now we're moving on to clustering. Now, clustering is the first unsupervised
learning algorithm that we're going to see in this series. And it's very powerful. Now,
clustering only works for a very specific set of problems, and you use clustering when you have a bunch of input information or features, but you don't have any labels or output information. Essentially, what clustering does is find clusters of like data points and tell you the location of those clusters. So you give it a bunch of training data, and you pick how many clusters you want to find. So maybe we're
going to be classifying digits, right, handwritten digits, using k means clustering. In that
instance, we would have 10 different clusters for the digits zero through nine, and you
pass all this information. And the algorithm actually finds those clusters in the data
set for you. We're gonna walk through an example and it'll make sense, but I just want to quickly explain the basic algorithm behind k-means, essentially the set of steps, because I'm going to walk you through them with a visual example. So we're going to start by randomly picking K points to place our K centroids. Now, a centroid is where our current cluster is kind of defined, and we'll see that in a second. The next step is we're going to assign all of the data points to the centroids by distance. So actually, now that I'm talking about this, I think it just makes more sense to get right into the example, because if I keep talking about this, you guys are probably just going to be confused, although I might
come back to this just to reference those points. Okay, so let's create a little graph like this in
two dimensions for our basic example. And let's make some data points here. So I'm just
gonna make them all red. And you're gonna notice that I'm gonna make this kind of easier
for ourselves by putting them in like their own unique little groups, right? So actually,
we'll add one up here, then we can add some down here, and down here. Now the algorithm
starts for K means clustering. And you'll understand how this works as we continue by
randomly picking k centroids. I'm going to denote a centroid by a little filled in triangle
like this. And essentially what these are, is where these different clusters currently
exist. So we start by randomly picking K centroids, where K is what we've defined. So in this instance, we're going to say K equals 3, and place three centroids wherever. So maybe we put one, you know, somewhere like here; you know what, I might not bother filling these in, because that's going to take a while; maybe we put one here, maybe we end up putting one over
here. Now, I've kind of put them close to where the clusters are. But these are going
to be completely random. Now what happens next is each group, or each data point, is
assigned to a cluster by distance. So essentially, what we do is for every single data point
that we have, we find what's known as the Euclidean distance, or it actually could be
a different distance you could use, like Manhattan distance if you guys know what that is, to all of the centroids. So let's say we're looking at this data point here; what we do is find
the distance to all of these different centroids. And we assign this data point to the closest
centroid. So the closest one by distance, now in this instance is looking like it's
going to be a bit of a tie between this centroid and this centroid, but I'm going to give it
to the one on the left. So what we do is we're going to say this is now part of that centroid. So let's just say this is centroid one, this is centroid two, and this is centroid three; then this point now is going to be a part of centroid one, because it's
closest to centroid one. And we can go through and we do this for every single data point.
So obviously, we know all of these are going to be our ones, right. And we know these are
going to be our twos, so two, two, two. And then these are obviously going to be our threes. Now, I'm actually just going to add a few
other data points, because I want to make this a little bit more sophisticated, almost,
if that makes any sense. So we add those data points here, and we'll add one here. And we'll give these labels. So these ones are close, so I'm going to say this one is one,
I'm going to say this one's two, I know it's not closest to it. But just because I want
to do that for now. We'll say two for that. And we'll say three here. Okay, so now that
we've done that we've labeled all these points, what we do is we now move these centroids
that we've defined into the middle of all of their data points. So what I do is I essentially
find it's called center of mass, the center of mass between all of the data points that
are labeled the same. So in this case, these will be all the ones that are labeled the
same. And I take this centroid, which I'm going to have to erase, get rid of it here,
and I put it right in the middle. So let's go back to blue. And let's say the middle
of these data points ends up being somewhere around here. So we put it in here, and this
is what we call the center of mass. And this, again, would be centroid two. So let's just erase
this. And there we go. Now we do the same thing with the other centroid. So let's remove
these ones, these ones. So for three, I'm saying it's probably going to be somewhere
in here. And then for one, our center mass is probably going to be located somewhere
about here. Now what I do is I repeat the process that I just did, and I reassign all
the points now to the closest centroid. So all these points are labeled one, two, and so on; you know, we can kind of remove their labels, and this is just going to be me trying to erase the labels, I shouldn't have written them on top. But essentially, what we do is
we're just going to be like reassigning them. So I'm going to say okay, so this is two,
and we just do the same thing as before, find the closest distance. So we'll say you know,
these can stay in the same cluster, maybe this one actually here gets changed to one
now, because it's closest to centroid one. And we just reassign all these points. And
maybe, you know, this one, if it was two before, let's say this one's one now, and we just reassign them. Then we repeat this process of assigning all the points to their closest centroid and moving the centroid into the center of mass.
And we keep doing this until eventually we reach a point where none of these points are
changing which centroid they're part of. So eventually, we reach a point where I'm just
gonna erase this and draw like a new graph, because it'll be a little bit cleaner. But
what we have is, you know, like a bunch of data points. So we have some over here, some
over here, maybe we'll just put some here. And maybe we'll do, like, K equals four, for example, for this one, and we have all these centroids. And I'll just draw these centroids in blue,
again, that are directly in the middle of all of their data points. They're like as
in the middle as they can get, none of our data points have moved. And we call this now
our cluster. So now we have these clusters, we have these centroids, right, we know where
they are. And what we do is when we have a new data point that we want to make a prediction
for, or figure out what cluster it's a part of, what we do is plot that data point.
So let's say it's this new data point here, we find the distance to all of the clusters
that exist, and then we assign it to the closest one. So obviously, it would be assigned to
that one. And we can do this for any data point, right. So even if I put a data point
all the way over here, well, it's closest cluster is this so it gets assigned to this
cluster. And my output will be whatever this label of this cluster is. And that's essentially
how this works. So you're just clustering data points, figuring out which ones are similar.
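Since we're not coding this one in the notebook, here is a minimal NumPy sketch of the steps just described, with made-up 2D data and a made-up K, purely as an illustration of the algorithm:

```python
import numpy as np

def k_means(points, k, iterations=100, seed=0):
    rng = np.random.default_rng(seed)
    # step 1: randomly pick k of the points as the starting centroids
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iterations):
        # step 2: assign every point to its closest centroid (Euclidean distance)
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # step 3: move each centroid to the center of mass of its assigned points
        new_centroids = np.array([points[labels == i].mean(axis=0) if np.any(labels == i)
                                  else centroids[i] for i in range(k)])
        if np.allclose(new_centroids, centroids):
            break  # nothing moved, so the clusters have settled
        centroids = new_centroids
    return centroids, labels

# three rough groups of made-up points
data = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
                 [8.0, 8.0], [8.0, 9.0], [9.0, 8.0],
                 [1.0, 8.0], [2.0, 9.0]])
centers, assignment = k_means(data, k=3)
print(centers)
print(assignment)
```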
And it's a pretty basic algorithm: I mean, you draw your little triangles, you find the distance from every point to all of the triangles, and then what you do is simply assign each point to its closest centroid, you move that centroid to the center of mass, and you repeat this process constantly until eventually you get to a point where none of your data points are moving. That means you've found the best clusters that you
can, essentially. Now the only thing with this is you do need to know how many clusters
you want for K means clustering, because k is a variable that you need to define, although
there are some algorithms that can actually determine the best number of clusters for a specific data set. But that's a little bit beyond what we're going to be focusing on right now. So that is pretty much clustering. There's not really much more to talk about, especially because we can't really code anything for it right now. So we're going to move
on to hidden Markov models. Now hidden Markov models are way different than what we've seen
so far. We've been using kinds of algorithms that rely on data. So like k-means clustering: we gave it a lot of data, it clustered all those data points, found those centroids, and used those centroids to figure out where new data points should go. Same thing with linear regression
and classification. Whereas hidden Markov models, we actually deal with probability
distributions. Now, the example we're going to go into here (and I have to do a lot of examples for this, because it's a very abstract concept) is a basic weather model. So what we actually want to do is predict the weather on any given day, given the probability of different events occurring. So let's say we know (maybe in like a simulated environment or something like that, this might be an application) some specific things about our
environment, like we know, if it's sunny, there's an 80% chance that the next day, it's
going to be sunny again, and a 20% chance that it's going to rain. Maybe we know some
information about sunny days and about cold days. And we also know some information about
the average temperature on those days. Using this information, we can create a hidden Markov
model that will allow us to make a prediction for the weather in future days, given kind
of that probability that we've discovered. Now, you might be like, Well, how do we know
this? Like, how do I know this probability, a lot of the times you actually do know the
probability of certain events occurring, or certain things happening, which makes these
models really good. But there's some times where what you actually do is you have a huge
data set, and you calculate the probability of things occurring based on that data set.
So we're not going to do that part, because that's just kind of going a little bit too
far. And the whole point of this is just to introduce us to some different models. But
in this example, what we will do is use some predefined probability distributions. So let
me just read out the exact definition of a hidden Markov model and start going more in
depth. So the hidden Markov model is a finite set of states, each of which is associated
with a generally multi dimensional probability distribution. Transitions among the states
are governed by a set of probabilities called transition probabilities. So in a hidden Markov
model, we have a bunch of states. Now in the example I was talking about with this weather
model, the states we would have are hot day and cold day. Now, these are what we call hidden, because we never actually access or look at these states while we interact with the model; in fact, what we look at is something called observations. Now, at each state, we have an observation. I'll give you an example of an observation: if it is hot outside, Tim has an 80% chance of being happy; if it is cold outside, Tim has a 20% chance of being happy. That is an observation. So at that state, we can observe that the probability of something happening during that state is x, right, or is y, or whatever it is. So we don't actually care about the states in particular, we care about the observations we get from
that state. Now in our example, what we're actually going to do is we're going to look
at the weather as an observation for the state. So for example, on a sunny day, the weather
has, you know, the probability of being between five and 15 degrees Celsius with an average
temperature of 11 degrees. That's a probability we can use. And I know this is slightly abstract, but I just want to talk about the data we're going to work with here; I'm going to draw out a little example, go through it, and then we'll actually get into the code.
So let's start by discussing the type of data we're going to use. Typically, in previous examples, we used hundreds, if not thousands, of entries, rows, or data points for our models to train on. For this, we don't need any of that. In fact, all we need is constant values for our initial probability, our transition distribution, and our observation distribution. Now, what I'm going to do is go in here and talk about states, observations, and transitions. So we have a certain number of states. We will define how many states we have, but we don't really care what each state is. We could have states like warm, cold, high, low, red, green, blue — we can have as many states as we want; we could even have just one state, to be honest, although that would be kind of strange. And these are called hidden because we don't directly observe them. Now, observations. Each
state has a particular outcome or observation associated with it based on a probability
distribution. It could be the fact that during a hot day it is 100% true that Tim is happy; or, on a hot day, we could observe that 80% of the time Tim is happy and 20% of the time he is sad. Those are observations we make about each state, and each state will have its own observations and its own probabilities of those observations
occurring. If we just had a single fixed outcome for a state, that means it's always the same — there's no probability involved — and in that case it's simply called an outcome, because the probability of the event occurring is 100%. Okay, then we have transitions. Each state has a probability defining the likelihood of transitioning to a different state. So for example, if we have a hot day, there will be some percentage chance that the next day is a cold day, and if we have a cold day, there will be some percentage chance that the next day is either a hot day or a cold day. We'll go through exactly what we have for our specific model below; just understand that there's a probability of transitioning into a different state, and from each state we can transition into every other state, or a defined set of states, with a certain probability. So I know it's a mouthful, I know it's a lot. But let's go
into a basic drawing example. Because I just want to illustrate like graphically a little
bit how this works, in case these ideas are a little bit too abstract for any of you. Okay, I'm just pulling out the drawing tablet — just one second here — and
let's do this basic weather model. So what I'm going to do is just simply draw two
states, actually, let's do it with some colors, because why not? So we're gonna use yellow.
And this is going to be our hot day, okay, this is going to be our Sun. And then I'm
just going to make a cloud, we'll just do like a gray cloud, this will be my cloud.
And we'll just say it's gonna be raining over here. Okay, so these are my two states. Now,
in each state, there's a probability of transitioning to the other state. So for example, in a hot
day, we have a, let's say, 20% chance of transitioning to a cold day, and we have a 80% chance of
transitioning to another hot day, like the next day, right? Now, in a cold day, we have,
let's say, a 30% chance of transitioning to a hot day. And we have in this case, what
is that going to be a 70% chance of transitioning to another cold day. Now, on each of these
days, we have a set of observations. So these are what we call states, right? This could be s1 and this could be s2 — it doesn't really matter whether we name them or anything; we just know we have two states. We know the transition probabilities.
That's what we've just defined. Now we want the observation probability or distribution
for that. So essentially, on a hot day, our observation is going to be that the temperature
could be between 15 and 25 degrees Celsius with an average temperature of let's say,
20. So we could write this as an observation, and we'll say that the mean — the average temperature — is going to be 20, and the distribution around that will roughly run from a minimum of 15 to a maximum of 25. This is actually what we'd describe with a standard deviation. I'm not really going to explain exactly what standard deviation is, although you can kind of think of it like this: there's a mean, which is the middle point, the most likely value, and at different levels of standard deviation — which is getting into statistics, and I'm definitely not an expert — we have some probability of hitting different temperatures as we move to the left and right of that value. So somewhere on the left of this curve we have 15, and somewhere on the right we have 25, and we're just defining that this is roughly where our curve ends. So we're saying the temperature is going to fall between 15 and 25, with an average of 20, and our model will figure out what to do with that. That's as far as I really want to go into standard deviation, and I'm sure that's a really rough explanation, but it's the best I'm going to give you for right now. Okay, so that's our
observation here. Our observation over here is going to be similar: we'll say the mean temperature on a cold day is five degrees, the minimum is maybe something like negative five, and the max could be something like 15. So we'll have some distribution — that's just what we want to understand. And this is a slightly strange kind of distribution, because we're dealing with a standard deviation, although we could also just deal with straight percentage observations — for example, a 20% chance that Tim is sad, or an 80% chance that he is happy; those are probabilities we can have as our observation probabilities in the model. Okay, so there's a lot of lingo, there's a lot going on, and we're going to get into a concrete example now, so hopefully this will make more sense. But again, just understand states, transitions, and observations: we don't ever actually look at the states, we just have to know how many we have, plus the transition probability and observation probability for each of them. Okay. So what I want to ask now, though, is: what do we even do with this
model? So once I make this right, once I make this hidden Markov model, what's the point
of it? Well, the point of it is to predict future events based on past events. So we
know that probability distribution, and I want to predict the weather for the next week.
Well, I can use that model to do that. Because I can say, Well, if the current day today
is warm, then what is the likelihood that the next day tomorrow is going to be cold,
right? And that's what we're kind of doing with this model. We're making predictions
for the future based on probability of past events occurring. Okay. So important stuff.
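Before walking through the notebook below, here is roughly what the import cell looks like. This is a sketch: the %tensorflow_version magic line is Colab-specific and only needed when running in Google Colaboratory.

```python
%tensorflow_version 2.x        # Colab magic: make sure TensorFlow 2.x is selected

import tensorflow_probability as tfp   # the separate module that deals with probability
import tensorflow as tf
```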
So let's just run this cell — it's already loaded — and import TensorFlow. Notice that here I've imported TensorFlow Probability as tfp; this is a separate module from TensorFlow that deals with probability. We also need TensorFlow 2.x alongside it for this hidden Markov model, since we're going to use TensorFlow to build the model — not a huge deal. Okay, so: the weather model. This is just going
to define what our model actually is. So the different parts of it. So this is taken directly
from the documentation of TensorFlow, you guys can see, you know, where I have all this
information from — I've sourced all of it. But essentially, the model we're going to try to create says that cold days are encoded by 0 and hot days are encoded by 1; the first day in our sequence has an 80% chance of being cold (so whatever day we're starting on has an 80% chance of being cold, which means a 20% chance of being hot); a cold day has a 30% chance of being followed by a hot day, and a hot day has a 20% chance of being followed by a cold day (which means 70% cold-to-cold and 80% hot-to-hot); and on each day the temperature is normally distributed with mean 0 and standard deviation 5 on a cold day, and mean 15 and standard deviation 10 on a hot day. What that standard deviation means here — we can just read it off — is that on a hot day the average temperature is 15 (that's the mean) and it ranges roughly from 5 to 25, because the standard deviation is 10, which you can think of as about 10 on each side, kind of the min and max. Again, I'm not a statistician, so please don't quote me on any definitions of standard deviation; I'm just trying to explain it enough so that you can understand what we're doing. Okay, so here's what we're going to do to model this — and I'm just
kind of going through this fairly quickly, because it's pretty easy to do: I'm going to load the TensorFlow Probability distributions module and just save it as tfd, so I don't need to write tfp.distributions over and over — I can just shortcut it. You'll notice I'm referencing tfd here, which just stands for TFP distributions, and tfp is TensorFlow Probability. Okay, so my initial distribution is a TensorFlow Probability distributions Categorical, with probabilities of 80% and 20%. This refers to point two above: the first day in our sequence has an 80% chance of being cold. So that's essentially what this is — the initial probability of being cold is 80%, and then 20% for hot — and Categorical is just the kind of distribution we use to express that. Okay, so the transition distribution is also a TensorFlow Probability Categorical, with probabilities of 70% and 30%, and then 20% and 80%. Notice that since we have two states, we've defined two probabilities for each state. This is the transition probability, referring to points three and four above: from a cold day there's a 70% chance of staying cold and a 30% chance of going to a hot day, and from a hot day it's the reverse — a 20% chance of going to a cold day and an 80% chance of staying hot. Okay, so the observation distribution. Now, this one is a little bit different, but essentially we use tfd.Normal. When you're working with a standard deviation, you do it like this: you pass loc, which stands for your average or your mean — so the average temperature is going to be 0 on a cold day and 15 on a hot day — and scale, which is the standard deviation: 5 on the cold day, meaning we range roughly from negative 5 to 5 degrees, and 10 on the hot day, meaning we range roughly from 5 to 25 degrees with an average temperature of 15. The reason we've added a dot after each number is that these need to be float values; rather than passing integers and potentially getting type errors later on, we just use floats. So the loc argument represents the mean and scale the standard deviation — exactly what we just defined. Alright, so let's run this; I think we actually already did.
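To make this concrete, here's roughly what those three distribution cells look like, using the numbers we just described:

```python
tfd = tfp.distributions  # shortcut so we don't write tfp.distributions everywhere

# Point 2: the first day has an 80% chance of being cold (state 0) and 20% hot (state 1)
initial_distribution = tfd.Categorical(probs=[0.8, 0.2])

# Points 3 and 4: cold -> 70% cold / 30% hot, hot -> 20% cold / 80% hot
transition_distribution = tfd.Categorical(probs=[[0.7, 0.3],
                                                 [0.2, 0.8]])

# Point 5: temperature is normal with mean 0, std 5 on cold days and mean 15, std 10 on hot days
observation_distribution = tfd.Normal(loc=[0., 15.], scale=[5., 10.])
```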
And now we can create our model. Creating the model is pretty easy: all we do is say model equals tfd.HiddenMarkovModel and give it the initial distribution, the transition distribution, the observation distribution, and the number of steps. Now, what is steps? Steps is how many days we want to predict for — the number of steps is how many times we're going to step through this probability cycle and run the model, essentially. Remember, what we want to do is predict the average temperature on each day; that's the goal of our example. So given this information, using these observations and these transitions, that's exactly what we'll predict. So let's run this model.
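Here's roughly what that model cell looks like (num_steps=7 is an assumption — one week of predictions):

```python
model = tfd.HiddenMarkovModel(
    initial_distribution=initial_distribution,
    transition_distribution=transition_distribution,
    observation_distribution=observation_distribution,
    num_steps=7)  # predict the expected temperature for 7 days
```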
Hmm, what is the issue here? It's complaining about a tensor. Okay, give me one sec while I have a look here — I haven't had this issue before. Okay. So after a painful amount of searching on Stack Overflow and
Google and actually just reading through more documentation on TensorFlow, I have determined
the issue. So remember, the error we were getting was actually on this line here — I think I can still see the output on it. Okay, well, this shows a different error now, but there was an error at this line. Essentially, what was happening is we have a mismatch between the two versions here: the most recent version of TensorFlow is not compatible with the older version of TensorFlow Probability, at least for the things we're trying to do with it. So I just needed to make sure that I installed the most recent
version of TensorFlow Probability. Here's what to do if this happens in your notebook — and this should actually work fine for you, because the notebook will be updated by the time you get there, but in case you run into the issue, I'll deal with it here. Essentially, you select version 2.x of TensorFlow, run the install command to install TensorFlow Probability, and then after running that command you need to restart your runtime: go to Runtime and then Restart runtime. Then you can just continue on with the script.
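In Colab, those cells look roughly like this (the %tensorflow_version magic and the runtime restart are Colab-specific, and the exact package version you get will differ):

```python
%tensorflow_version 2.x
!pip install --upgrade tensorflow-probability
# Then: Runtime -> Restart runtime, re-run the imports, and continue with the notebook.
```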
After the restart, select TensorFlow 2.x again, do your imports, and then we'll test whether it actually works for us here: run our distributions, create the model — without any issues this time, notice there's no red text — and then run this final line, which will give you the output. Now, this is what I wanted to talk about here that we didn't quite get
to because we were having some bugs. But this is how we can actually kind of run our model
and see the output. What you do is use model.mean — you say mean equals model.mean(), and what this is going to do is essentially calculate those expected values from the model. Now, model.mean here is what we call a partially defined tensor. Remember, our tensors were like partially defined computations — well, that's what model.mean actually gives us. So if we want to get the actual value out of that, what we need to do is create a new session in TensorFlow and run this part of the graph, which we're going to get by calling mean.numpy(), and then we can print that out. I know this might seem a little bit confusing, but essentially, to run a session in the new version of TensorFlow — 2.x, or 2.1, or whatever it is — you type with tf.compat.v1.Session() as sess: (it doesn't really matter what name you give the session), and then inside I'm just printing mean.numpy(). So to actually get the value from this variable, I call .numpy() on it, and what that does is print out an array that gives me the expected temperature on each day.
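Put together, the output cell looks roughly like this (the exact numbers you see depend on the probabilities you defined):

```python
mean = model.mean()  # a partially defined tensor holding the expected temperature for each step

# Evaluate the tensor inside a compat.v1 session and print the resulting array
with tf.compat.v1.Session() as sess:
    print(mean.numpy())
```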
So we have roughly 3, 6, 7.5, 8.25, and so on, and you can see these are the temperatures given that we start with an initial probability of being on a cold day. We kind of get that here, right — we're starting at three degrees, that's what it's determined we'll start at — and then we have
all of these other temperatures predicted for the following days. Now notice that if we recreate this model — just rerun the distributions, rerun the model, and call model.mean() again — this stays the same. Why? Because our probabilities are the same, so the model does the exact same calculation; there isn't really any training that goes into this. So we get very similar, if not the exact same, values — I can't remember if these are identical, but that's what it looks like to me. We can run it again and see we get the same ones, and we'll create the model one more time, and let me just check these values to make sure I'm not lying to you. Yes, they're the exact same. Okay, so let's start
messing with a few probabilities and see what we can do to this temperature and see what
changes we can cause. So if I put 0.5 here and 0.5 there in the transition Categorical probabilities — remember, this refers to points three and four above, where a cold day had a 30% chance of being followed by a hot day and a hot day a 20% chance of being followed by a cold day — what I've just done is change that probability to 50%, so that a cold day now has a 50% chance of being followed by a hot day and a 50% chance of being followed by a cold day.
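In the notebook that's just a one-line change to the transition distribution (keeping the hot-day row as before):

```python
# Cold days now have a 50/50 chance of being followed by a cold or a hot day
transition_distribution = tfd.Categorical(probs=[[0.5, 0.5],
                                                 [0.2, 0.8]])
```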
Let's recreate the model, rerun it, and see if we get a difference — and we do: the temperatures now climb a little bit higher. Notice, though, that we get the same starting temperature,
because that's just the average based on this probability that we have here. But if we wanted
to start hotter, we could reverse those numbers in the initial distribution to 0.2 and 0.8. Let's rerun all of this, and now look at what our temperatures are: we start at 12 and then actually drop down to 10. So that's how this hidden Markov model works, and it's nice, because you can just tweak the probabilities — this runs pretty much instantly — and have a look at the output right away. Obviously this first value represents the temperature on the first day, then the second day, third, fourth, fifth, sixth, seventh. And obviously the more days out you go, the less accurate this is probably going to be, because it just runs off probabilities.
And if you're going to try to predict, you know, a year in advance, and you're using
the weather that you have from, I guess the previous year, you're probably not going to
get a very accurate prediction. But anyways, these are hidden Markov models. They're not extremely common, but there are some situations where you might want to use something like this — that's why we're implementing them in this course and showing you how they work. It's also a feature of TensorFlow that a lot of people don't talk about or even see, and personally, I hadn't really heard of hidden Markov models until I started developing this course. So that has been it for this module. Now, I hope
that this kind of gave you guys a little bit of an idea of how we can actually implement
some of these machine learning algorithms, a little bit of idea of how to work with data,
how we can feed that to a model, and the difference between testing and training data. Linear regression is the one we focused on a lot, so I hope you're very comfortable with that algorithm. Then — what was the second one we did? I have to scroll up to remember the exact sequence we had here — classification, that was important as well, so I hope you really understood that. Clustering we didn't go too far into.
But again, this is an interesting algorithm. And if you need to do some kind of clustering,
you now know of one algorithm to do that, called k means clustering, and you understand
how that works. And now you know, hidden Markov models. So in the next module, we're going
to start covering neural networks, we now have the knowledge, we need to really dive
in there and start doing some cool stuff. And then in the future modules, we're going
to do deep computer vision, I believe we're gonna do chatbots with recurrent neural networks,
and then some form of reinforcement learning at the end. So with that being said, let's
go to the next module. Hello everybody, and welcome to module four. Now, in this module of the course, we're gonna be talking about neural networks: discussing how neural networks work, a little bit of the math behind them, talking about gradient descent and backpropagation, and how information actually flows through the neural network. And then we'll get into an example
where we use a neural network to classify articles of clothing. So I know that was a
lot, but that's what we're gonna be covering here. Now, neural networks are complex, there's
kind of a lot of components that go into them. And I'm going to apologize right now, because
it's very difficult to explain it all at once, what I'm going to be trying to do is kind
of piece things together and explain them in blocks. And then at the end, you know,
kind of combine everything together. Now, I will say, in case any of you didn't watch
the beginning of this course, I do have very horrible handwriting. But this is the easiest
way to explain things to you guys. So bear with me, you know, I'm sure you'll be able
to understand what I'm saying. But it might just be painful to read some of it. Alright,
so let's get into right away and start discussing what neural networks are and how they work.
Well, the whole point of a neural network is to provide, you know, classification or
predictions for us. So we have some input information, we feed it to the neural network,
and then we want it to give us some output. So if we think of the neural network as this
black box, we have all this input, right, we give all this data to the neural network,
maybe we're talking about an image, maybe we're talking about just some random data
points, maybe we're talking about a data set, then we get some meaningful output. This is
what we're looking at. So if we're just looking at a neural network from kind of the outside,
we think of it as this magical black box, we give some input, it gives us some output.
And I mean, we could call this black box, just some function, right? Where it's a function
of the input, maps it to some output. And that's exactly what a neural network does,
it takes input and maps that input to some output, just like any other function, right,
just like if you had a straight line like this: this is a function, this is your line, whatever it is — let's say y equals 4x, maybe that's your line — you give some input x, and it gives you some value y. This is a mapping of your input to your output. Alright, so now that we have that down, what is a neural network
made up of? Well, a neural network is made up of layers. And remember, we talked about
the layered representation of data when we talk about neural networks. So I'm going to
draw a very basic neural network, we're going to start with the input layer. The input layer
is always the first layer in our neural network. And it is what is going to accept our raw
data. Now, what I mean by raw data is whatever data we want to give to the network — whatever we want to classify, whatever our input information is — that's what this layer is going to receive in the neural network. So we can say these arrows represent our input, and they come into our first input layer. So this means, for example, if you had an image
and this image, and I'll just draw like one like this, let's say this is our image, and
it has all these different pixels, right? All these different pixels in the image, and
you want to make a classification on this image. Well, maybe it has a width and a height
and a classic width and height example is 28 by 28. If you had 28 by 28 pixels, and
you want to make a classification on this image, how many input neurons do you think you would need in your neural network to do this? Well, this is kind of a tough question
if you don't know a lot about neural networks. If you're predicting for the image, if you're
going to be looking at the entire image to make a prediction, you're going to need every
single one of those pixels, which is 28 times 28 pixels — which is 784. So you would need 784 input neurons. That's totally fine; it might seem like a big number, but we deal with massive numbers when it comes to computers, so this really isn't that many. But that's an example of how you would use a neural network input layer to represent an image: you would have 784 input neurons, and you would pass one pixel value to each of those neurons.
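As a quick sanity check of that number, and of how an image gets flattened into those input neurons, here is a minimal NumPy sketch:

```python
import numpy as np

image = np.zeros((28, 28))   # a dummy 28x28 grayscale image
flat = image.reshape(-1)     # flatten it into a single vector
print(flat.shape)            # (784,) -- one value per input neuron
print(28 * 28)               # 784
```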
Now, if we're doing an example where maybe we just have one piece of input information — maybe it's literally just one number — well, then all we need is one input neuron. If we have an example where we have four pieces of information,
we would need four input neurons. Now, this can get a little bit more complicated, but the basic idea I want you to understand is this: whatever pieces of input you're going to have, regardless of what they are, you need one input neuron for each piece of that information, unless you're going to be reshaping or putting that information into a different form. Okay, so let's just skip ahead now and go to our output layer.
So this is going to be our output. Now what is our output layer? Well, our output layer
is going to have as many neurons — and again, a neuron here just means a node in the layer — as output pieces that we want. Now let's say we're doing a classification for images, and maybe there are two classes we could predict. Well, there are a few different ways we could design our output layer. What we could do is say, okay, we're going to use one output neuron; this output neuron is going to give us some value, and we want this value to be between zero and one, inclusive. Now, what we can do if we're predicting two classes is say: my output neuron is going to give me some value, and if that value is closer to zero, we'll call it class zero, and if it's closer to one, it's class one. That would mean that in our training data — and we've talked about training and testing data — we give our input, and our output label would need to be the value zero or one, because it's either the class labeled zero or the class labeled one. So our labels for the training data set would be zero and one, and this value on our output neuron would be guaranteed to be between zero and one, based on something I'm going to talk about a little bit later. That's one way to approach it: we have a single value, we look at that value, and based on what it is we can determine what class we predicted. That doesn't always work well, though. In other instances, when we're
doing classification, what makes more sense is to have as many output neurons as classes
you're looking to predict for. So let's say we're gonna have, you know, like five classes
that we're predicting for — maybe these three pieces of input information are enough to
make that prediction, well, we would actually have five output neurons, and each of these
neurons would have a value between zero and one. And the combination, so the sum of every
single one of these values would be equal to one. Now, can you think of what this means
if every single one of these neurons has a value between zero and one, and their sum
is one? What does this look like to you? Well, to me, this looks like a probability distribution.
And essentially, what's going to happen is we're going to make predictions for how strongly we think our input information belongs to each class. So if we think it's most likely class one — maybe we'll just label these like this — then what we would do is say, okay, this one is going to be 0.9, representing 90%, maybe this is 0.001, maybe this is 0.05, this is 0.003, and so on; you get the point, it's going to add up to one, and this is a probability distribution for our output layer. So that's another way to do it. And then obviously, if we're doing some kind of
regression task, we can just have one neuron and that will just predict some value. And
we'll define you know what we want that value to be. Okay. So that's my example, for my
output. Now, let's erase this. And let's actually just go back to one output neuron, because
that's what I want to use for this example. Now, we have something in between these layers,
because obviously, you know, we can't just go from input to output with nothing else.
What we have here is called a hidden layer. Now in neural networks, we can have many different
hidden layers, we can add, you know, hidden layers that are connecting to other hidden
layers, and we could have hundreds or thousands if we wanted to. For this basic example, we'll
use one. And I'll write this as hidden. So now we have our three layers. Now why is this
called hidden? The reason it's called hidden is because we don't observe it. When we're
using the neural network, we pass information to the input layer, we get information from
the output layer, we don't know what happens in this hidden layer, or in these hidden layers.
Now, how are these layers connected to each other? How do we get from this input layer
to the hidden layer to the output layer and get some meaningful output? Well, every single
layer is connected to another layer with something called weights. Now we can have different
kind of architectures of connections, which means I could have something like this one
connects to this, this connects to this, this connects to this. And that could be like my
connection kind of architecture, right? We could have another one where this one goes
here. And you know, maybe this one goes here. And actually, after I've drawn this line,
now we get what we're gonna be talking about a lot, which is called a densely connected
neural network. Now, a densely connected neural network, or a densely connected layer, essentially
means that it's connected to every node from the previous layer. So in this case, you can
see, every single node in the input layer is connected to every single node in the output
layer — or rather, in the hidden layer, my bad. And these connections are what we call weights.
Now, these weights are actually what the neural network is going to change and optimize to
determine the mapping from our input to our output. Because again, remember, that's what
we're trying to do. We have some kind of function, we get some input, it gives us some output.
How do we get from that input to that output? Well, by modifying these weights — it's a little bit more complex than that, but this is the starting point. So these lines that I've drawn are really just numbers; every single one of these lines is some numeric value. Typically, these numeric values are between zero and one, but they can be larger, they can be negative; it
really depends on what kind of network you're doing and how you've designed it. Now, let's
just write some random numbers, we'd have like 0.1, this could be like 0.7, you get
the point, right, we just have numbers for every single one of these lines. And these
are what we call the trainable parameters that our neural network will actually tweak
and change as we train to get the best possible result. So we have these connections. Now
our hidden layers connected to our output layer as well. This is again another densely
connected layer, because every neuron from the previous layer is connected to every neuron in the next layer. If you'd like to determine how many connections you have, you can say: there are three neurons here and two neurons here, so three times two equals six connections. That's how that works between layers. And then obviously,
you can just multiply all the neurons together as you go through and determine what that's
going to be. Okay, so that is how we connect these layers, we have these weights, so let's
just write a W on here. So we remember that those are weights. Now, we also have something
called biases. So let's add a bias here — I'm going to label it b. Now, biases are a little bit different from the regular nodes we have: there's only one bias per layer, and a bias lives in the layer previous to the layer it affects. So in this case, what we actually have is a bias that connects to each neuron in the next layer, so it's still densely connected, but it's just a little bit different. Notice that this bias doesn't have an arrow beside it, because it doesn't take any input information; it's another trainable parameter for the network. The bias is just some constant numeric value that we're going to connect to the hidden layer so we can do a few things with it. Now, the weights on these bias connections always have a value of one — we'll talk about why in a second — but just know that whenever a bias is connected to another layer or another neuron, its connection weight is typically one. Okay, so we have that connected, we have our bias. And that actually means we have a bias here as well, and this bias connects to this. Notice that our biases do not connect to each other; the reason, again, is that they're just some constant value, just another trainable parameter we're adding into the network. Now let's talk
about how we actually pass information through the network and why we even use these weights
and biases and what they do. So — I can't really think of a great example, we're just going to do something arbitrary — let's say we have some data points x, y, z, and all these data points have some mapped value: there's some value we're looking for, or some class we're trying to put them in. Maybe we're classifying them as red dots or blue dots. Let's do that: a point (x, y, z) is either part of the red class or the blue class. So what we want this output neuron to give us is red or blue. Since we just have one output neuron, we'll get a value in the range zero to one and say: if it's closer to zero, that's red; if it's closer to one, that's blue. That's what we'll do for this network and this example. Now, our input neurons are obviously going to be x, y, and z. So let's pick some data point, and let's say we have the values two, two, two. That's our data point, and we want to predict whether it's red or blue.
How do we pass it through? Well, what we need to do is determine how we can, you know, find
the value of this hidden layer node. We already know the values of the input nodes, but now we need to go to the next layer using these connections and find what the values of these nodes are. Well, the way we determine these values — and I've just written n1 to represent node one (maybe this one should be node two) — is that n1 is equal to what we call a weighted sum of all of the previous nodes that are connected to it. A weighted sum looks something like this (I'm just going to write the equation and then explain it): n1 equals the sum, for i from 0 to n, of w_i times x_i, plus b. Now, I know this equation looks really mathy and complicated, but it's really not. What this symbol and this equation mean is taking the weighted sum of all the neurons that are connected to this neuron. So in this case, we have neuron x, neuron y, and neuron z connected to n1. When we take the weighted sum, what this is really equal to is the weight on the connection from neuron x — so w_x — times the value at neuron x, which in this case is just equal to two, plus whatever the weight is from neuron y — w_y — times two, and then you get the point: we have w_z — I'm trying to write this on the edge of my drawing tablet — times two. Now, obviously, these weights have some numeric value. When we start
our neural network, these weights are just completely random. They don't make any sense.
They're just some random values that we can use. As the neural network gets better, these
weights are updated and changed to make more sense in our network. So right now, we'll
just leave them written as w_x, w_y, w_z — but know that these are some numeric values. So this sum returns some value; let's just call it v, and that's what this is equal to. Then what we do is add the bias. Remember, the bias was connected with a weight of one, which means that when we take the weighted sum including the bias, all we're doing is adding whatever the bias's value is. So if the bias value was 100, then we add 100. Now, I've written the plus b explicitly to state the fact that we're adding the bias, although it could really be considered part of the summation equation,
since it's just another connection into the neuron. Let's also talk about what this symbol means, for anyone that's confused about it. Essentially, the big sigma stands for a sum, i stands for an index, and n stands for the index we'll go up to — that is, how many neurons we had in the previous layer. Then what we're doing with w_i x_i is saying weight 0 times x_0, plus weight 1 times x_1, plus weight 2 times x_2 — it's almost like a for loop; we're just adding them all together — and then we add the bias b. I hope that makes enough sense that we understand it. So that is our weighted sum plus the bias.
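Written as code, the value of one hidden node is just that loop. This is a minimal sketch with made-up weights and the inputs 2, 2, 2 from the example:

```python
inputs  = [2.0, 2.0, 2.0]   # values at x, y, z
weights = [0.1, 0.7, 0.4]   # w_x, w_y, w_z -- start out random
bias    = 100.0             # the bias b, connected with weight 1

# n1 = sum(w_i * x_i) + b
n1 = sum(w * x for w, x in zip(weights, inputs)) + bias
print(n1)  # 102.4
```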
Essentially, what we do is go through and calculate these values — so this node gets some value, maybe 0.3, maybe 7, whatever it is — and we do the same thing now
at our output neuron: we take this value times its weight, plus that value times its weight, plus the bias, and that gives us some value here. Then we can look at that value and determine what the output of our neural
network is. So that is pretty much how that works in terms of the weighted sums, the weights
and the biases. Now let's talk about the kind of the training process and another thing
called an activation function. So, I've lied to you a little bit — I'm just going to erase some stuff so we have a little more room on here — I've lied to you in saying that this is completely how this works. We're missing one key feature that I want to talk about, which is called an activation function. Now,
remember how we want this value to be in between zero and one right at our output layer? Well,
right now, we can't really guarantee that that's going to happen. I mean, especially
if we're starting with random weights and random biases in our neural network, we're
passing this information through, we could get to this point here, we could have like
700 as our value. That's kind of crazy to me, right? We have this huge value, how do
we look at 700 and determine whether this is red or whether this is blue? Well, we can
use something called an activation function. Now I'm gonna go back to my slides here, whatever
we want to call this, this notebook, just to talk about what an activation function
is. And you guys can see you can follow along, I have all the equations kind of written out
here as well. So let's go to activation function, which is right here. Okay. So these are some
examples of an activation function. And I just want you to look at what they do. So
this first one is called a rectified linear unit. Now notice that essentially, what this
activation function does is take any values that are less than zero and just make them zero — so for any x values in the negative, it makes the output zero — and any values that are positive are just left equal to whatever their positive value is, so if it's 10, it stays 10. This lets us pretty much eliminate any negative numbers; that's what a rectified linear unit does. Now, tanh, or hyperbolic tangent — what does this do? It actually squishes
our values between negative one and one. So it takes whatever values we have. And the
more positive they are, the closer to one they are, the more negative they are the closer
to negative one they are. So can we see why this might be useful, right for a neural network.
And the last one is sigmoid; what this does is squish our values between zero and one. A lot of people call it the "squishifier" function, because all it does is take any extremely negative numbers and put them close to zero, take any extremely positive numbers and put them close to one, and for any values in between you get some number in between, based on the equation one over one plus e to the negative z. Okay, so that's how that works. Those are some activation functions. Now, I hope that's not too much math for you.
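As a quick sketch of what those three functions actually compute (just the math, not the TensorFlow implementations):

```python
import math

def relu(x):
    return max(0.0, x)                 # negatives become 0, positives pass through

def tanh(x):
    return math.tanh(x)                # squishes values between -1 and 1

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))  # squishes values between 0 and 1

print(relu(-3), tanh(2), sigmoid(700))  # 0.0, ~0.96, ~1.0
```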
But let's talk about how we use them. Essentially, at each of our neurons we're going to have an activation function that is applied to the output of that neuron. So we take the weighted sum plus the bias, and then we apply an activation function to it before we send that value on to the next neuron. So in this case, n1 isn't actually just equal
to this; n1 is equal to F — which stands for the activation function — applied to this equation: the sum over i of w_i x_i, plus b. That's what n1's value is by the time it moves on toward the output neuron. So each of these nodes has an activation function on it, and n2 has the same activation function as n1. And we can define what activation
function we want to apply at each neuron. Now, at our output neuron, the activation
function is very important, because we need to determine what we want our value to look
like, do we want it between negative one and one? Do we want it between zero and one? Or
do we want it to be some massively large number? Do we want it between zero and positive infinity?
What do we want, right? So what we do is we pick some activation function for our output
neuron. And based on what I said, where we want our values between zero and one, I'm
going to pick the sigmoid function. Sigmoid, recall, squishes our values between zero and one. So what we'll do here is take n1 times whatever the weight is there (weight zero), plus n2 times weight one, plus a bias, and apply sigmoid; this will give us some value between zero and one, and then we can look at that value and determine what the output of the network is. So that's great, and it makes sense why we would use that on the output neuron, right? We can squish our value into some known range so we can actually look at it and determine what to do with it, rather than having these crazy huge numbers. And I want to see if I
can make this eraser any bigger. That's much better. Okay. So there we go. Let's just erase
some of this. And now let's talk about why we use the activation function on like an
intermediate layer like this. Well, the whole point of an activation function is to introduce
complexity into our neural network. Essentially, without them we just have these basic weights and these biases — just a fairly simple function. At this point we have a bunch of weights and a bunch of biases, and those are the only things we're training, the only things we're changing to make our network better. What an activation function can do is, for example, take a bunch of points that sit on the same plane and, by applying a nonlinear function to them, introduce a kind of higher dimensionality. With an activation function like sigmoid, which is a nonlinear function, we can hopefully spread these points out and move them up or down off the plane in the hope of extracting some different features. Now, it's hard to explain this until we get into the training process of the neural network, but I'm hoping this gives you a little bit of the idea: if we can introduce a complex activation function into this process, it allows us to make more complex predictions and pick up on different patterns. If, when sigmoid or a rectified linear unit is applied to this output, it moves my point up or down or in whatever direction in n-dimensional space, then I can pick up on specific patterns I couldn't determine in
the previous dimension. That's just like if we're looking at something in two dimensions.
If I can move that into three dimensions, I immediately see more detail. There's more
things that I can look at, right? And I will try to do a good example of why we might use
it like this. So let's say we have a square right? Like this, right? And I asked you,
I'm like, tell me some information about this square? Well, what you can tell me immediately
is the width, the height, and I guess you could tell me the color. You could tell me it has one face, that it has four vertices — tell me a fair amount about the square — and you could tell me its area. Now, what happens as soon as I extend this square and make it into a cube? Well, now you can immediately tell me a lot more information: you can tell me the width, height, and depth — whatever you want to call them — how many faces it has, what color each of the faces is, how many vertices it has, and whether this cube is uniform or not. You can pick up on a lot more information.
So — I mean, this is a big oversimplification of what this actually does — but this is kind of the concept: if we are in two dimensions and we can somehow move
our data points into a higher dimension by applying some function to them, then what
we can do is get more information and extract more information about the data points, which
will lead to better predictions. Okay. So now that we've talked about all this, let's talk about how neural networks train — I think you're ready for this; it's a little bit more complicated, but again, it's not that crazy. Alright, so we talked about these weights and biases, and these weights and biases are what our network will come up with and adjust to make the network better. So essentially, what we're going to do now is talk about something called a loss function. As our network starts out, the way we train it — just like we've trained other networks and other machine learning models — is we give it some information along with what the expected output is, and then we compare the output from the network against the expected output and modify the network based on that. So essentially, we start by saying, okay, (2, 2, 2) — this point's class is red, which, I forget what I labeled that as, but let's just say that was a zero. Okay, so this class is zero, and I want the network to give me a zero for the point (2, 2, 2). Now, this network starts
with completely random weights and completely random biases. So chances are when we get to this output here, we're not
going to get zero, maybe we get some value after applying the sigmoid function that's
like 0.7. Well, this is pretty far away from red. But how far away is it? Well, this is
where we use something called a loss function. What a loss function does is calculate how far away our output was from our expected output. So if our expected output is zero and our output was 0.7, the loss function is going to give us some value that represents how bad or how good this network was. If it tells us the network was really bad — it gives us a really high loss — then that tells us we need to tweak the weights and biases more and move the network in a different direction. We're starting to get into gradient descent here, but let's understand the loss function first. It's going to say: if it was really bad, let's move more, let's change the weights and biases more drastically; whereas if it was really good, it'll say, okay, that one was actually decent, you only need to tweak a little bit, only need to move this, this, and this. So that's good, and that's the point of the loss function: it just calculates some value, and the higher the value, the worse our network was. A few examples of loss functions — let's go down here, because I think I had a few optimizers and losses noted back here:
mean squared error, mean absolute error, and hinge loss. Now, mean absolute error — you know what, let's actually just look one up. So: mean absolute error, and have a look at what this is. Okay, so this is the equation for mean absolute error: the sum of the absolute values of y_i minus the prediction f(x_i), divided by n. This is kind of complicated, and I'm not going to go into it too much — I was hoping to find a better example, like mean squared error. Okay, so these are the three loss functions here: mean squared error, mean absolute error, and hinge loss. Obviously there are a ton more we could use; I'm not going to talk about how each of these works specifically — you can look them up pretty easily.
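As a minimal sketch of what the first two actually compute (plain NumPy here, not the Keras loss objects):

```python
import numpy as np

y_true = np.array([0., 1., 0., 1.])      # expected outputs (labels)
y_pred = np.array([0.7, 0.9, 0.2, 0.4])  # what the network actually produced

mae = np.mean(np.abs(y_true - y_pred))   # mean absolute error
mse = np.mean((y_true - y_pred) ** 2)    # mean squared error
print(mae, mse)
```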
Also, just so you know, these are also referred to as cost functions — you might hear the terms cost and loss used interchangeably; they essentially mean the same thing: you want your network to cost the least, to have the least amount
of loss. Okay, so now that we have talked about the loss function, we need to talk about
how we actually update these weights and biases. Now, let's go back to here, because I think I had some notes on it. This is what we call gradient descent. Essentially, the parameters for our network are the weights and biases, and by changing these weights and biases we will either make the network better or make it worse; the loss function determines whether the network is getting better or worse, and then we can determine how to move the network to change that. This is where gradient descent comes in and the math gets a little more complicated. So this is an example of what your neural network's loss function might look like. As you work in higher dimensions you have a lot more space to explore when it comes to the different parameters, biases, activation functions, and all of that — and as we apply our activation functions we're kind of spreading our network into higher dimensions, which just makes things more complicated. Essentially, what we're trying to do with the neural network is optimize this loss function. The loss function tells us how good or how bad the network is, so if we can get this loss function as low as possible, that means we should technically have the best neural network. So on this kind of map or surface of the loss function, what we're looking for
is something called a global minimum, we're looking for the minimum point where we get
the least possible loss from our neural network. So if we start where these red circles are,
(I've just stolen this image off Google Images), what we're trying to do is move downwards into this global minimum, and that is the process called gradient descent. We calculate the loss and use an algorithm called gradient descent, which tells us what direction we need to move in to get to this global minimum. It essentially looks at where we are, says this was the loss, and then calculates what's called a gradient — which is literally just a steepness, or a direction — and we move in that direction. Then an algorithm called backpropagation goes backwards through the network and updates the weights and biases so that we move in that direction. I think this is as far as I really want to go, because I know this is already getting more complicated than some of you can probably handle and than I can properly explain, but that's the basic principle.
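If you want one concrete picture of that update step, it's essentially this loop — a toy sketch with a single weight, a made-up loss, and a hand-coded derivative, not what TensorFlow actually does internally:

```python
# Minimize the toy loss L(w) = (w - 3)^2, whose gradient is dL/dw = 2 * (w - 3)
w = 10.0              # start at a random weight
learning_rate = 0.1

for step in range(50):
    gradient = 2 * (w - 3)             # direction and steepness of the loss at w
    w = w - learning_rate * gradient   # step downhill, against the gradient

print(w)  # approaches 3, the global minimum of this toy loss
```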
Now let's go back to the drawing board and do a very quick recap before we get into some of the other stuff. Neural networks: input, output, and hidden layers connected with weights, plus biases that connect to each layer. These biases can be thought of as y-intercepts; they simply
shift that entire activation function up or down, or left or right, which allows us to get a better
prediction and have another parameter that we can train and add a little bit of complexity
to our neural network model. Now, the way that information is passed through these layers
is that, for each neuron, we take the weighted sum of all of the neurons connected to it, we then add the bias, and we apply some activation function that's going to put these values between two set values. So for example, sigmoid squishes our values between zero and one, hyperbolic tangent squishes them between negative one and one, and rectified linear unit keeps them between zero and positive infinity. We apply those activation functions and continue the process: n1 gets its value, n2 gets its value, and finally we make our way to the output layer (we might have passed through some other hidden layers before that) and do the same thing — take the weighted sum, add the bias, apply an activation function — then we look at the output and determine whether we're class y or class x, or whether this is the value we're looking for. And that's how it works. Now we're
at the training process — that's how this works when we're making a prediction. When we're training, essentially what happens is we make predictions, we compare those predictions to whatever the expected values should be using the loss function, and then we calculate what's called a gradient: the direction we need to move to minimize this loss function. This is where the advanced math happens, and why I'm skimming over this aspect. Then we use an algorithm called backpropagation, where we step backwards through the network and update the weights and biases according to the gradient we calculated. That is pretty much how this works. So the more information we feed it, the better it will likely get — unless
we're overfitting, but, you know, if we have a lot of data, if we can keep feeding the
network, it starts off being really horrible, having no idea what's going on. And then as
more and more information comes in, it updates these weights and biases gets better and better
sees more examples. And after, you know, certain amount of epochs or certain amount of pieces
of information, our network is making better and better predictions and having a lower
and lower loss. And the way we will calculate how well our network is doing is by passing
it, you know, our validation data set where it can say, okay, so we got an 85% accuracy
on this data set, we're doing okay, you know, let's tweak this, let's tweak that, let's
do this. So the loss function — also known as the cost function — the lower this is, the better.
And that is neural networks in a nutshell. Now, I know this wasn't really in a nutshell,
because it was 30 minutes long. But that is, you know, as much of an explanation
as I can really give you without going too far into the mathematics behind everything.
And again, remember: the activation function is there to move us up in dimensionality, and the bias
is another layer of complexity and a trainable parameter for our network that allows us to shift
this activation function left, right, up, or down. And yeah, that is how that works.
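To make that forward pass a bit more concrete, here's a minimal sketch (my own illustration, not code from the notebook) of one dense layer computing a weighted sum, adding a bias, and applying an activation function:

```python
import numpy as np

def relu(x):
    # rectified linear unit: squishes values into the range [0, +infinity)
    return np.maximum(0, x)

# hypothetical layer: 3 input neurons feeding 2 output neurons
inputs  = np.array([0.5, -1.2, 3.0])            # values coming from the previous layer
weights = np.array([[0.1, -0.4],                # one column of weights per output neuron
                    [0.8,  0.2],
                    [-0.5, 0.9]])
biases  = np.array([0.3, -0.1])                 # one trainable bias per output neuron

weighted_sum = inputs @ weights + biases        # weighted sum of connected neurons plus bias
outputs = relu(weighted_sum)                    # activation function applied to the result
print(outputs)
```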
Okay, so now we have an optimizer. This is kind of the last thing on how neural networks
work: the optimizer is literally just the algorithm that does the gradient descent and
backpropagation for us. So you guys can read through some of them here; we'll be using
the Adam optimizer for most of our examples, although there are lots of different ones
we can pick from. Now, each optimization technique, again, is just a different algorithm —
some of them are faster, some of them are slower, some of them work a little bit
differently. And we're not really going to get into picking optimizers in this course,
because that's more of an advanced machine learning technique. Alright, so enough explaining,
enough math, enough drawings, enough talking — now it is time to create our first official
neural network. Now, these are the imports we're going to need: import tensorflow as tf,
and from tensorflow import keras — again, this does actually come with TensorFlow;
I forget if I said you need to install that before, my apologies — and then import numpy
as np and import matplotlib.pyplot as plt.
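Written out, those imports are:

```python
import tensorflow as tf
from tensorflow import keras   # Keras ships with TensorFlow

import numpy as np
import matplotlib.pyplot as plt
```

Alright, so I'm going to do a similar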
thing to what I did before, I'm kind of just gonna copy some of this code into another
notebook, just to make sure that we can look at everything
at the end, and then kind of step through the code step by step, rather than having all
the explanatory text in the way here. Alright, so the dataset and the problem we are going to
consider for our first neural network is the Fashion MNIST dataset. Now, the Fashion
MNIST dataset contains 60,000 images for training and 10,000 images for validating
and testing — 70,000 images total — and it is essentially pixel data of clothing articles.
So what we're going to do to load in this dataset from Keras — it's one of the datasets
built right into Keras, already split into testing and training sets — is say
fashion_mnist = keras.datasets.fashion_mnist. Now this will get the dataset object,
and then we can load that object by doing fashion_mnist.load_data(). Now, by doing
this, by setting the tuples (train_images, train_labels), (test_images, test_labels)
equal to this, it will automatically split our data into the sets that we need. So we
need the training, and we need the testing — and again, we've talked about all that,
so I'm going to kind of skim through it. And now we have it in all of these tuples here.
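As code, that loading step looks like this:

```python
fashion_mnist = keras.datasets.fashion_mnist  # get the dataset object built into Keras

# load_data() returns two tuples: one for training, one for testing
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
```

Alright,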
so let's have a look at this data set to see what we were working with. Okay, so let's
run some of this code. Let's get this import going. If it doesn't take forever, okay, let's
get the data set. Yeah, this will take a second to download for you guys, if you don't already
have it cached. And then we'll go train_images.shape and look at what one of the
images looks like — or sorry, what our dataset looks like. So we have 60,000 images that
are 28 by 28. Now, what that means is we have 28 rows of 28 pixels, right?
So that's kind of what our information is; we're going to have 784 pixels in total,
which I've denoted here. So let's have a look at one pixel. So to reference one pixel,
this is what I'm doing — this comes in as... actually, I'm not sure what type of data
structure this is, so let's have a look at it. So let's say type(train_images),
because I want to see that. So that's a NumPy array. So to reference the different indexes
in this, similar to pandas, we're just going to do [0, 23, 23], which stands
for, you know, image zero, row 23, column 23. And this gives us one pixel — row 23, column
23, which will be that. Okay, so let's run this. And let's see, this value is 194. Okay,
so that's kind of interesting. That's what one pixel looks like. So let's look at what
multiple pixels look like. So we'll print train_images — and okay, so we get
all these zeros. Let's print train_images[0]; that should work for us. And we're
getting all these zeros. Okay, so that's the border of the picture. That's okay, I can't
show you what I wanted to show you. Anyway, one pixel — and I wanted to have you guys guess —
is simply represented by a number between zero and 255. Now, what this stands
for is the grayscale value of this pixel. So we're dealing with grayscale images, although
we can deal with higher-dimensional images as well — images that have
RGB values. So, for example, we could have a number
between zero and 255, another number between zero and 255, and another number between zero
and 255 for every single pixel, right? Whereas this one is just one simple static value.
Okay, so to sum up, your pixel values are between zero and 255, zero being black and
255 being white. So essentially, you know, if it's 255, that means that this is white; if
it's zero, that means that it is black. Alright, so let's have a look at the first 10 training
labels. So that was our training images. Now, what are the training labels? Okay, so we
have an array and we get values from zero to nine. Now, this is because we have 10 different
classes that we could have for our data set. So there's 10 different articles of clothing
that are represented, I don't know what all of them are, although they are right here.
So: T-shirt/top, trouser, pullover, dress, coat, sandal, shirt, sneaker, bag, ankle boot. Okay,
so let's run this class_names list, just so that we have that saved. And now what I'm going
to do is just use matplotlib to show you what one of the images looks like. So in this case,
this is a shirt. I know this is printing out kind of weird, but I'm just showing the image.
I know it's like different colors. But that's because if we don't define that we're drawing
it grayscale, it's going to do this. But anyways, that is what we get for the shirt. So let's
go to another image. And let's have a look at what this one is. I actually don't know
what that is. So we'll skip that. Maybe that's a — what is it, T-shirt or top? This, I guess,
is going to be like a dress. Yeah, so we do have a dress there. Let's have a look at this one. Again, some of these are hard to even
make out when I'm looking at them myself. And then I guess this will be like a hoodie or
something. I'm trying to get a sandal to show you guys a few different ones. There
we go. So that is a sandal or a sneaker. Okay, so that is kind of how we do that and how
we look at the different images. So if you wanted to draw one out, all you do is
make a figure, show the image, add the color bar — which is just giving you this —
then say you don't want a grid, and then you can just show the image, right?
Because if you don't have this line here and you show it with the grid... oh, it's actually
not showing the grid. That's interesting; I thought it was going to show me
the pixel grid, so I guess you don't need that line.
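The class names and the little display snippet I'm using look roughly like this (the exact figure calls in the notebook may differ slightly):

```python
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# draw one training image with a colorbar and no grid lines
plt.figure()
plt.imshow(train_images[1])
plt.colorbar()
plt.grid(False)
plt.show()
```

Alright, so: data preprocessing.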
Alright, so this is an important step in neural networks. And a lot of times when we have
our data, we have it in these like random forms or we're missing data. There's information
we don't know or that we haven't seen and typically what we need to do is pre processing.
Now what I'm going to do here is squish all my values between zero and one. Typically, it's a good idea to get all of your input values
in a neural network in between, like that range in between, I would say negative one,
and one is what you're trying to do, you're trying to make your numbers as small as possible
to feed to the neural network. The reason for this is your neural network starts out
with random weights and biases that are in between the range zero and one, unless you
change that value. So if you have massive input information and tiny weights, then you're
kind of having a bit of a mismatch. And you're going to make it much more difficult for your
network to actually classify your information. Because it's going to have to work harder
to update those weights and biases to reduce how large those values are going to be, if
that makes any sense. So it usually is a good idea to pre process these and make them in
between the value of zero and one. Now, since we know that we're just going to have pixel
values that are in the range of 255, we can just divide by 255. And that will automatically
scale it down for us. Although it is extremely important that we do this to not only the
training images, but the testing images as well. If you just preprocess your training
images, and then you pass in, you know, new data that's not preprocessed, that's going
to be a huge issue; you need to make sure that your data comes in the same form. And that
means when we're using the model to make predictions, whatever pixel data we have, we need
to preprocess in the same way that we preprocessed our other data. Okay, so let's preprocess
that — so train_images and test_images.
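That preprocessing step is just a division applied to both sets:

```python
# scale pixel values from [0, 255] down to [0, 1]
# (the same preprocessing must be applied to training and testing data)
train_images = train_images / 255.0
test_images = test_images / 255.0
```

And I'm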
just going to actually steal some of the stuff here and throw it in my other one before we
get too far. So let's get this data set. And let's throw it in here, just so we can come
back and reference all of this together. Let's grab class_names. We don't actually need the
figures — a few things I can skip — but we do need this preprocessing step, like that. If I
go over here, then what else do we need? We're going to need this model. Okay, so let's
actually just copy the model into this and just make it a little bit cleaner, and I'll
have a look at it. So, new code block: model. Okay, so creating our model. Now, creating
our model is actually really easy. I'm hoping what you guys have realized so far is that
data is usually the hardest part of machine learning and neural networks, getting your
data in the right form the right shape, and you know, pre processed correctly. Building
the model is usually pretty easy, because we have tools like TensorFlow and Keras that
can do it for us. So we're going to say model = keras.Sequential(...). Now, sequential
simply stands for the most basic form of neural network, which we've talked about so far,
which is just information going from the left side to the right side, passing through the
layers sequentially — right, hence "sequential"; we have not talked about recurrent or convolutional
neural networks yet. Now what we're going to do here is go keras.layers.Flatten.
So sorry, inside here, we're going to define the layers that we want in our neural network.
This first layer is our input layer. And what flatten does is allows us to take in a shape
of 28 by 28, which we've defined here, and flatten all of the pixels into 784 pixels.
So we take this 28 by 28, kind of matrix-like structure, and just flatten it out. Keras
will do that for us; we don't actually need to take our matrix data and transform it ourselves
before passing it in. So we've done that. Next we have keras.layers.Dense(128, activation='relu').
So this is our first hidden layer — layer two, right? That's
what I've denoted here. And this is a dense layer. Now, dense again means that all of the
neurons in the previous layer are connected to every neuron in this layer.
So we have 128 neurons here; how do we pick that number? We don't know, we kind of just
came up with it. Usually, it's a good idea that you're going to do this as like a little
bit smaller than what your input layer is, although sometimes it's going to be bigger,
you know, sometimes it's going to be half the size really depends on the problem, I
can't really give you a straight answer for that. And then our activation function we'll
define as rectified linear unit. Now, we could pick a bunch of different activation functions —
we could pick sigmoid, we could pick tanh, which is hyperbolic tangent —
it doesn't really matter. And then we're going to define our last layer, which is our output
layer, which is a dense layer of 10 output neurons with the activation of softmax. Okay,
so can we think of why we would have picked 10 here? Right, I'll give you guys a second
to think about it. It's based on the fact that our output layer, you know, is supposed to
have as many neurons as classes we're going to predict for. So that is exactly why we
have 10: if we look, we have 10 classes here. So we're going to have 10 output neurons in
our output layer. And again, we're going to have this probability distribution, and the
way we do that is using the activation function softmax. Softmax will make sure that
all of the values of our output neurons add up to one and that they are between zero and one.
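Putting those three layers together, the model definition is roughly:

```python
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),    # input layer: 28x28 image flattened to 784 values
    keras.layers.Dense(128, activation='relu'),    # hidden layer: fully connected, ReLU activation
    keras.layers.Dense(10, activation='softmax')   # output layer: one neuron per class, probabilities
])
```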
So that is our model. We've created the model now. So let's actually run this and see if
we get any errors here — it's just going to run. And then we'll run the model, and then we'll
go on to the next step, which is actually going to be training and testing the model.
Okay, so let's create the model now — we shouldn't get any issues, and we're good. And now let's
move on to the next step. I'm forgetting what it is, though — which is train the model. Oh,
sorry, compiling the model. Okay, so on to compiling the model. So we've now built what we call
the architecture of our neural network, right, we've defined the amount of neurons in each
layer, we've defined the activation function, and we define the type of layer and the type
of connections. The next thing we need to pick is the optimizer, the loss, and the metrics
we're going to be looking at. So the optimizer we're going to use is Adam. This is, again,
just the algorithm that performs the gradient descent; you don't really need to look at
these too much. You can read up on some different optimizers,
if you want to kind of see the difference between them, but it's not crazy, we're going
to pick a loss. So in this case, sparse categorical cross entropy, again, not going to go into
depth about that you guys can look that up if you want to see how it works, and then
metrics — so, what we're looking for is the output that we want to see from the network, which
is accuracy. Now from, you know, kind of right now, with our current knowledge, we're just
going to stick with this as what we're going to compile our neural networks with, we can
pick different values if we want. And these are what we call, what is it hyper parameter
tuning. So the parameters that are inside here, so like the weights, and the biases
are things that we can't manually change. But these are things that we can change, right,
the optimizer, the loss, the metrics, the activation function, we can change that. So
these are called hyper parameters. Same thing with the number of neurons in each layer.
So hyper parameter tuning is a process of changing all of these values, and looking
at how models perform with different hyper parameters change. So I'm not really going
to talk about that too much. But that is something to note, because you'll probably hear that,
you know, this hyperparameter kind of idea. Okay, so we've compiled the model now using
this, which just means we picked all the different things that we need to use for it.
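Concretely, the compile call is something like this:

```python
model.compile(optimizer='adam',                         # the algorithm that does gradient descent
              loss='sparse_categorical_crossentropy',   # the loss function we're minimizing
              metrics=['accuracy'])                      # the metric we want reported
```

And now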
on to training the model. So I'm just gonna copy this in. Again, remember, this, these
parts are pretty syntactically heavy, but fairly easy to actually do. So we're going
to fit the model. So fit just means we're fitting it to the training data, it's another
word for training, essentially. So we're going to pass it the training images, the training
labels, and notice how much easier it is to pass this in now — we don't need to do an
input function, we don't need to do all of that, because Keras can handle it for us.
And we define our epochs as 10; epochs is another hyperparameter that you could tune and change
if you wanted to. Alright, so that will actually fit our model.
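As a sketch, the training call is just:

```python
# fit (train) the model on the training data for 10 epochs
model.fit(train_images, train_labels, epochs=10)
```

So what I'm going to do is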
put this in another code block, so I don't need to keep retraining this. So we'll go
like that. And let's actually look at this training process. So we run the model, this
should compile and now let's fit it and let's see what we actually end up getting. Alright,
so epoch one, and we can see that we're getting a loss, and we're getting accuracy printing
out on the side here. Now, this is going to take a second — not a few minutes like some
other models we've made, but, you know, a few seconds — because when
you have 60,000 images, and you have a network that's comprised
of 784 neurons, 128 neurons, and then 10 neurons, you have a lot of weights and biases and a
lot of math that needs to go on. So this will take a few seconds to run. Now, if you're
on a much faster computer, you'll probably be faster than this. But this is why I like
Google Collaboratory. Because you know, this isn't using any of my computer's resources
to train, it's using Google's. And we can see, like, the RAM and the disk here. How do I look at
this for this network? Oh, is it in here? Let me look at this now. Okay, I don't know why it's
not letting me click this, but usually you can have a look at it. And now we've trained
and we've fit the model. So we can see that we had an accuracy of 91%. But the thing is,
this is the accuracy on our training data. So now, if we want to find what the true
accuracy is, what we need to do is actually test it on our testing data. So I'm going
to steal this line of code here. This is how we test our model. Pretty straightforward.
I'll just close this; let's go to a new code block. So we have test_loss, test_acc =
model.evaluate(test_images, test_labels, verbose=1). Now, what is verbose? I
was hoping it was going to give me the documentation so I could just read it to you guys, but verbose,
essentially, is just: are we looking at output or not? So, like, how much information are
we seeing as this model evaluates — how much is printing out to the console? That's
what that means. And yes, this will just split up the metrics that are returned
into test loss and test accuracy, so we can have a look at them.
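That evaluation line, written out, is roughly:

```python
# evaluate the trained model on data it has never seen before
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=1)
print('Test accuracy:', test_acc)
```

Now, you will notice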
when I run this, that the accuracy will likely be lower here than it was during training.
So, actually, the accuracy we had from training was about 91%, and now we're only getting about
88.5%. So this is an example of something we call overfitting. Our model seemed like it
was doing really well on the testing data — or sorry, the training data. But that's because
it was seeing that data so often, right? With 10 epochs, it started to just kind of memorize
that data and get good at seeing that data. Whereas now, when we pass it new data that
it's never seen before, it's only 88.5% accurate, which means we overfit our model, and it's
not as good at generalizing for other data sets, which is usually the goal, right? When
we create a model, we want the highest accuracy possible, but we want the highest accuracy
possible on new data. So we need to make sure our model generalizes properly. Now in this
instance, you know, like, it's, it's hard to figure out how do we do that, because we
don't know that much about neural networks. But this is the idea of overfitting and of
hyper parameter tuning, right? So if we can start changing some of this architecture,
and we can change, maybe the optimizer the loss function, maybe we go epochs eight, let's
see if this does any better, right, so let's now fit the model with eight epochs. We'll
have a look at what this accuracy is. And then we'll test it and see if we get a higher
accuracy on the testing data set. And this is kind of the idea of that hyper parameter
tuning, right? We just look at each epoch or not each epoch, we look at each parameter,
we tweak them a little bit, and usually will, like write some code that automates this for
us. But that's the idea is we want to get the most generalize accuracy that we can.
So I'll wait for this to train, we're actually almost done. So I won't even bother cutting
the video. And then we'll run this evaluation, and we'll see now if we got a better accuracy.
Now I'm getting a little bit scared because the accuracy is getting very high here. And
sometimes you know that you want the accuracy to be high on your training data. But when
it gets to a point where it's very high, you're in a situation where it's likely that you've
overfit. So let's look at this now. And let's see what we get. So 88.4. So we actually dropped
down a little bit. And it seemed like those epochs didn't make a big difference. So maybe
if I train it on one epoch, let's see what this does. You know, make
your prediction: do you think we're going to be better, do you think we're going to be worse?
It's only seen the training data one time. Let's run this. And let's see: 89.34%. So in
this situation, fewer epochs was actually better. So that's something to consider. You know,
a lot of people I see just go, like, 100 epochs and just think their model is going to be
great — that's actually not good to do. A lot of the time you're going to have a worse
model, because what's going to end up happening is it's going to be seeing the same information
so much, tweaking so specifically to that information that it's seen, that when you show
it new data, it can't actually, you know, classify and generalize on that. Alright, so let's go back and let's see what else we're doing
now, with this. Okay, so now that we've done that, we need to make predictions. So to make
predictions is actually pretty easy. So I'm actually just going to copy this line in,
we'll go into a new code block down here. So all you have to do is say model.predict,
and then you're going to give it an array of images that you want to predict on. So
in this case, if we look at the test images' shape — actually, let's make a new code block
and go here. So let's say test_images.shape. All right, give me a second.
So we have 10,000 by 28 by 28. So this is an array of 10,000 images. Now,
if I just wanted to predict on one image, what I could do is take test_images[0] and
then put that inside of an array. The reason I need to do that is because the data that
this model is used to seeing is an array of images to make predictions on — that's what
this predict method expects. And it's much better at making predictions on many things at once
than just one specific item. So if you are predicting one item only, you do need to put
it in an array, because it's used to seeing that form. So we could do this. Um, I mean,
I'm just going to leave it so we're just going to predict on every single one of the test
images, because then we can have a look at a cool function I've kind of made. So let's
actually do this: predictions = model.predict(test_images). I mean, let's print
predictions and look at what it actually is. Where is my autocomplete? There it is.
Okay, so let's have a look. Is this some object? Whoa. Okay, so this is arrays of arrays, and it
looks like we have some really tiny numbers in them. So what this is, essentially, is that
every single image has a list that represents the prediction for it, just like we've seen
with the linear models and stuff like that. So if I want to see the prediction for test
image zero, I would say predictions[0], right? Let's print this out. And this is the
array that we're getting — this is the probability distribution that was calculated on
our output layer for that image. So if we want to figure out what class we actually think
this is predicting, we can use a cool function from NumPy called argmax, which essentially
is just going to return to us the index of the maximum value
in this list. So let's see — I'm looking for the one with the least negative exponent, which I believe
is this, so this should be nine. This should return to us nine, because this is the index
of the highest value in this list, unless I'm just wrong when I'm reading the exponents
here. So nine, that's what we got. Okay, so now, if we want to see what the actual class is,
well, we have our class names up here, so we know class nine is actually ankle boot.
So let's see if this is actually an ankle boot. So I'm just going to do class_names —
I think that's what I called it — like this. So that should print out what it thinks
it is. Yeah, class_names. But now let's actually show the image of this prediction.
So to do that, I'm just going to steal some code from here, because I don't remember all
the syntax off the top of my head. So let's steal this figure, let's show this,
and let's see if it actually looks like an ankle boot. So to do that, we're going to say
test_images[0], because obviously image zero corresponds to prediction zero,
and that will show this and we'll see what we get. Okay, so ankle boot — and when we look
at the image, it is actually an ankle boot. And we can do this for any of the images that
we want to check. So if I do prediction one and image one, let's have a look: pullover.
That kind of looks like a pullover to me — I mean, I don't know if it actually is, but
that's what it looks like. Let's take a look at another one here. Okay, trouser — yep, looks
like trousers to me. And we can see that that is how we get predictions for our model:
we use model.predict.
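Pulled together, that prediction snippet looks roughly like this:

```python
predictions = model.predict(test_images)      # one probability distribution per test image

predicted_class = np.argmax(predictions[0])   # index of the highest probability for image 0
print(class_names[predicted_class])           # e.g. 'Ankle boot'

# show the image we just predicted on
plt.figure()
plt.imshow(test_images[0], cmap=plt.cm.binary)
plt.show()
```

Alright, so let's move down here now to the next thing that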
we did. Alright, so we've already done that. So verifying predictions, okay. So this is
actually a cool kind of script that I wrote, I'll zoom out a little bit, so we can read
it. What this does — and I'm stealing some of this from TensorFlow — is let us use our
model to make predictions on any entry that we want. So what it's going
to do is ask us to type in some number, we're going to type in that number, it's going to
find that image in the test dataset, it's going to make a prediction on it from the
model, and then show us what it actually is versus what it was predicted to be. Now, I
just need to actually run it. Actually, let's just steal this code and bring it into the other
notebook, because I've already trained the model there, so we don't have to wait again. Okay,
let's go to a new code block and run that. So let's run this script and have
a look down here. So: pick a number. We'll pick some number — let's go 45. And then what
it's going to do is say expected: sneaker, guess: sneaker, and actually show us the image there. So we can see this
is what our pixel data looks like, and this is what the expected class was, and
this is what the guess was from the neural network. Now we can do the same thing: if
we run it again, pick a number, 34. Let's see here — expected: bag, guess: bag. So that's
kind of showing you how we can actually use this model.
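I won't reproduce the exact script, but a minimal sketch of that verify-a-prediction idea (the function and variable names here are just illustrative, not his exact code) could look like:

```python
def verify_prediction(index):
    # predict on one test image and compare against its true label
    image = test_images[index]
    true_label = class_names[test_labels[index]]

    prediction = model.predict(np.array([image]))   # predict expects an array of images
    guess = class_names[np.argmax(prediction)]

    plt.figure()
    plt.imshow(image, cmap=plt.cm.binary)
    plt.title('Expected: ' + true_label + '   Guess: ' + guess)
    plt.show()

num = int(input('Pick a number: '))
verify_prediction(num)
```

So anyway, that has been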
it for this kind of module on neural networks. Now I did this in about an hour, I'm hoping
I explained a good amount so that you guys now understand how neural networks work. In the next
module, we're going to move on to convolutional neural networks, which, again, should help,
you know, build up your understanding of neural networks, as well as teach you how we
can do deep computer vision, object recognition and detection using convolutional neural networks.
So with that being said, let's get into the next module. Hello, everyone, and welcome
to the next module in this TensorFlow course. So what we're gonna be doing here is talking
about deep computer vision, which is very exciting, very cool. This has been used for
all kinds of things. Have you ever seen the self-driving cars? For example, Tesla — they
actually use a TensorFlow deep learning model (obviously far more complicated than I
can really explain here) to do a lot of their computer vision for self-driving. We've used
computer vision in the medical field, and computer vision is actually used in sports a lot, for
things like goal-line technology, and even detecting players on the field
to do analysis. There's lots of cool things we're doing with it nowadays. And for our
purposes, what we're going to be doing is using this to perform classification, although
it can be used for object detection and recognition, as well as facial detection and
recognition. So, all kinds of applications — in my opinion, one of the cooler areas of deep
learning right now. Now let's go ahead and talk about what we're actually gonna
be focusing on here. So we're gonna start by discussing what a convolutional neural
network is, which is essentially the way that we do deep learning, we're going to learn
about image data. So what's the difference between image data and other regular data,
we're gonna talk about convolutional layers and pooling layers and how stacks of those
work together as what we call a convolutional base for our convolutional neural network.
We're going to talk about CNN architectures and get into actually using pre-trained models
that have been developed by companies such as Google and TensorFlow themselves to perform
classification tasks for us. So that is pretty much the breakdown of what we're about to
learn there's quite a bit in this module is probably the more difficult one or the most
difficult one we've been doing so far. So if you do get lost at any point, and you don't
understand some of it, don't feel bad. This stuff is very difficult. And I would obviously
recommend reading through some of the descriptions I have here in this notebook, which again,
you can find from the link in the description, or looking up, in your own time, some things
that maybe I don't go into enough depth about, as I can't really spend, you know,
10 or 11 hours explaining a convolutional neural network. So let's now talk about image data,
which is the first thing we need to understand. So in our previous examples, when we had
a neural network, we had two-dimensional data, right — we had a width and
a height when we were trying to classify images using a dense neural network.
So there we used two dimensions; well, with an image, we actually have three
dimensions. And what makes up those dimensions? Well, we have a height and we have a width,
and then we have something called color channels. Now, it's very important to understand this,
because we're going to see this a lot as we get into convolution networks that the same
image is really represented by three specific layers, right, we have the first layer, which
tells us all of the red values of the pixels, the second layer, which tells us all the green
values, and the third layer, which tells us all the blue values. So in this case, those
are the color channels. And we're going to be talking about channels in depth quite a
bit in this series. So just understand that although you think of an image as a two dimensional
kind of thing, in our computer, it's really represented by three dimensions, where these
channels are telling us the color of each pixel. Because remember, in red, green, blue,
you have three values for each pixel, which means that you're going to need three layers
to represent that pixel. Right. So this is what we can kind of think of it as a stack
of layers. And in this case, a stack of pixels, right, or stack of colors really telling us
the value for each pixel. So if we were to draw this to the screen, we would get the
blue, green and red values of each pixel, determine the color of it, and then draw the
two-dimensional image based on the width and the height.
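As a quick illustration of that extra dimension (these are just made-up arrays, not the dataset):

```python
import numpy as np

# a grayscale image: height x width, one value per pixel
gray_image = np.zeros((28, 28))       # shape (28, 28)

# a colour image: height x width x 3 colour channels (a red, green and blue value per pixel)
color_image = np.zeros((32, 32, 3))   # shape (32, 32, 3)

print(gray_image.shape, color_image.shape)
```

Okay, so now we're gonna talk about a convolutional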
neural network and the difference between that and a dense neural network. So in our
previous examples, when we used the dense neural network to do some kind of image classification,
like that Fashion MNIST dataset, what it essentially did was look at the entire
image at once and determine, based on finding features in specific areas of the image, what
that image was, right? Maybe it found an edge here, a line here; maybe it found a shape,
maybe it found a horizontal or diagonal line. The important thing to understand, though,
is that when it found these patterns and learned the patterns that made up specific shapes,
it learned them in specific areas. It knew that, for example, looking
at this cat image, we're going to classify this as a cat if an eye exists on, you know, the
left side of the screen where the eyes are here — then that's a cat. It doesn't necessarily
know that if we flipped this cat, we did a horizontal flip of this cat. And the eyes
were over here that that is a pattern that makes up a cat. So the idea is that the dense
network looks at things globally, it looks at the entire image and learns patterns in
specific areas. That's why we need things to be centered, we need things to be very
similar when we use a dense neural network to actually perform image classification.
Because it cannot learn local patterns, and apply those to different areas of the image.
So for example, some patterns we might look for, when we're looking at an image like a
cat here would be something like this, right, we would hope that maybe we could find a few
ears, we could find the eyes, the nose. And you know, the pause here, and those features
would tell us that this makes up a cat. Now with a dense neural network, it would find
these features, it would learn these patterns, but it would only learn them in this
specific area where they're boxed off, which means if I horizontally flipped this image,
right, and I go like that, then it's not going to know that that's a cat, because it learned
that pattern in a specific area; it'll need to relearn that pattern in the other area. Now, a
convolutional neural network, on the other hand, learns local patterns. So rather than
learning that the ear exists in, you know, this specific location, it just learns that
this is what an ear looks like. And it can find that anywhere in the image. And we'll
talk about how we do that as we get to the explanation. But the whole point is that our
convolutional neural network will scan through our entire image, it will pick up features
and find features in the image. And then based on the features that exist in that image will
pass that to a dense neural network, or dense classifier, which will look at the
presence of these features and determine, you know, which combinations of features
make up specific classes or specific objects. So that's kind
of the point. I hope that makes sense. The main thing to remember is that dense neural
networks work on a global scale, meaning they learned global patterns, which are specific
and are found in specific areas. Whereas convolutional neural networks or convolutional layers will
find patterns that exist anywhere in the image because they know what the pattern looks like.
Not that it just exists in a specific area. Alright, so how they work right? So let's
see, when a regular neural network looks at this dog image — this is
a good example, I should have been using this before — it will find that there are two eyes
that exist here, right? It will say, okay, I found that these eyes make up a dog.
This is its training image, for example, and it's like, okay, this pattern makes up the
dog; the eyes are in this location. Now, what happens when we flip the image
to the other side? Well, our neural network starts looking for these eyes on the
left side of the image, where it found them previously and where it was trained on them. It
obviously doesn't find them there, and so it says that our image isn't a dog, although
it clearly is a dog — it's just a dog that's oriented differently; in fact, it's just flipped
horizontally, or actually, I guess I would say, flipped vertically. So since it doesn't find the eyes in this
location, and it can only look for patterns that it learned in specific locations, it's
going to say this isn't a dog, even though it is. Whereas our convolutional
layer will find the eyes regardless of where they are in the image, and still tell us that
this is a dog, because even though the dogs moved over, it knows what an eye looks like.
So we can find the eye anywhere in the image. So that's kind of the point of the convolutional
neural network and the convolutional layer. And what the convolutional layer does is look
at our image and essentially feed back to us what we call an output feature map that
tells us about the presence of specific features, or what we're going to call filters in our
image. So that is kind of the way that works. Now, essentially, the thing we have to remember
is that our dense neural networks output just a bunch of numeric values, whereas what our
convolutional layers are actually going to be doing is outputting what we call a feature
map. I'm going to scroll down here to show you this example. What we're actually going to
do is run what we call a filter over our image; we're going to sample the image at all these
different areas. And then we're going to create what we call an output feature map that quantifies
the presence of the filters pattern at different locations. And we'll run many, many, many
different filters over our image at a time. So that we have all these different feature
maps telling us about the presence of all these different features. So one convolutional
layer will start by doing that with very small, simple filters, such as straight lines like
this. And then other convolution layers on top of that, right, because it's going to
return a map that looks something like this out of the layer. We'll take this map in now
the one that was created from the previous layer and say, Okay, what this map is representing
to me is, for example, the presence of these diagonal lines; let me try to look for curves,
right, or let me try to look for edges. So it will look at the presence of the features
from the previous convolutional layer, and then say, Okay, well, if I have all these
lines combined together, that makes up an edge, and it will look for that, right. And
that's kind of the way that a convolutional neural network works and why we stack these
different layers. Now, we also use something called pooling, and there's a few other things
that we're going to get into. But that is the basics. I'm going to go into a drawing
example and show you exactly how that works. But hopefully, this makes a little bit of
sense that the convolution layer returns a feature map that quantifies the presence of
a filter at a specific location. And this filter, the advantage of it is that we slide
it across the entire image. So if this filter or This feature is present anywhere in the
image, we will know about it rather than in our dense network where it had to learn that
pattern in a specific global location. Okay, so let's get on the drawing tablet and do
a few examples. Alright, so I'm here on my drawing tablet, and we're going to explain
exactly how a convolutional layer works and how the network kind of works together. So
this is an image I've drawn on the left side of our screen here. I know this is very basic,
you know, this is just an x, right? This is what our images, we're just going to assume
this is grayscale, we're going to avoid doing anything with color channels this second,
just because they're not that important. But just understand that what I'm about to show
you does apply to color channels as well, and to multiple kinds of layers and depths.
And then if we can understand it on a simple level, we should understand it more thoroughly.
So what we want essentially is our convolutional layer, to give us some output, it's meaningful
about this image. So we're gonna assume this is the first convolutional layer. And what
it needs to do essentially is returned to us some feature map that tells us about the
presence of specific what we call filters in this image. So each convolutional layer
has a few properties to it. The first one is going to be the input signs. So what can
we expect? Well, what is that that was as the as the input size? How many filters are
we going to have? So filters like this, and what's the sample size of our filters? That's
what we need to know, for each of our convolutional neural networks. So essentially, what is a
filter will filter is just some pattern of pixels. And we saw them before we'll do a
pretty basic one here, as the filter we're going to look for, which looks something like
this. This will be the first filter we're going to look for just to illustrate how this
works. But the idea is that at each convolutional layer, we look for many different filters.
And in fact, the number we're typically looking for, is actually about times 32 filters. Sometimes
we have 64 filters as well and sometimes even 128. So we can do as many filters so we want
as few filters as we want, but the filters are what is going to be trained. So this filter
is actually what is going to be found by the neural network. It's what's going to change
it You know, this is essentially what we're looking
for this is what's created in the program. And that's kind of like the trainable parameter
of a convolutional neural network is the filter. So the amount of filters and what they are
will change as the program goes on. As we're learning more and figuring out what features
that make up, you know, a specific image. So I'm going to get rid of this stuff right
now, just so we can draw and do a basic example. But I want to show you how we look for a filter
in the image. So we have filters, right — they'll start out looking completely
random, but they'll change as we go on. So let's say the filter we're looking for is
that one I drew before, I'm just gonna redraw it at the top here a little bit smaller. And
we'll just say it's a diagonal line, right. But another filter we could look for might
be something like, you know, a straight line, just like that all across, we could have a
horizontal line. And in fact, we'll have 32 of them. And when we're doing just, you know,
three by three grids of filters, well, there's not that many combinations, we're going to
do at least grayscale wise. So what we'll do is we'll define the sample size, which
is how big our filter is, is going to be three by three, which we know right now, which means
that what we're going to do is we're going to look at three by three spots in our image,
and look at the pixels. And try to find how closely these filters match with the pixels
we're looking at on each sample. So what this is going to do, this convolution layer is
going to output us what we call a feature map, which can be a little bit smaller than
the original image. And you'll see why in a second. But that tells us about the presence
of specific features in areas of image. So since we're looking for two filters, here,
actually, we'll do two filters, which means that we're actually going to have a depth
two feature map being returned to us right, because for two filters, that means we need
two maps, quantifying the presence of both of those filters. So for this green box that
we're looking at the left side here, we'll look for this first filter here. And what
do we get? Well, the way we actually do this, the way we look at this filter, is we take
the cross product — or actually, the dot product, sorry — between this little
green box and this filter, right? Because they're both pixels; they're both actually
numeric values down at the bottom. So what we do is we take that dot product, which essentially
means we're element-wise multiplying all of these
pixels by each other. So if this pixel value is zero, right, because it's white — or it
could be the other way around, we could say white is one and black is zero, it doesn't really
matter — if this is a zero and this is a one, these are obviously very different.
And when we do the dot product of those two, so we multiply them together, then in our
output feature we would have zero, right? That's kind of the way it works. So we do
this dot product of this entire thing — if you don't know what the dot product is, I'm
not really going to go into that — but we do the dot product, and that gives us some value
essentially telling us how similar these two blocks are: how similar this sample
that we're taking of the image is to the filter that we're looking for. If they're very similar,
we're likely going to put a one or something telling us, you know, they're very close together;
if they're not similar at all, we're going to put a zero. So in this case, for our first
filter, we're probably going to have a value — because only this middle pixel is the same — of something
like 0.12, right? But all the other values are different. So
it's not going to be very similar whatsoever. So then what we're going to do is look at
the second filter, which is this horizontal line. And in fact, we're going
to get a very similar output response here, probably something like, you know, 0.12, that's
going to go in the top left. And again, these are both maps representing each filter, right?
So now we'll move our green box over one like this, to just shift that over one. And now
we'll start looking at the next section. And in fact, I'm gonna see if I can erase this,
just to make it a little bit cleaner here. Get rid of the green, there we go. Okay, so
we'll move this box over like this. And now start looking at this one, it will do the
exact same thing we did again before. So we're gonna say, all right, how similar are these?
Well, they're not similar at all. So we're going to get zero for that first filter. How
similar are they for the other one? Oh, actually, they're like a little bit similar. There's a lot
of white that's kind of in the same space, like, you know, stuff like that. So we'll
say maybe this is like 0.7, right? I'm just randomly picking these numbers, they are going
to be much different than what I'm putting in here. But I'm just trying to get you to
understand what's kind of happening, right, and this is completely random, the way I'm
making the numbers, just make sure you understand that because this is not exactly what it would
look like. Okay, so then we're gonna move the box over one more time, let's just erase
this to keep this clean. This will be the last time we do this for the purpose of this
example. And now what we're going to have is — wow, we have a perfect match for the first
filter, so we put a one. The other one's like, eh, it's kind of similar, there are a few things
that are different, so maybe this gets like 0.4 or something, right? Whatever they are,
we end up getting some value. So we'll fill in all these values — let's just put some arbitrary
values here for now, just so we can do something with the example: 0.7, 0.12, 0.4, 0.3, 0.9,
0.1 — again, completely random — 0.4, 0.6. Alright, so this is now what we've gotten —
our response map from looking at two filters on our original image of five by five. Now
notice that the size of these is three by three. And obviously the reason for that is
because in a five by five image, when we're taking three by three samples, well, we can
only take nine, three by three samples, because when we go down a row, right, we're going
to move down one, and we're going to do the same thing we did before with these three-by-three
samples. And if we add up the number of times we can do that, well, we just get
three by three, which is nine. So this now is kind of telling us the presence of features
in this original image map. Now, the thing is, though, we're going to do this 64 times
right, for 64 filters, or 32 filters of the amount of filters that we have. So we're going
to have a lot of layers like a ton of different layers, which means that we're going to be
constantly expanding as we go through the convolutional layers, the depth of this, this
kind of output feature map. And that means that there's a lot of computations that need
to be done. And essentially, that means that this can be very slow. So now we need to talk
about an operation called pooling. So I'll backtrack a little bit, but we will talk about
pooling in a second, what's going to happen right is when we have all these layers that
are generated, so this is called the output feature map right? From this original image,
what we're gonna do is the next convolution layer, and the network is now going to do
the process we just talked about, except on this output feature map, which means that
since this one was picking up things like lines and edges, right, the next convolutional
layer, will pick up combinations of lines and edges and maybe find what a curve is,
right, we'll slowly work our way up from very, very small amount of pixels, to finding more
and more, almost, I want to say abstract, different features that exist in the image.
And this is what really allows us to do some amazing things with a convolutional neural
network. When we have a ton of different layers stacking up on each other, we can pick out
all the small little edges, which are pretty easy to find. And with all these combinations
of layers working together, we can even find things like say eyes, right, or feet, or heads
or face, right, we can find very complicated structures, because we slowly work our way
up, starting by solving very easy problems, like finding lines, and then finding
combinations of lines, combinations of edges, shapes, and very abstract things. That's how
this convolutional network works.
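If it helps to see that sliding-window idea as code, here's a tiny NumPy sketch of one filter producing a feature map (the image and filter values are made up for illustration):

```python
import numpy as np

# a tiny 5x5 grayscale "image" (1 = dark pixel, 0 = light pixel), like the X drawn above
image = np.array([[1, 0, 0, 0, 1],
                  [0, 1, 0, 1, 0],
                  [0, 0, 1, 0, 0],
                  [0, 1, 0, 1, 0],
                  [1, 0, 0, 0, 1]])

# a 3x3 filter looking for a diagonal line
diagonal_filter = np.array([[1, 0, 0],
                            [0, 1, 0],
                            [0, 0, 1]])

# slide the filter over every 3x3 sample of the image (stride 1, no padding)
feature_map = np.zeros((3, 3))
for row in range(3):
    for col in range(3):
        sample = image[row:row + 3, col:col + 3]
        # dot product of the sample and the filter: how strongly this pattern is present here
        feature_map[row, col] = np.sum(sample * diagonal_filter)

print(feature_map)   # a 3x3 map quantifying the presence of the diagonal-line filter
```

So we've done that; now it's time to talk about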
pooling. And we'll also talk about padding — actually, we'll go padding first, before we
do pooling; it doesn't really matter what order we talk about these in, but I just think padding
makes sense based on the way we're going right now. So sometimes, we want to make sure that
the output feature map from our original image here is the same dimensions, or same size,
as the original image. So this is five by five, obviously, and this is three by three. So if we want
this to be five by five as an output, what we need to do is add something called padding
to our original image. So padding is essentially just adding an extra row and column on each
side of our image here, and we just fill in all these padded pixels with, like,
blank pixels — they don't mean anything. Essentially,
why we do that is so that when we do our three by three sample size here like this, we can
take a three by three sample where every single pixel is in the center of that sample, because
right now, this pixel's not in the center, this pixel can never be in the center, this
pixel can never be in the center — only, you know, a few pixels get to be in the center.
And what this allows us to do is generate an output map that is the same size as our
original input, and allows us to look at features that are maybe right on the edges of
the image that we might not have been able to see before. Now, this isn't super important
when you go to very large images, but it's just something to consider — you can add padding,
and we may do this as we get through our examples. And there's also something called stride, which
I want to talk about as well. So what a stride is, essentially, is how much
we move the sample box every time we're about to move it, right. So let's say we're
doing an example with padding here, right — our first sample we would
take here, and again, these pixels are just added, we added them in to make this work better
for us — you would assume that the next time we move the box, you're going to move it one
pixel over. That's called a stride of one, and we can do that. But we can also employ a stride
of two, which means we'll move over by two. Obviously, the larger your stride, the smaller
your output feature map is going to be, so you might want to add more padding — well,
you don't want to add too much padding, but it's just something to consider, and we will
use different strides in different instances. Okay, so that's great. That hopefully makes sense.
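In Keras, padding and stride are just arguments on a convolutional layer; a quick sketch:

```python
# 'same' padding keeps the output feature map the same height and width as the input,
# 'valid' means no padding (the map shrinks); strides controls how far the sample box moves
conv = keras.layers.Conv2D(filters=32, kernel_size=(3, 3),
                           strides=(1, 1), padding='same',
                           activation='relu', input_shape=(28, 28, 1))
```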
Let's erase this. Now we don't need this anymore. We talked about padding, we talked about the
stride. Now we're gonna talk about a pooling operation, which is very important. So kind
of the idea is that we're gonna have a ton of layers, right for all these filters, and
we're just gonna have a lot of numbers, a lot of computations, and there must be some
way to make these a little bit simpler, a little bit easier to use. Well, yes, that's
true. And there is a way to do that. And that's called pooling. So there's three different
types of pooling. Well, there's more but the basic ones are min, max, and average. And
essentially, a pooling operation is just taking specific values from a sample of the output
feature map. So once we generate this output feature map, what we do to reduce its dimensionality
and just make it a little bit easier to work with, is we sample, typically, two-by-two
areas of this output feature map, and just take either the min, max, or average value
of all the values inside of there, and map these — we're going to go back this way — to a new feature
map that's essentially two times smaller
than this original map. And it's kind of hard with three by three to really
show you this. But essentially, what's gonna end up happening is we're gonna have something
like this. So we're going to take the sample here, and we're going to say, okay, what are we doing —
min, max, or average pooling? If we're doing min pooling, we're going to take the smallest value,
which means we'll take zero; if we're doing max pooling, we'll take the maximum value, which
means we'll take 0.3; if we're doing average, we're probably going to get an average value
of close to, what, 0.2, maybe. So let's say 0.2, we'll go there. That's how we do that
with pooling — again, just to make this feature map smaller. So we'll do that for both of
the filters. But let's just say this is 0.2; let's say this here that I'm blocking
off is, I don't know, what is this going to be, 0.6 — it's hard to do this averaging with
four numbers; let's say this one down here is going to be, I don't know, let's just do 0.21 or something.
And then this last one here — okay, we got some bigger values, maybe this will be
like 0.4. Okay, so that's one; the one down here will have some values of its own, we'll
just do squiggles to represent that it has something. We've effectively done a pooling
operation on this; we've reduced the size of it by about half. And that is kind of how
that works. Now, typically, what we do is we use a two-by-two pooling sample size
like that with a stride of two, which actually means that we would stride it like this. But
since we're not going to do padding on this layer right now, we'll just do a stride of
one. And this is how we pool it. Now, the different kinds of pooling are used for different
kinds of things. The reason we would use a max pooling operation is to pretty much tell
us about the maximum presence of a feature in that local area — we really only
care if the feature exists, or if it doesn't exist — and average pooling is not very often
used, although in this example we did use an average at one point. But, you know, they're just different
kinds of pooling: average tells you about the average presence of the feature in that area,
max tells you whether that feature is present in that area at all, and min tells you whether
it's absent — if it doesn't exist, right, we're just going to have a zero if there's even
one zero in that area. So that's the point of pooling in support of convolutional layers.
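As a sketch, a max pooling layer in Keras looks like this, stacked right after a convolutional layer:

```python
# 2x2 max pooling with a stride of 2: halves the width and height of each feature map
pool = keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2)
```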
I think I'm done with the whiteboarding for now; we're actually going to start getting
into a little bit of code and talking about creating our own convolutional networks, which,
hopefully, will make this a lot more clear. So let's go ahead and get into that. Alright,
so now it is time to create our first convolutional neural network. Now, we're going to be using Keras
to do this, and we're also going to be using the CIFAR-10 image dataset that contains 60,000
images of 10 different classes of everyday objects. Now, these images are 32 by 32, which
essentially means they are blurry, and they are in color. Now, I just want to emphasize,
as we get into this, that the reason I'm not typing all of these lines out, and I just
have them in here already, is because this is likely what you guys will be using or doing
when you actually make your own models. Chances are, unless you're a pro at TensorFlow, and I am not even there yet either with my knowledge of it, you are not going to have all of the lines memorized and never have to go reference the syntax. So the point is, so long as you can understand why this works and what these lines are doing, you're gonna be fine. You don't need to memorize them, and I have not memorized them; I look up the documentation, I copy and paste what I need, I alter them, I write a little
bit of my own code. But that's kind of what you're going to end up doing. So that's what
I'm doing here. So this is the image data set. We have truck, car, ship, airplane, you know, just everyday regular objects. There are 60,000 images, as we said, and 6,000 images
of each class. So we don't have too many images of just one specific class. So we'll start
by importing our modules. So we're gonna import TensorFlow and Keras. We're gonna use the data set built into Keras for this, so that's the CIFAR-10 image data set. If you actually click on this, it'll bring you to information about the data set, although we don't need that right now, because I already know the information about it. And now we're just going to load our images in. So the way this works is you're gonna say datasets.cifar10.load_data(). Now this load_data gives you a somewhat strange TensorFlow-style data set object, so this is different from some of what we've used before, where our objects have actually been NumPy arrays that we can look at more easily; just something to keep in mind here. We're going to normalize this data into train images and test images by dividing both of them by 255. Again, we're doing that because we want to make sure that our values are between zero and one, because that's a lot better to work with in our neural networks rather than large integer values, which just causes some things to mess up sometimes. For class names, we're just going to define a list here, so we have all the class names, so that zero represents airplane, one automobile, and so on until truck. We run that block of code here and it'll download the data set, although I don't think it takes that long to do that. Okay, so that's good. I think we're
okay there. And now, let's just have a look at actually some of the images here by running
this script. So we can see this is a truck; we can change the image index to be two, we can
see this is another truck, let's go to say six, we get a bird. And you can see these
are really blurry, but that's fine. For this example, we're just trying to get something
that works. Alright, guys, that's a horse, you know, you get the point. Alright, so now
cnn architecture. So essentially, we've already talked about how a convolution neural network
works. We haven't talked about the architecture and how we actually make one, essentially,
what we do is we stack a bunch of convolutional layers, and Max pooling, min pooling, or average
pooling layers together in something like this, right. So after each convolutional layer,
we have a max pooling layer, some kind of pooling layer, typically, to reduce the dimensionality,
although you don't need that, you could just go straight into three convolutional layers.
And on our first layer, what we do is we define the amount of filters, just like here, we define the sample size, so how big those filters are, and an activation function, which essentially means that after we apply that dot-product operation we talked about, we'll apply a rectified linear unit to the result and then put that in the output feature map. Again, we've talked about activation functions before, so I won't go
too far into depth with them. And then we define the input shape, which essentially
means what can we expect in this first layer? Well, 32 by 32, by three, these ones, we don't
need to do that, because they're going to figure out what that is based on the input
from the previous layer. Alright, so these are just a breakdown of the layers. The convolution
or the max pooling layers here: two by two essentially means that we're going to have a two by two sample size with a stride of two. Again, the whole point of this is to shrink each of these layers by a factor of two. Alright, so now let's have a summary. It's already
printed out here. We can see that we have... wait, is this correct? MobileNetV2? I don't think that's correct. That's because I haven't run this one. My apologies on that, guys, this is from something later in the tutorial. After running it, we can see that we have Conv2D as our first layer, and this is the output shape of that layer. Notice that it is not 32 by 32 by 32, it is 30 by 30 by 32, because when we do that sampling without padding, that's what we're gonna get: two pixels less, because of the amount of samples we can take. All right, next, we have the max pooling 2D layer. So now the output shape is 15 by 15 by 32, which means we've shrunk this shape by a factor of two. We do a convolution on this, which means that now we get 13 by 13, and we're doing 64 because we're going to take 64 filters this time. With the max pooling again, we go to 6 by 6 by 64, because we divide this again by a factor of two; notice that it just rounded down. And then the Conv2D layer after that gives us four by four by 64, again because of the way we take those values. So this is what we have defined so
far. But this is not the end of our convolutional neural network. In fact, this doesn't really
mean much to us, right? This just tells us about the presence of specific features as
we've gone through this convolution base, which is what this is called: the stack of
convolution and Max pooling layers. So what we actually need to do is now pass this information
into some kind of dense layer classifier, which is actually going to take this pixel
data that we've kind of calculated and found, so the almost extraction of features that
exist in the image, and tell us which combination of these features maps to, you know, one of these 10 classes. So that's kind of the point: you do this convolution
base, which extracts all of the features out of your image. And then you use the dense
network to say, Okay, well, if these combination of features exist, then that means this image
is this, otherwise, it's this and that, and so on. So that's what we're doing here. Alright,
so let's say adding the dense layer. So to add the dense layer is pretty easy model dot
add is just how we add them, right. So we're going to flatten all of those pixels, which essentially means take the four by four by 64 and just put those all into a straight line like we've done before, so just one dimensional. Then we're going to have a 64 neuron dense layer that connects all of those things to it with an activation function of rectified linear unit, then our output layer, a dense layer with 10 neurons, obviously 10 because that's the amount of classes we have for this problem. So let's run this here. We'll add those layers. Let's look at a summary and see. Oh, things have changed now. We go from four by four by 64 to 1024. Notice that that is precisely the calculation of four
times four times 64. That's how we get that number here, then we have a dense layer and
another dense layer. And this is our output layer. Finally, this is what we're getting
is we're going to get 10 neurons out. So essentially just a list of values. And that's how we can
determine which class is predicted. So this up to here is the convolutional base. This
is what we call the classifier, and they work together to essentially extract the features,
and then look at the features and predict the actual object or whatever it is the class.
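If you want to see the whole thing in one place, here's roughly what the code amounts to; treat it as a sketch, since the exact cells and variable names in the Colab notebook may differ slightly:

```python
import tensorflow as tf
from tensorflow.keras import datasets, layers, models

# Load CIFAR-10 and scale pixel values to the 0-1 range
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
train_images, test_images = train_images / 255.0, test_images / 255.0

class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

model = models.Sequential([
    # Convolutional base: stacks of Conv2D + MaxPooling2D layers
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    # Classifier: flatten the extracted features, then dense layers
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10),
])

model.summary()
```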
Alright, so that's how that works. Now it's time to train again, we'll go through this
quickly. Um, I believe I've already trained this, this takes a long time to train. So
I'm actually going to reduce the epochs here to just be four. I'd recommend you guys train this on something higher, like 10, if you're going to do it, but it does take a while. So for our purposes, and for my time, we'll leave it a little bit shorter right now. But you should be getting about a 70% accuracy if you train it on 10 epochs; you can see I've trained this previously. I'm just gonna train up to four, where we get 67 to 68%, and that should be
fine. So we'll be back once this is training, and we'll talk about how some of this works.
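The compile, fit, and evaluate step being run here looks roughly like this, continuing from the sketch above (four epochs to keep the run short, as in the video; 10 or more should get you closer to 70%):

```python
# Adam optimizer and sparse categorical cross entropy, with from_logits=True
# because the final Dense(10) layer has no softmax activation
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

# history lets us access the training statistics afterwards
history = model.fit(train_images, train_labels, epochs=4,
                    validation_data=(test_images, test_labels))

# Evaluate on the test set (the same data used as validation above)
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
print(test_acc)
```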
Okay, so the model is finally finished training, we did about four epochs, you can see we got
an accuracy of about 67% on the evaluation data. To quickly go over this stuff: the optimizer is Adam, which we talked about before, and the loss function is sparse categorical cross entropy. For that one, I mean, you can read about it if you want: it computes the cross entropy loss between the labels and predictions. I'm not gonna go into that, but these kinds of things are things you can look up if you really want to understand why they work. For most problems, if you want to figure out which loss function and optimizer to use, just use the basics, like Adam and a categorical cross entropy when you're doing a classification task. You can go and look at all of the different loss functions, and it'll tell you when to use which one, and you can kind of mess with them and tweak them if you want. Now, history equals model.fit just so we can access some of the statistics from this model.fit. Obviously, it's just training on the train images and train labels, where the test images and test labels are the validation data. So, evaluating the model: we can evaluate it now on the test images and test labels, and we're obviously going to get the same thing, because the validation data was the test images and test labels. So we should get the same accuracy, about 0.6735, which we do right here. Alright, so there we go, we get about 70%. If you guys trained this on 10 epochs, you should get closer to 70; I'm
a little bit lower, just because I didn't want to go that high. And that is now the
model. I mean, we could use this if we want, we could use predict, we could pass in some
image. And we could see the prediction for it. I'm not going to do that just because
we've already talked about that enough. And I want to get into some of the cooler stuff
when we're working with smaller data sets. So the basic idea here is this is actually
a pretty small data set, right, we use about 60,000 images. And if you think about the
amount of different patterns, we need to pick up to classify, you know, things like horses
versus trucks, that's a pretty difficult task to do. Which means that we need a lot of data.
And in fact, some of the best convolutional neural networks that are out there are trained
on millions of pieces of you know, sample information or data. So obviously, we don't
have that kind of data. So how can we work with, you know, a few images, maybe like a
few thousand images, and still get a decent model? Well, the thing is, you can't, unless we use some of the techniques I'm about to show you. So, working with small data sets. Just
like I mentioned, it's difficult to create a very good convolution neural network from
scratch, if you're using a small amount of data. That is why we can actually employ these
techniques, the first one data augmentation, but also using pre trained models to kind
of accomplish what we need to do. And that's what we're going to be talking about now in the second part of the tutorial, where we're going to create another convolutional neural network. So just to clarify, the first model is created: we've made the model up here already, and this is all we need to do to make it. This is the architecture, and this was just to get you familiar with
the idea. So data augmentation. So this is basically the idea. If you have one image,
we can turn that image into several different images, and train and pass all those images
to our, our model. So essentially, if we can rotate the image, if we can flip it, if we
can stretch it, compress it, you know, shift it, zoom it, whatever it is, and pass that
to our model, it should be better at generalizing. Because we'll see the same image, but modified
and augmented multiple times, which means that we can turn a data set, say of 10,000
images into 40,000 images by doing four augmentations on every single image. Now, obviously, you
still want a lot of unique images. But this technique can help a lot and is used quite
a bit because that allows our kind of modeled to be able to pick up images that maybe are
oriented differently or zoomed in a bit or stretch something different, right? Just better
at generalizing, which is the whole point. So I'm not going to go through this in to
depth, too much depth, but this is essentially a script that does data augmentation for you.
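The script being walked through here looks roughly like this; the image index and the save settings below are arbitrary choices for illustration, not necessarily the ones in the notebook:

```python
import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing.image import ImageDataGenerator, img_to_array

# Generator that applies random rotations, shifts, shears, zooms and flips
datagen = ImageDataGenerator(rotation_range=40,
                             width_shift_range=0.2,
                             height_shift_range=0.2,
                             shear_range=0.2,
                             zoom_range=0.2,
                             horizontal_flip=True,
                             fill_mode='nearest')

# Pick one image, make sure it's an array, and add a batch dimension of 1
test_img = test_images[14]            # arbitrary index
img = img_to_array(test_img)
img = img.reshape((1,) + img.shape)

# flow() yields augmented copies forever (and can also save them to disk via
# save_to_dir / save_prefix), so we break out after showing a few
i = 0
for batch in datagen.flow(img, batch_size=1, save_prefix='test', save_format='jpeg'):
    plt.figure(i)
    plt.imshow(batch[0])
    i += 1
    if i > 4:
        break
plt.show()
```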
We're gonna use this ImageDataGenerator from the keras.preprocessing.image module,
we're going to create an image data generator object. And essentially what this allows us
to do is specify some parameters on how we want to modify our image. In this case, we
have the route range, some shifts, shear, zoom, horizontal flip and the mode. Now I'm
not going to go into how this works, you can look at the documentation if you'd like. But
essentially, this will just allow us to augment our images. Now what I'm going to do is pick
one arbitrary image from the test image data set, just our test image, I guess, group of
photos, whatever you want to call it, I'm going to convert that to an image array, which
essentially takes it from the weird data set object that it kind of is and turns it into
a NumPy array. Then we're going to reshape this so it's in the form of one, plus the image shape, which is whatever that shape is. So we'll reshape that. And then what we're gonna do is say, for batch in datagen.flow, and I'll talk about how that works in a second. Essentially, this is just going to augment the image for us and actually save it onto our drive. So in this instance, what's going to happen is this datagen.flow is going to take the image which we've created
here, right, and we formatted it correctly, by doing these two steps, which you need to
do beforehand, it's going to save this image as test dot jpg, and this will be the prefix,
which means it'll be some information after, and it will do this as many times until we
break. So essentially, given an image, it will do test one, test two, test three, test
four, test five, with random augmentations. Using this, until eventually, we decided to
break out of this. Now what I'm doing is just showing the image by doing this, and batch
zero is just showing us the you know, that first image in there. And that's kind of how
this works. So you can mess with the script and figure out a way to use it. But I would
recommend if you want to do data augmentation, just look into image data generator, this
is something that I just want to show you. So you're aware of, and I'll just run it so
you can see exactly how this works. So essentially, given an image of a truck, what it will do
is augmented in these different ways. You can see kind of the shifts, the translations,
the rotations, all of that. And we'll do actually a different image here to see what one looks
like, let's just do image index 22, and we get something different. So in this case, I believe
this is maybe like a deer rabbit or a dog or something, I don't really know exactly
what it is because it's so blurry. But you can see that's kind of the shifts we're getting.
And it makes sense because you want to have images in different areas so that we have
a better generalization. Alright, so let's close that. Okay, so now we're gonna talk
about, what is it, pre-trained models. Okay. So we talked about data augmentation,
that's a great technique if you want to increase the size of your data set. But what if even
after that we still don't have enough images in our data set? Well, what we can do is use
something called a pre trained model. Now companies like Google, and you know, TensorFlow,
which is owned by Google, make their own amazing convolutional neural networks that are completely
open source that we can use. So what we're going to do is actually use part of a convolutional
neural network that they've trained already on, I believe, 1.4 million images. And we're
just gonna use part of that model, as kind of the base of our models that we have a really
good starting point. And all we need to do is what's called fine tune the last few layers
of that network, so that they work a little bit better for our purposes. So what we're
going to do essentially is say, all right, we have this model that Google's trained,
they've trained it on 1.4 million images, it's capable of classifying, let's say, 1000
different classes, which is actually the example we'll look at later. So obviously, the beginning
of that model, is what's picking up on the smaller edges. And you know, kind of the very
general things that appear in all of our images. So if we can use the base of that model, so
kind of the beginning of it, that does a really good job picking up on edges, and general
things that will apply to any images, then what we can do is just change the top layers
of that model a tiny bit, or add our own layers to it to classify for the problem that we
want. And that should be a very effective way to use this pre trained model. We're saying
we're going to use the beginning part that's really good at kind of the generalization
step, then we'll pass it into our own layers that will do whatever we need to do specifically
for our problem. That's what's like the fine tuning step. And then we should have a model
that works pretty well. In fact, that's what we're going to do in this example, now. So
that's kind of the point of what I'm talking about here is using part of a model that already
exists that's very good at generalizing, and it's been trained on so many different images.
And then we'll pass our own training data in, we won't modify the beginning aspect of
our neural network, because it already works really well. We'll just modify the last few
layers that are really good at classifying, for example, just cats and dogs, which is
exactly the example we're actually going to do here. So I hope that makes sense. As we
get through this should be cleared up a little bit. But using a pre trained model is now
the section we're getting into. So this is based on this documentation. As always, I'm
referencing everything. So you guys can see that if you'd like we do our imports like
this, we're going to load a data set. This actually takes a second to load the data set,
I believe, oh, maybe not. And essentially, the problem we're doing is trying to classify
dogs versus cats with a fair degree of accuracy. In fact, we'd like to get above 90%. So this
is the data set we're loading in from TensorFlow Datasets, as tfds. This is kind of a weird way to load it in; again, stuff like this, you just have to reference the documentation. I can explain it to you, but it's not really going to help when the next example is going to be a different way of loading the data, right? So, so long as you know how to get the data in the correct form, you can get it into some kind of NumPy array, and you can split it into training, testing and validation data, you should be okay. And if you're using a TensorFlow data set, it should tell you in the documentation how to load it in properly.
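For reference, loading the cats vs dogs data from TensorFlow Datasets looks roughly like this with the newer percentage-based split strings; the notebook may use an older split API, so treat this as a sketch:

```python
import tensorflow as tf
import tensorflow_datasets as tfds

# Split the "train" portion of cats_vs_dogs 80/10/10 into train/validation/test
(raw_train, raw_validation, raw_test), metadata = tfds.load(
    'cats_vs_dogs',
    split=['train[:80%]', 'train[80%:90%]', 'train[90%:]'],
    with_info=True,
    as_supervised=True,  # yields (image, label) pairs
)

# Helper that turns an integer label back into the string "cat" or "dog"
get_label_name = metadata.features['label'].int2str
```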
So, loading it in here, we're taking 80% for training, 10% for the raw validation, and 10% for the testing data. So we've loaded that. And now what we're doing here is just
we'll look at a few images. So this actually creates a function, I know, this is a weird
thing, this is pretty unique to this example, that allows us to call this function with
some integer, essentially, and get back the actual string representation of the label for it. And what I'm doing here is just taking two images from our raw training
data set, and just displaying them. And you can see that's what we're getting here, dog
and dog. If I go ahead and take five, we'll see, these are what our images look like.
Right, so here's an example of a dog, we have a cat, right, and so on so forth, you kind
of get that you get the point there. Now, notice, though, that these images are different
dimensions. In fact, none of these images other than these two actually are the same
dimension at all. Oh, actually, I don't think these ones are either. So obviously, there's
a step that we need to do, which is we need to scale all these images to be the same size.
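The resizing function about to be described, plus the map call that applies it, looks roughly like this (a sketch following the TensorFlow transfer learning documentation referenced earlier):

```python
IMG_SIZE = 160  # every image gets resized to 160x160

def format_example(image, label):
    """Cast to float, scale pixels to the range [-1, 1], and resize."""
    image = tf.cast(image, tf.float32)
    image = (image / 127.5) - 1
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    return image, label

# map() applies the function to every (image, label) pair in each split
train = raw_train.map(format_example)
validation = raw_validation.map(format_example)
test = raw_test.map(format_example)
```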
So to do that, what we're going to do is write a little function like this, that essentially
will return an image that is reshaped, or I guess resized, to the image size, which I'm going to set at 160 by 160. Now we can make this bigger if we want. But
the problem is, sometimes you'll make your image size bigger than most of your data set examples. And that means you're going to be
really stretching a lot of the examples out and you're losing a lot of detail. So it's
much better to make the image size smaller rather than bigger. You might say, well, if
you make it smaller, you're gonna lose detail too. But it's just, it's better to compress
it smaller than it is to go really big, even just when it comes to the amount of training
time and how complex networks going to be. So that's something to consider. You can mess
around with those when you're making your own networks. But again, smaller is typically
better, in my opinion, you don't wanna go too small, but something that's like, you
know, half the size, what an average image would be. Alright, so we're gonna go format
example, we're gonna just take an image and a label. And what this will do is return to
us just the reshaped image and labels. In this case, we're going to cast which means
convert every single pixel in our image to be a float 32 value because it could be integers,
we're then going to divide that by 127.5, which is exactly half of 255, and then
subtract one, then we're going to resize this image to be the image size. So sorry, the
image will be resized to the image size, so 160 by 160. And we'll return new image and
the label. So now we can apply this function to all of our images using map if you don't
know what map is, essentially, it takes every single example in, in this case going to be
raw train, and applies the function to it, which will mean that it will convert rod train
into images that are all resized to 160 by 160. And we'll do the same thing for validation
and test. So run that no issue there. And now let's have a look at our images and see
what we get. And there we are. Now I've just messed up the color because I didn't add a
cmap argument, which I think I needed. Where was the cmap? Anyways, you know what, that's fine for now. This is what our images look like after the resize; now we get all images at 160 by 160, and we are good to go. Alright, so now let's have a look at the shape of an original image versus our new image. So I mean, this was just to prove that, essentially, our original shapes were all sorts of random values, and they're all reshaped to 160 by 160 by 3, where 3, obviously, is the color channels of the images. Alright, so picking a pre-trained model. This next step is probably one of the hardest steps: picking a model that you would actually like to use the base of. Now we're going to use one called MobileNetV2, which is actually from Google. It's built into TensorFlow itself. That's
why I've picked it. And all we're going to do is set this.
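Setting that up in code looks roughly like the following; the variable names are just placeholders, but the call itself is the one described next:

```python
IMG_SHAPE = (IMG_SIZE, IMG_SIZE, 3)

# Pre-trained MobileNetV2 without its 1000-class classifier ("top"),
# using the weights learned on ImageNet
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')
base_model.summary()
```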
So essentially, we're going to say the base model in our code is equal to tf.keras.applications.MobileNetV2, which is just telling us the architecture of the model that we want; we'll have a look at it down below here in just a second. We'll define the input shape, which is important because this can take any input shape that we want, so we'll change it to 160 by 160 by 3, which we've defined up here. include_top, very important, means: do we include the classifier that comes with this network already or not? Now in our case, we're going to be retraining parts of this network so that it works specifically for dogs and cats, and not for 1000 different classes, which is what this model was actually designed to do: it was trained on 1.4 million images for 1000 different classes of everyday objects. So we're going to not include the top, which means don't include the classifier for those 1000 classes. And we're going to load the weights from what's called ImageNet, which is just a specific save of the weights. So this is the architecture, and this is kind of the data that we're filling in for that architecture, the weights, and we'll load that in, which we have here. So, base model: now let's look at it. So let's have a summary.
You can see this is a pretty crazy model. I mean, we would never be expected to create something like this by ourselves. This is, you know, teams of data scientists, PhD students, engineers, who are, right, the experts in the field, that have created a network like this. So that's why we're going to use it: because it works so effectively for the generalization at the beginning, which is what we want. And then we can take those features that this extracts, so the 5 by 5 by 1280, which is the output of this actual network here that I want us to focus on. So really, you can see this last layer: we're going to take this, and using this information, pass that to our own classifier, I believe, and use that to predict dogs versus cats. So at this point, the base model will simply output a shape of 32 by 5 by 5 by 1280. That's the tensor that we're going to get out of this, that's the shape; you can watch how this kind of works as you go through it. And yes, alright, so we can just have a look at this here; what I wanted to do essentially was just look at what the actual shape was going to be. So 32 by 5 by 5 by 1280, just because this gives us None until it knows what the input is. And now I'm talking about freezing the base. So essentially, the point is, we
want to use this as the base of our network, which means we don't want to change it. If
we just put this network in right now is the base to our neural network. Well, what's going
to happen is, it's going to start retraining all these weights and biases. And in fact,
it's going to train 2.257 million more weights and biases, when in fact, we don't want to
change these because these have already been defined, they've been set. And we know that
they will work well for the problem already. Right? They worked well for classifying 1000
classes. Why are we going to touch this now. And if we were going to touch this, what's
the point of even using this base, right, we don't want to train this, we want to leave
it the same. So to do that, we're just gonna freeze it. Now freezing is a pretty, I mean,
it just essentially means turning the trainable attribute of a layer off or of the model off.
So what we do is we just say base model dot trainable equals false, which essentially
means that we are no longer going to be training, any aspect of that I want to say model, although
we'll just call it the base layer for now, the base model. So now if we look at the summary,
we can see when we scroll down to the bottom, if we get there any day soon, that now the
trainable parameters is zero instead of 2.257 million, which it was before. And now it's
time to add our own classifier on top of this. So essentially, we've got a pretty good network, right, 5 by 5 by 1280 at our last output. And what we want to do now is take that, and we want to use it to classify either cat or dog. So what we're going to do is add a global average pooling layer, which essentially is going to take the average of each of the 1280 different 5 by 5 feature maps and put that into a 1D tensor, which is kind of flattening that for us. So we do that global average pooling. And then we're just going to add the prediction layer, which essentially is going to just be one dense neuron, and since we're only classifying two different classes, right, dogs and cats, we only need one. Then we're going to add all these pieces together, so the base model, the global average layer that we define there, and then the prediction layer, to create our final model. So let's run this: global average layer, prediction layer, model. Give that a second to kind of run there. Now when we look at the summary, we can see we have MobileNetV2, which is actually a model, but that is our base layer, and that's fine, because the output shape is right. Then global average pooling, which again, just takes this, flattens it out, and does the average for us. And then finally, our dense layer, which is going to simply have one neuron, which is going to be our output. Now notice that we have about 2.259 million parameters in total, and only 1,281 of them are trainable. That's because we have 1,280 connections from this layer to this layer, which means 1,280 weights and one bias. So that is what we're doing. This is what we have created.
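Put together, the freezing step and the new classifier look roughly like this sketch:

```python
# Freeze the convolutional base so its weights are not updated during training
base_model.trainable = False

# Average each of the 1280 5x5 feature maps down to a single number,
# then classify with a single neuron (cat vs dog)
global_average_layer = tf.keras.layers.GlobalAveragePooling2D()
prediction_layer = tf.keras.layers.Dense(1)

model = tf.keras.Sequential([
    base_model,
    global_average_layer,
    prediction_layer,
])
model.summary()
```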
Now this base, the majority of the network has been done for us. And we just add our
own little classifier on top of this. And now we're going to feed some training samples
and data to this. Remember, we're not training this base layer whatsoever. So the only thing
that needs to be learned is the weights and biases on these two layers here. Once we have
that we should have a decent model ready to go. So let's actually train this now. I'm
going to compile this here. I'm picking a learning rate that's very low. Essentially, what the learning rate means is how much am I allowed to modify the weights and biases of this network, and I've just made that very low, because we don't want to make any major changes if we don't have to, because we're already using a base model that exists, right. So we'll set the learning rate; I'm not going to talk about what this does specifically, you can look that up if you'd like to. And then for the loss function we'll use binary cross entropy, just because we're using two classes. If you're using more than two classes, you'd use categorical cross entropy or some other type
of cross entropy. And then what we're going to do is actually evaluate the model right
now before we even train it. So I've compiled it, I've just set what we'll end up using.
But I want to evaluate the model currently, without training it whatsoever on our validation
data validation batches and see what it actually looks like what it actually you know what
we're getting right now, with the current base model being the way it is, and not having
changed the weights and biases, the completely random from the global average pooling in
the dense layer. So let's evaluate. And let's see what we get as an accuracy. Okay, so we
can actually see that with the random weights and biases. For those last layer that we added,
we're getting an accuracy of 56%, which pretty much means that it's guessing, right? It's,
you know, 50% is late to classes. So if we got anything lower than 50, like 50, should
have been our guests, which is what we're getting. So now what we're going to do, and
I actually, I've trained this already, I think so I might not have to do it, again, is a
train this model on all of our images to all of our images, and cats and cats and dogs.
So we've loaded in four, which will allow us now to modify these weights and biases
of this layer. So hopefully, it can determine what features need to be present for a dog
to be a dog and for cat to be a cat, right. And then it can make a pretty good prediction.
In fact, I'm not going to train this in front of us right now, because this actually takes
close to an hour to train just because there is a lot of images that it needs to look at,
and a lot of calculations that need to happen. But when you do end up training this, you
end up getting an accuracy of close to 92 or 93%, which is pretty good considering the
fact that all we did was use that original layer, like base layer that classified up
to 1000 different images, so very general, and applied that just to cats and dogs by
adding our dense layer classifier on top. So you can see this was kind of the accuracy
I had from training this previously, I don't want to train again, because it takes so long.
But I did want to show that you can save a model and load a model by doing this syntax.
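The save and load calls look roughly like this; the file name here is made up:

```python
# Save the trained model to disk in the HDF5 (.h5) format Keras supports
model.save('dogs_vs_cats.h5')

# Later, load it back without having to retrain for an hour
new_model = tf.keras.models.load_model('dogs_vs_cats.h5')
```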
So essentially, on your model object, you can call model.save and save it as whatever name you'd like with a .h5 extension, which is just a format for saving Keras models; it's specific to Keras, not TensorFlow. And then you can load the model by doing this. So this
is useful because after you train this for an hour, obviously, you don't want to retrain
this if you don't have to, to actually use it to make predictions. So you can just load
the model. Now, I'm not going to go into using the model specifically, you guys can look up
the documentation to do that. We're at the point now where I've showed you so much syntax
on predicting and how we actually use the models. But the basic idea would be to do
model dot predict, right, and then you can see that it's even giving me the input here.
So model dot predict, give it some x batch size verbose, right, because it will predict
on multiple things. And that will spit back to you a class which then you can figure out,
Okay, this is a cat or this is a dog, you're going to pass this obviously the same input
information we had before, which is 160 by 160, by three, and that will make the prediction
for you. So that's kind of the thing there, I was getting an error just because I hadn't
saved this previously. But that's how you save and load models, which I think is important
when you're doing very large models. So when you fit this, feel free to change the epochs to be something lower if you'd like, again, right? This takes a long time to actually end up running. But you can see that the accuracy increases dramatically,
from when we didn't even have that classifier on it. Now the last thing I want to talk about
is object detection, I'm just going to load up a page, we're not going to do any examples,
I'm just gonna give you a brief introduction, because we're kind of running out of time
for this module, because you can use TensorFlow to do object detection and recognition, which
is kind of cool. So let's get into that now. Okay, so right now I'm on a GitHub page that's
built by TensorFlow here, I'm going to leave that link in the notebook where it says object
detection, so you guys can look at that. But essentially, there is an API for TensorFlow
that does object detection for you. And in fact, it works very well and even gives you
confidence scores. So you can see this is what you'll actually end up getting if you
end up using this API. Now, unfortunately, we don't have time to go through this because
this will take a good amount of time to talk about the setup and how to actually use this
project properly. But if you go through this documentation, you should be able to figure
it out. And now you guys are familiar with TensorFlow, and you understand some of the
concepts here. This runs a very different model than what we've discussed before. Unfortunately,
we don't have time to get into it. But just something I wanted to make clear is that you
can do something like this with TensorFlow. And I will leave that resource so that if
you'd like to check this out, you can use it. There's also a great module in Python
called face_recognition. It's not a part of TensorFlow, but it does use some kind of
convolutional neural network to do facial detection and recognition, which is pretty
cool as well. So I'll put that link in here. But for that, for now, that's going to be
our convolutional neural network module. So I hope this has cleared
some things up on how deep vision works and how convolutional neural networks work. I
know I haven't gone into crazy examples, but I've shown you some different techniques
that hopefully you'll go look up kind of on your own and really dive into because now
you have that base kind of domain knowledge where you're going to be able to follow along
with the tutorial and understand exactly what to do. And if you want to create your own
model, so long as you can get enough sufficient training data, you can load that training
data into your computer, put that in a NumPy array, then what you can do is create a model like we've just done, using even something like the MobileNetV2 that we talked about previously. Let me close this output, oh my gosh, this was just a massive output here; where is it, the pre-trained model section? Yeah, MobileNetV2, you can use the base of that, and then add your own classifier on, do a similar thing to what I've done with that dense neuron and that global average layer, and hopefully you should get a decent result from that. So this is just showing you what you can do; obviously, you can pick a different base layer depending on what kind of problem you're trying to solve. So anyways, that has been convolutional neural networks. I hope you
enjoyed that module. Now we're on to recurrent neural networks, which is actually gonna be
pretty interesting. So see you in that module. Hello, everyone, and welcome to the next module
in this course, which is covering natural language processing with recurrent neural
networks. Now, what we're going to be doing in this module here is, first of all, first
off discussing what natural language processing is, which I guess I'll start with here, essentially,
for those of you that don't know, natural language processing, or NLP, for short, is
the field or discipline in computing, or machine learning that deals with trying to understand
natural Earth human languages. Now, the reason we call them natural is because these are
not computer languages, or programming languages, per se. And actually, computers are quite
bad at understanding textual information and human languages. And that's what we've come
up with this entire discipline focused on how they can do that. So we're going to do
that using something called recurrent neural networks. But some examples of natural language
processing would be something like spellcheck, autocomplete, voice assistants, translation
between languages, there's all different kinds of things, chatbots, but essentially, anything
that deals with textual data, so you like paragraphs, sentences, even words, that is
probably going to be classified under natural language processing in terms of doing some
kind of machine learning stuff with it. Now, we are going to be talking about a different
kind of neural network in this series called recurrent neural networks. Now, these are
very good at classifying and understanding textual data, and that's why we'll be using them.
But they are fairly complex. And there's a lot of stuff that goes into them. Now, in
the interest of time, and just not knowing a lot of your math background, I'm not going
to be getting into the exact details of how this works on a lower level, like I did, when
I explained kind of our, I guess, fundamental learning algorithms, which are a bit easier
to grasp. And even just regular neural networks in general, we're going to be kind of skipping
over that, and really focusing on why this works the way it does, rather than how and
when you should use this. And then maybe understanding a few of the different kinds of layers that
have to do with recurrent neural networks. But again, we're not going to get into the
math, if you'd like to learn about that there will be some sources at the bottom of the
guide. And you can also just look up recurrent neural networks. And you'll find lots of resources
that explain all of the fancy math that goes on behind them. Now, the exact applications
and kind of things we'll be working towards here is sentiment analysis, that's the first
kind of task or thing we're going to do, we're actually going to use movie reviews and try
to determine whether these movie reviews are positive or negative by performing sentiment
analysis on them. Now, if you're unfamiliar with sentiment analysis, we'll talk about
it more later. But essentially means trying to determine how positive or negative a sentence
or piece of text is, once you can see why that'd be useful for movie reviews. Next,
we're going to do character slash text generation. So essentially, we're going to use a natural
language processing model, I guess, if you want to call it that, to generate the next
character in a sequence of text for us. And we're going to use that model a bunch of times
to actually generate an entire play. Now, I know this seems a little bit ridiculous
compared to some of the trivial examples we've done before, this will be quite a bit more
code than anything we've really looked at yet. But this is very cool, because we're
actually going to train a model to learn how to write a play. That's literally what it's
going to do: it's going to read through a play, I believe it's Romeo and Juliet.
And then we're going to give it a little prompt when we're actually using the model and say,
Okay, this is the first part of the play, write the rest of it, and then it will actually
go and write the rest of the characters in the play. And we'll see that we can get something
that's pretty good using the techniques that we'll talk about. So the first thing that
I want to do is talk about data. So I'm going to hop onto my drawing tablet here. And we're
going to compare the difference between textual data and numeric data like we've seen before,
and why we're going to have to employ some pretty complex and different steps to turn
something like this, you know, a block of text into some meaningful information that
our neural network is actually going to be able to understand and process.
So let's go ahead and get over to that. Okay, so now we're going to get into the problem
of how we can turn some textual data into numeric data that we can feed to our neural
network. Now, this is a pretty interesting problem. And we'll kind of go through as we
start going through it, you should see why this is interesting and why there's like difficulties
with the different methods that we pick. But the first method that I want to talk about
is something called bag of words, in terms of how we can kind of encode and pre process
text into integers. Now, obviously, I'm not the first person to come up with this; bag of words is a very famous, I want to say, algorithm or method of converting textual data to numeric data, although it is pretty flawed. It only really works for simple tasks.
And we're going to understand why in a second. So we're going to call this bag of words.
Essentially, what bag of words says is, what we're going to do is we're going to look at
our entire training data set, right, because we're going to be turning our training data set into a
form the network can understand. And we're going to create a dictionary lookup of the
vocabulary. Now what I mean by that is, we're going to say that every single unique word
in our data set is the vocabulary, right? That's the amount of words that the model
is expected to understand, because we're going to show all those words to the model. And
we're going to say that every single one of these words, so every single one of these
words in the vocabulary is going to be placed in a dictionary. And Beside that, we're going
to have some integer that represents it. So for example, maybe the vocabulary of our data
set is the words, you know, I, a, Tim, day, me, right; we're gonna have a bunch of arbitrary words, and let's put dot dot dot to show that this kind of goes to the length of the
vocabulary. And every single one of these words, we place in a dictionary, which we're
just going to call kind of our lookup table or word index table. And we're going to have
a number that represents every single one of them. So you can imagine that in very large data
sets, we're going to have, you know, tens of thousands or hundreds of thousands, sometimes even
maybe millions of different words, and they're all going to be encoded by different integers.
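Here's a toy sketch of that idea in plain Python; it's not the exact function from the notebook, but it builds the word index and then counts each word's integer into the "bag" described next:

```python
def bag_of_words(text):
    """Toy bag-of-words encoder: build a word index and count word occurrences."""
    vocab = {}   # maps each unique word to an integer
    bag = {}     # maps each word's integer to how many times it appears
    for word in text.lower().split():
        if word not in vocab:
            vocab[word] = len(vocab) + 1  # assign the next unused integer
        num = vocab[word]
        bag[num] = bag.get(num, 0) + 1
    return bag, vocab

bag, vocab = bag_of_words("this is a test to see if this test will work")
print(vocab)  # e.g. {'this': 1, 'is': 2, 'a': 3, ...}
print(bag)    # e.g. {1: 2, 2: 1, ...} -- word 1 ('this') appears twice
```

Notice that the bag records only which words appear and how often, not where they appear, which is exactly the limitation discussed below.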
Now, the reason we call this bag of words is because what we're actually going to do
when we look at a sentence is we're only going to keep track of the words that are present
and the frequency of those words. And in fact, what we'll do is we'll create what we call a bag, and whenever we see a word appear, we'll simply add its number into the bag. So if I have a sentence like, you know, "I am Tim day day", I'm just going to do a random sentence like that, then what we're going to do is, every time we see a word, we're going to take its number and throw it into the bag. So we're gonna say, all right, I, that's zero; am, that's one; Tim, that's two; day, that's three; and that's three again,
and notice that what's happening here is we're losing the ordering of these words, but we're
just keeping track of the frequency. Now there's lots of different ways to kind of format how
we want to do bag of words. But this is the basic idea I'm not going to go too far in
because we're not actually really going to use this technique. But essentially, you lose
the ordering in which words appear, but you just keep track of the frequency and what
words appear. So this can be very useful when you're looking, you know, you're doing very
simple tasks, where the presence of a certain word will really influence the kind of type
of sentence it is, or the meaning that you're going to get from it. But when we're looking
at more complex input, where you know, different words have different meanings, depending on
where they are in a sentence, this is a pretty flawed way to encode this data. Now, I won't
go much further into this, this is not the exact way the bag of words works. But I just
wanted to show you kind of an idea here, which is we just encode every single unique word
by an integer. And then we don't even really care about where these words are, we just
throw them into a bag and we say, Alright, you know, this is our bag right here that
I'm doing the arrow to, we'll just throw in three, as many times as you know, the word
day appears, we'll throw in one as many times as the word, I guess, am appears, and so on
and so forth. And then what will happen is we'll feed this bag to our neural network
in some form, depending on the network that we're using. And it will just look at it and
say, OK, so I have all these different numbers, that means these words are present, and try
to do something with it. Now I'm going to show you a few examples of where this kind
of breaks down. But just understand that this is how this works. This is the first technique
called bag of words, which again, we will not be using. So what happens when we have
a sentence where the same word conveys a very different meaning. Right? And I'm actually
I think I have an example on the slides here that I'll go into. Yes. So like this. Okay,
so for our bag of words technique, which we can kind of see here, and maybe we'll go through
it. Let's consider the two sentences. Where are they here? I thought the movie was going
to be bad, but it was actually amazing. And I thought the movie was going to be amazing,
but it was actually bad. Right? So consider these two sentences. Now I know you guys already
know what I'm going to get at. But essentially, these sentences use the exact same words.
In fact, they use the exact same number of words, the exact same words in total. And
well, they have a very different meaning. With our bag of words technique. We're actually
going to encode these two sentences using the exact same representation because remember,
all we do is care about the frequency and what words appear, but we don't care about where
they appear. So we end up losing that meaning from the sentence. Because the sentence, I
thought the movie was going to be bad, but it was actually amazing is encoded and represented
by the same thing as this sentence is. So that, you know, obviously is an issue. That's a
flaw. And that's one of the reasons why bag of words is not very good to use. Because
we lose the context of the words within the sentence, we just pick up the frequency and
the fact that these words exist. So that's the first technique that's called bag of words,
I've actually written a little function here that does this for us. This is not really the exact way that we would write a bag of words function, but you kind of get the idea. When I have a text like "this is a test to see if this test will work. is test a", I've just typed a bunch of random stuff there, we can see what I'm doing is printing out the bag which I get from this function. And you guys can look at this if you kind of want to see how this works. Essentially, what it tells us is that word one appears two times, word two appears three times, word three appears three times, word four appears three times, and words five, six and seven appear once, and so on; that's the information we get from our bag, right, from that encoding. And then if we look up here, this is our vocabulary: so "this" stands for one, "is" is two, "a" is three, and so on, and you can kind of get the idea from that. So that is
how we would use bag of words, right? If we did an encoding kind of like this, that's
what that does. And that's one way of encoding it. Now I'm going to go back and we'll talk
about another method here as well, actually a few more methods before we get into anything
further. Alright, so I'm sure a lot of you were looking at the previous example I did.
And you saw the fact that what I did was completely remove the idea of kind of sequence or ordering
of words, right. And what I did was just throw them in a bag, and I said, alright, we're just gonna keep track of the fact that we have, you know, three a's, or four of those, or seven Tims, right, and we're just going to lose the fact that, you know, words come after one another; we're going to lose their ordering in the sentence. And that's
how we're going to encode it. And I'm sure a lot of you are saying, Well, why don't we
just not lose the ordering of those words, we'll just encode every single word with an
integer and just leave it in its space where it would have been in the original string.
Okay, good idea. So what you're telling me is to do something like this: you know, "Tim is here" will be our sentence. Let's say we encode the word Tim as zero, is as one, and here as two. And then that means our translation goes 0, 1, 2. And that means, right, if we have a translation, say, like 2, 1, 0, even though these use the exact same number of words and the exact same representation for all these words, well, this is a different sentence, and our model should be able to tell that, because these words come in a different order. So, good point, if you made
that point. But I'm going to discuss where this falls apart as well, and why we're not
going to use this method. So although this does solve the problem I talked about previously,
where we're going to kind of lose out on the context of a word, there's still a lot of
issues with this. And they come especially when you're dealing with very large vocabularies.
Now let's take an example where we actually have a vocabulary of, say, 100,000 words.
And we know that that means we're going to have to have 100,000 unique mappings from
words to integers. So let's say our mappings are something like this one, maps to the string
happy, the word happy, right? Two maps to sad. And let's say that the string 100,000,
or the number 100,000, maps to the word, I don't know, let's say Good. Now we know as
humans, just thinking about it, and considering the fact that we're going to try to classify sentences as positive or negative, so sentiment analysis, that the words happy and good, in that regard, are probably pretty similar words,
right. And then if we were going to group these words, we'd probably put them in a similar
group, we classify them as similar words, we could probably interchange them in a sentence,
and it wouldn't change the meaning a whole ton. I mean, it might, but it might not as
well. And then we could say, these are kind of similar. But our model or our encoder,
right, whatever we're doing to translate our text into integers here, has decided that
100,000 is going to represent good and one is going to represent happy and well, there's
an issue with that, because that means when we pass in something like one, or 100,000
to our model, it's gonna have a very difficult time determining the fact that one and 100,000,
although they're 99,999 kind of units apart, are actually very similar words. And that's
the issue we get into when we do something like this is that the numbers we decide to
pick to represent each word are very important. And we don't really have a way of being able
to look at words, group them, and saying, Okay, well, we need to put all of the happy
words in the range of zero to 100. All of the like adjectives in this range, we don't
really have a way to do that. And this gets even harder for a model when we have these
arbitrary mappings, right? And then we have something like to in between where two is
very close to one, right? Yet these words are complete opposites. In fact, I'd say they're
probably polar opposites. Our model trying to learn that the difference between one and
two is actually way larger than the difference between one and 100,000 is going to be very difficult. And say it's even able to do that: as soon as we throw in a mapping like 99,900, and we put that as bad, well, now it gets even more difficult, because it's now like, okay, if the range is this big, then that means these words are actually very
similar. But then you throw another word in here like this, and it messes up the entire
system. So that's kind of what I want to show is that that's where this breaks apart on
these large vocabularies. And that's what I'm going to introduce us now to another concept
called word embeddings. Now, what word embeddings does is essentially try to find a way to represent
words that are similar using very similar numbers. And in fact, what a word embedding
is actually going to do. And I'll talk about this more in detail as we go on is classify
or translate every single one of our words into a vector. And that vector is going to
have some, you know, n amount of dimensions, usually, we're going to use something like
64, or maybe 128 dimensions for each vector. And every single component of that vector
will kind of tell us what group it belongs to, or how similar it is to other words, so
let me give you an idea what I mean. So we're going to create something called a word embeddings.
Now don't ask why it's called embeddings. I don't know the exact reason but I believe
it has something to do with the fact that they're vectors. And let's just
say we have a 3d plane like this. So we've already kind of looked at what vectors are
before. So I'll skip over explaining them. And what we're going to do is take some word.
So let's say we have the word good. And instead of picking some integer to represent it, we're
going to pick some vector, which means we're going to draw some vector in this 3d space,
actually, let's make this a different color. Let's make this vector say red, like this.
And this vector represents this word good. And in this case, we'll say we have x1, x2, and x3 as our dimensions, which means that every single word in our data set will be represented by three coordinates. So one vector with three different dimensions, where we have x1, x2 and x3. And our hope
is that by using this word embeddings layer, and we'll talk about how it accomplishes this
in a second is that we can have vectors that represent very similar words being very similar,
which means that you know, if we have the vector good here, we would hope the vector
happy from our previous example, right would be a vector that points in a similar direction
to it, that is kind of a similar looking thing where the angle between these two vectors,
right, and maybe I'll draw it here, so we can see is small so that we know that these
words are similar. And then we would hope that if we had a word that was much different,
maybe say like the word bad, that that would point in a different direction, the vector
that represents it, and that that would tell our model, because the angle between these
two vectors is so big, that these are very different words, right? Now, does the word
embedding layer work like this in practice? You know, not always. But this is what it's trying
to do is essentially pick some representation in a vector form for each word. And then these
vectors, we hope, if there's similar words are going to be pointing in a very similar
direction. And that's kind of the best explanation of a word embeddings layer I can give you.
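Just to illustrate the idea (this is not part of the tutorial notebook, and the vectors are made-up numbers), the similarity between two embedding vectors can be measured with the cosine of the angle between them:

    import numpy as np

    # Hypothetical 3-dimensional embeddings for three words (made-up values).
    good  = np.array([0.9, 0.1, 0.4])
    happy = np.array([0.8, 0.2, 0.5])
    bad   = np.array([-0.7, 0.3, -0.6])

    def cosine_similarity(a, b):
        # 1.0 means the vectors point the same way; -1.0 means opposite directions.
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    print(cosine_similarity(good, happy))  # close to 1 -> small angle -> similar words
    print(cosine_similarity(good, bad))    # negative  -> large angle -> very different words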
Now, how do we do this, though? How do we actually, you know, go from word to vector,
and have that be meaningful? Well, this is actually what we call a layer. So word embeddings
is actually a layer, it's something we're going to add to our model. And that means
that this actually learns the embeddings for our words. And the way it does that is by
trying to kind of pick out context in the sentence and determine based on where a word
is in a sentence, kind of what it means, and then encodes it doing that. I know, that's
kind of a rough explanation to give to you guys. I don't want to go too far into word
embeddings in terms of the math, because I don't want to, you know, waste our time
or get too complicated if we don't need to, but just understand that our word embeddings
are actually trained, and that the model actually learns these word embeddings as it goes. And
we hope that by the time it's looked at enough training data, it's determined really good
ways to represent all of our different words, so that they make sense to our model and the
further layers. And we can use pre trained word embedding layers if we'd like just like
we use that pre trained convolutional base in the previous section. And we might actually
end up doing that, actually, probably not in this tutorial, but it is something to consider
that you can do that. So that's how word embeddings work. This is how we encode textual data.
And this is why it's so important that we kind of consider the way that we pass information
to our neural network, because it makes a huge difference. Okay, so now that we've talked
about kind of the form that we need to get our data in before we can pass it further
in the neural network, right before we can get past that embedding layer, before it can
get put into any dense neurons, before we can even really do any math with it, we need to turn
our textual data into numbers. So now that we know that, it's time to talk about recurrent
neural networks. Now, recurrent neural networks are the type of networks we use when we process
textual data. You don't always have to use these, but they are typically the best for natural
language processing. And that's why they're kind of
their own class right? Now the fundamental difference between a recurrent neural network
and something like a dense neural network or a convolutional neural network, is the
fact that it contains an internal loop. Now, what this really means is that the recurrent
neural network does not process our entire data at once. So it doesn't process the entire
training example, or the entire input to the model at once. What it does is processes it
at different time steps, and maintains what we call an internal memory, kind of an
internal state, so that when it looks at a new input, it will remember what it has seen previously,
and treat that input based on the context of the understanding it's already developed.
Now, I understand that this might not make much sense right now. With a dense neural
network, or the neural networks we looked at so far, we call those something called
feed forward neural networks. What that means is we give all of our data to it at once,
and we pass that data through the network from left to right.
So we give all of the information, you know, we would pass those through the convolutional
layer to start, maybe we pass them through dense neurons, but they get given all of the
info. And then that information gets translated through the network to the very end again,
from left to right. Whereas here, with recurrent neural networks, we actually have a loop,
which means that we don't feed the entire textual data at once; we actually feed one
word at a time, it processes that word, generates some output based on that word, and uses the
internal memory state it is keeping track of as part of that calculation.
So essentially, the reason we do this is because just like humans, when we, you know, look
at text, we don't just take a photo of the text and process it all at once; we read it left to right, word by word. And
based on the words that we've already read, we start to slowly develop an understanding
of what we're reading, right? If I just read the word "now", that doesn't mean much to me.
If I just read the word "encode", that doesn't mean much. Whereas if I read the entire sentence,
"now that we've learned a little bit about how we can encode text", I start to develop
an understanding about what each next word means based on the previous words before it,
right. And that's kind of the point here is that this is what a recurrent neural network
is going to do for us, it's going to read one word at a time, and slowly start building
up its understanding of what the entire textual data means. And this works in kind of a more
complicated sense than that will draw it out a little bit. But this is kind of what would
happen if we um, I guess unraveled a recurrent layer, because recurrent neural network, yes,
it has a loop in it. But really, the recurrent aspect of a neural network is the layer that
implements this recurrent functionality with a loop. Essentially, what we can see here
is that we're saying x is our input, and h is our output; x_t is going to be our input
at time t, whereas h_t is going to be our output at time t. If we had a text of, say,
length four, so four words, like we've encoded them into integers now, at this point, the
first input at time zero will be the first word into our network, right, or the first
word that this layer is going to see. And the output at that time is going to be our
current understanding of the entire text after looking at just that one word. Next, what
we're going to do is process input one, which will be the next word in the sentence. But
we're going to use the output from the previous kind of computation of the previous iteration.
To do this, so we're going to process this word in combination with what we've already
seen, and then have a new output, which hopefully should now give us an understanding of what
those two words mean. Next, we'll go to the third word, and so forth, and slowly start
building our understanding what the entire textual data means by building it up one by
one, the reason we don't pass the entire sequence at once is because it's very, very difficult
to just kind of look at this huge blob of integers and figure out what the entire thing
means. If we can do it one by one and understand the meaning of specific words based on the
words that came before it and start learning those patterns, that's going to be a lot easier
for a neural network to deal with, than just passing it all at once looking at it in trying
to get some output. And that's why we have these recurrent layers. There's a few different
types of them. And I'm going to go through them, and then we'll talk a little bit more
in depth about how they work. So the first one is called long short-term memory. And actually,
in fact, before we get into this, let's talk about just a simple recurrent layer first,
so that we kind of have a reference point before going there. Okay, so this is kind of
the example I want to use here to illustrate how a recurrent neural network works and a
more teaching style rather than what I was doing before. So essentially, the way that
this works is that this whole thing that I'm drawing here, right, all of this circle stuff
is really one layer. And what I'm doing right now is breaking this layer apart and showing
you kind of how this works in a series of steps. So rather than passing all the information
at once, we're going to pass it as a sequence, which means that we're going to have all these
different words. And we're going to pass them one at a time to the code to the layer right
to this recurrent layer. So we're going to start from this left side over here, right,
you know, start over here at time step zero; that's what the zero means. The time
step is just, you know, the order. In this case, this is the first word. So let's say we have
the sentence "Hi, I am Tim", right, we've broken these down, they've been turned
into their numbers; I'm just writing the words here so we can kind of see what I mean in
like a natural language. And they are the input to this recurrent layer. So all of our
different words, right, that's how many kind of little cells we're going to draw here is
how many words we have in this sequence that we're talking about. So in this case, we have
four, right four words. So that's why I've drawn four cells to illustrate that. Now,
what we do is, at time step zero, the internal state of this layer is nothing; there's no
previous output, we haven't seen anything yet. Which means that this first kind of cell,
which is what I'm looking at right here, what I'm drawing in this first cell, is only going
to look at and consider this first word, kind of make some prediction about it and do something
with it. We're gonna pass "hi" to the cell, some math is going to go on in here. And then
what it's going to do is output some value, which, you know, tells us something
about the word "hi", right, some numeric value; we're not going to talk about what that is,
but there's gonna be some output. Now, what happens after the cell has finished
processing this, so right, so this one's done, this is completed, h zero, the output's there,
we'll do a check mark to say that that's done, it's finished processing: this output gets
fed into, actually, the same thing again; we're kind of just keeping track of it. And now
what we do is we process the next input, which is "I". And we use the output from the
previous cell to process this and understand what it means. So now, technically, we should
have some output from the previous cell, so from whatever "hi" was, right; we do some analysis
on the word "I", we kind of combine these things together, and the output of this cell
is our understanding of not only the current input, but the previous input combined with the current
input. So we're slowly kind of building up our understanding of what this word "I" means,
based on the words we saw before. And that's the point I'm trying to get at is that this
network uses what it's seen previously, to understand the next thing that it sees it's
building a context is trying to understand not only the word but what the word means,
you know, in relation to what's come before it. So that's what's happening here. So then
this output here, right, we get some output, we finish this, we get some output h one;
h one is passed into here. And now we have the understanding of what "hi" and
"I" mean. And we add "am", like that, we do some kind of computations, we build an understanding
of what the sentence is, and then we get the output h two, which gets passed into the next cell,
and then finally we have this final output h three, which hopefully is going to understand what this entire
thing means. Now, this is good, this works fairly well. And this is called a simple RNN
layer, which means that all we do is take the output from the previous cell, from the previous
iteration, because really each of these cells is just an iteration, almost like a for loop,
right based on all the different words in our sequence. And we slowly start building
to that understanding as we go through the entire sequence. Now, the only issue with
this is that as we have a very long sequence, so sequences of length, say 100, or 150, the
beginning of those sequences starts to kind of get lost as we go through this because
remember, all we're doing, right, is the output from h two is really a combination of the output
from h zero and h one, and then there's a new word that we've looked at, and h three
is now a combination of everything before it and this new word. So it becomes increasingly
difficult for our model to actually build a really good understanding of the text in
general when the sequence gets long, because it's hard for the model to remember what it saw
at the very beginning; that is now so insignificant, there have been so many outputs
tacked on to it, that it's hard for it to go back and see that, if that makes any sense.
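If it helps, here is a very rough conceptual sketch of the loop a simple recurrent layer is running; this is not how Keras actually implements it, and the weight matrices here are just placeholders:

    import numpy as np

    def simple_rnn_sketch(inputs, W_x, W_h, b):
        # inputs: a list of word vectors, processed one time step at a time.
        h = np.zeros(W_h.shape[0])            # the internal state starts out empty
        outputs = []
        for x in inputs:                      # one word per iteration, left to right
            # The new output depends on the current word AND on everything seen so far (h).
            h = np.tanh(W_x @ x + W_h @ h + b)
            outputs.append(h)
        return outputs                        # one output per time step; the last one summarizes the sequence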
Okay, so what I'm going to do now is try to explain the next layer we're going to look
at, which is called LSTM. So the previous layer we just looked at, the recurrent layer,
was called a simple RNN layer, so a simple recurrent neural network layer, whatever you want to
call it, right, a simple recurrent layer. Now we're going to talk about the layer which
is LSTM, which stands for long short-term memory ("short" and "term" are hyphenated together).
But essentially what we're doing, and it just gets a little bit more complex,
but I won't go into the math, is we add another component that keeps track of the internal
state. So right now, the only thing that we were tracking as kind of our internal state
as the memory for this model was the previous output. So whatever the previous output was,
so for example, at time zero here, there was no previous output, so there was nothing being
kept in this model. But at time one, the output from this cell right here was what we were
storing. And then at cell two, the only thing we were storing was the output at time one,
right, and we've lost now the output from time zero, what we're adding in long short
term memory is an ability to access the output from any previous state at any point in the
future when we want it. Now, what this means is that rather than just keeping track of
the previous output, we'll add all of the outputs that we've seen so far into what
I'm going to call my little kind of conveyor belt, it's going to run at the top up here,
I know it's kind of hard to see, but it's just what I'm highlighting, it's almost just
like a lookup table that can tell us the output at any previous cell that we want. So
we can kind of add things to this conveyor belt, we can pull things off, we can look
at them. And this just adds a little bit of complexity to the model, it allows us to not
just remember the last state, but look anywhere at any point in time, which can be useful.
Now, I don't want to go into much more depth about exactly how this works. But essentially,
you know, just think about the idea that as the sequence gets very long, it's pretty easy
to forget the things we saw at the beginning. So if we can keep track of some of the things
we've seen at the beginning, and some of the things in between on this little conveyor
belt, and we can access them whenever we want, then that's going to make this probably a
much more useful layer, right, we can look at the first sentence and the last sentence
of a big piece of text at any point that we want, and say, okay, you know, this tells
us x about the meaning of this text, right. So that's what this LSTM does. Again, I
don't want to go too far; we've already spent a lot of time covering, you know,
recurrent layers and how all this works. Anyways, if you do want to look up some great mathematical
definitions, again, I will source everything at the bottom of this document, so you can
go there. But again, that's LSTM, long short-term memory. That's what we're going to use
for some of our examples, although simple RNN does work fairly well for shorter-length
sequences. And again, remember, we're treating our text as a sequence: we're going to
feed each word into the recurrent layer, and it's going to slowly start to develop an understanding as it reads through and processes each word.
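For reference, both of these show up in Keras as ordinary layers you can drop into a model; a minimal sketch (the 32 is just an arbitrary number of units):

    import tensorflow as tf

    # Simple recurrent layer: only the previous output is carried forward from step to step.
    simple_layer = tf.keras.layers.SimpleRNN(32)

    # LSTM layer: adds the "conveyor belt" (the cell state), so information from early in
    # the sequence can still be accessed much later.
    lstm_layer = tf.keras.layers.LSTM(32)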
Okay, so now we are on to our first example,
where we're going to be performing sentiment analysis on movie reviews to determine whether
they are positive reviews or negative reviews. Now, we already know what sentiment means.
That's essentially what I just described. So picking up you know whether a block of
text is considered positive or negative. And for this example, we're gonna be using the
movie review data set. Now, as per usual, this is based off of a TensorFlow tutorial
slash guide; I found this one kind of confusing to follow on the TensorFlow website, but obviously
you can follow along with that if you prefer that version over mine. But anyways,
we're going to be talking about the movie review data set. So this data set comes straight
from Keras, and contains 25,000 reviews, which are already preprocessed and labeled.
Now, what that means for us is that every single word is actually already encoded by an integer.
And in fact, they've used kind of a clever encoding system where the integer that encodes a word
represents how common that word is in the entire data set. So if a word was encoded by integer three,
that would mean that it is the third most common word in the data set. And in this specific data set,
we have a vocabulary size of 88,584 unique words, which means that a word encoded as 88,584
would be the least common word in the data set. So keep that in mind. We're going to load in the data set
and do our imports just by hitting run here. And as I've mentioned previously, you know,
I'm not going to be typing this stuff out, it's just kind of a waste of time, I don't
have all the syntax memorized, and I would never expect you guys to memorize this either. But
what I will do is obviously walk through the code step by step, and make sure you understand
why it is that we have what we have here. Okay. So what we've done is defined the vocabulary
size, the max length of review, and the batch size. Now, what we've done is just loaded
in our data set by defining the vocabulary size. So this is just the words that we'll
include, so in this case, all of them. Then we have train data, train labels, test data,
test labels, and we can look at a review and see what it looks like by doing something
like this. So this is an example of our first review, we can see kind of the different encodings
for all of these words. And this is what it looks like they're already in integer form.
Now, just something to note here is that the lengths of our reviews are not uniform. So if
I do the len of the train data... I guess I wouldn't say uniform, but I mean, they're just all different.
So the length of train_data[0] is different than the length of train_data[1], right?
So that's something to consider, as we go through this and something we're actually
going to have to handle. Okay, so more preprocessing. So this is what I was talking
about: if you have a look at our loaded reviews, we'll notice they're of different lengths.
This is an issue; we cannot pass differently sized data into our neural network, which is
true. Therefore, we must make each review the same length. Okay, so what we're going
to do for now is we're actually going to pad our sequences. Now, what that means is we're
going to follow the steps that I've talked about here. So if the review is greater
than 250 words, we will trim off the extra words; if the review is less than 250 words,
we will add the necessary amount of zeros (this should actually say zeros in here, let's fix this)
to make it equal to 250. So what that means is, we're essentially going to add some kind
of padding to a review. So in this case, I believe we're actually going to pad to the
left side, which means that, say we have a review of length, you know, 200, we're going
to add 50 just kind of blank words, which we'll represent with the index zero, to the
left side of the review to make it the necessary length. So that's, that's good, we'll do that.
So if we look at train data and test data, what this does is we're just gonna use something
from Keras, which we've imported above. So we're saying from keras.preprocessing import
sequence; again, we're treating our text data as a sequence, as we've talked about. We're
gonna say sequence.pad_sequences(train_data, ...) and then we define the length that we
want to pad it to. So that's what this will do; it will perform these steps that we've
already talked about. And again, we're just going to assign test data and train data to,
you know, whatever this gives us; we pass the entire thing, and it'll pad all of them for us at once.
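Put together, a rough sketch of the loading and padding cells looks like this (it mirrors what's described above rather than reproducing the notebook exactly):

    from keras.datasets import imdb
    from keras.preprocessing import sequence

    VOCAB_SIZE = 88584
    MAXLEN = 250
    BATCH_SIZE = 64

    # Reviews come back already encoded as lists of integers (word-frequency ranks).
    (train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=VOCAB_SIZE)

    # Pad (or trim) every review so each one is exactly MAXLEN integers long.
    train_data = sequence.pad_sequences(train_data, MAXLEN)
    test_data = sequence.pad_sequences(test_data, MAXLEN)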
Okay, so let's run that, and then let's just have a look at, say, train_data[1] now, because remember,
that one was like 189 long before. So if we look at train_data[1], like that, we can see it's an array
with a bunch of zeros at the front, because that is the padding that we've employed to make it the correct length. Okay, so that's
padding, that's something that we're probably gonna have to do most of the time when we
feed something to our neural networks. Alright, so the next step is actually to create the
model. Now, this model is pretty straightforward. We have an embedding layer, an LSTM, and a
dense layer here. So the reason we've used dense with the activation function of sigmoid
at the end, is because we're trying to pretty much predict the sentiment of this, right,
which means that if we have the sentiment between zero and one, then if a number is
greater than 0.5, we could classify that as a positive review. And if it's less than 0.5,
or equal, you know, whatever you want to set the bounds at, then we can say that's a negative
review. So sigmoid, as we might recall, squishes our values between zero and one,
so whatever the value is, at the end of the network will be between zero and one, which
means that, you know, we can make an accurate prediction. Now, the reason we have the
embedding layer here, when, well, we've already preprocessed our reviews, is that even though we've
preprocessed this with these integers, and they are a bit more meaningful than just the
random lookup table that we talked about before, we still want to pass that to an embedding
layer, which is going to find a way more meaningful representation for those numbers than just
their integer values. So it's going to create those vectors for us. And this 32
is denoting the fact that we're going to make the output of every single one of our embeddings,
or the vectors that are created, 32 dimensions, which means that when we pass them to the
LSTM layer, we need to tell the LSTM layer it's going to have 32 dimensions for every
single word, which is what we're doing. And this will implement that long short-term memory
process we talked about before, and pass the final output to tf.keras.layers.Dense,
which will make the prediction. So that's what this model is.
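As a sketch, that model cell looks roughly like this (assuming the VOCAB_SIZE constant from the loading step above):

    import tensorflow as tf

    model = tf.keras.Sequential([
        # Learn a 32-dimensional vector for every word in the vocabulary.
        tf.keras.layers.Embedding(VOCAB_SIZE, 32),
        # Read the review one word vector at a time, carrying an internal state along.
        tf.keras.layers.LSTM(32),
        # Squash the final output to a single value between 0 and 1 (the sentiment).
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.summary()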
Now, give this a second to run here: in the model summary, which is already printed out, we can look at the fact that the embedding layer actually
has the most amount of parameters, because essentially, it's trying to figure out, you
know, all these different numbers, how can we convert that into a tensor of 32 dimensions,
which is not that easy to do. And this is going to be the major aspect that's being
trained. And then we have our LSTM layer, we can see the parameters there. And our final
dense layer, which is getting 33 parameters. That's because it takes the output from every single
one of these 32 dimensions, plus a bias node, right, that we need. So that's what we'll get
there. You can see in the model summary we get the sequential model. Okay, so training. Alright,
so now it's time to compile and train the model, you can see I've already trained mine.
What I'm going to say here is if you want to speed up your training, because this will
actually take a second and we'll talk about why we pick these things in a minute is go
to runtime, change runtime type, and add a hardware accelerator of GPU. What this will
allow you to do is utilize a GPU while you're training, which should speed up your training
by about 10 to 20 times. So I probably should have mentioned that beforehand. But you can
do that. And please do for these examples. So model compile. Alright, so we're compiling
our model. We're picking the loss function as binary cross entropy. The reason we're
picking this is because this is going to essentially tell us how far away we are from the correct probability,
right, because we have two different things we could be predicting. So you know, either
zero or one so positive or negative. So this will give us a correct loss for that kind
of problem that we've talked about before. For the optimizer, we're gonna use RMSprop; again,
I'm not going to discuss all the different optimizers, you can look them up if you care
that much about what they do. And we're gonna use "acc" as our metric. One thing I will say is
the optimizer is not crazily important for this one; you could use Adam if you wanted to,
and it would still work fine. My usual go-to is just to use the Adam optimizer unless you
think there's a better one to use. But anyways, that's something to mention. Okay, so finally,
we will fit the model; we've looked at this syntax a lot before. So model.fit: we give the
training data, the training labels, the epochs, and we'll do a validation split
of 20%, which is what the 0.2 stands for, which means that we're going to be using
20% of the training data to evaluate and validate the model as we go through.
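A rough sketch of the compile-and-fit cell (the epoch count is a placeholder; adjust it as you like):

    model.compile(
        loss="binary_crossentropy",   # two classes: positive or negative review
        optimizer="rmsprop",
        metrics=["acc"],
    )

    history = model.fit(
        train_data, train_labels,
        epochs=10,                    # placeholder epoch count
        validation_split=0.2,         # hold out 20% of the training data for validation
    )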
And we can see that after training, which I've already done, and you guys are welcome to
obviously do on your own computer, we kind of stall out at an evaluation accuracy of about
88%, whereas the model actually gets overfit to about 97-98%. So what this is telling us
essentially, is that we don't have enough training data and that after we've even done
just one epoch, we're pretty much stuck on the same validation accuracy, and that there's
something that needs to change in the model to make it better. But for now, that's fine.
We'll leave it the way that it is. Okay, so now we can look at the results. I've already
run the evaluation here, just to, again, save some time. But we'll do the evaluation
on our test data and test labels to get a more accurate kind of result here. And that
tells us we have an accuracy of about 85.5%, which you know, isn't great, but it's decent,
considering that we didn't really write that much code to get to the point that we're at
right now. Okay, so that's what we're getting the models been trained. Again, it's not too
complicated. And now we're on to making predictions. So the idea is that now we've trained our
model, and we want to actually use it to make a prediction on some kind of movie review.
So since our data was pre processed, when we gave it to the model, that means we actually
need to process anything, we want to make a prediction on in the exact same way, we
need to use the same lookup table, we need to encode it, you know, precisely the same.
Otherwise, when we give it to the model, it's going to think that the words are different,
and it's not going to make an accurate prediction. So what I've done here is I've made a function
that will encode any text into what we call the proper preprocessed kind of integers,
right, just like our training data was preprocessed; that's what this function is going
to do for us, preprocess some line of text. So what I've done is actually gotten
the lookup table, so essentially the word-to-integer mappings from IMDB, from that
data set that we loaded earlier. So let me go see if I can find where I defined
IMDB. You can see up here: from keras.datasets import imdb, just like we loaded it in,
we can also actually get all of the word indexes in that map, we can actually print this out
if we want to look at what it is after. But anyways, we have that mapping, which means
that all we need to do is call keras.preprocessing.text.text_to_word_sequence. What this does
is, given some text, convert all of that text into what we call tokens, which are just
the individual words themselves. And then what we're going to do is use a kind of list
comprehension inside of here that says: word_index at word if word is in word_index, else zero,
for each word in the tokens. Now, what this means is essentially, if a word that's in these tokens is
in our mapping, so in that vocabulary of 88,000-some words, then we'll replace its location
in the list with the specific integer that represents it.
Otherwise, we'll put zero, just to stand for, you know, we don't know what this word
is. And then what we'll do is return sequence.pad_sequences, and we'll pad this token
sequence, and just return the first index here. The reason we're doing that is
because pad_sequences works on a list of sequences, so multiple sequences. So we
need to put this inside a list, which means that this is going to return to us a list
of lists. So we just obviously want the first entry, because we only want, you know, that one
sequence that we padded. So that's how this works.
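Roughly, that encode function looks like this (a sketch based on the description above; variable names may differ slightly from the notebook):

    import keras
    from keras.datasets import imdb
    from keras.preprocessing import sequence

    MAXLEN = 250
    word_index = imdb.get_word_index()    # the word -> integer lookup table for this dataset

    def encode_text(text):
        # Split the raw text into lowercase word tokens.
        tokens = keras.preprocessing.text.text_to_word_sequence(text)
        # Look each token up in the vocabulary; unknown words become 0.
        tokens = [word_index[word] if word in word_index else 0 for word in tokens]
        # pad_sequences expects a list of sequences, so wrap the tokens in a list and take the first result.
        return sequence.pad_sequences([tokens], MAXLEN)[0]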
Sorry, it's a bit of a mouthful to explain, but you guys can run through and print this stuff out
if you want to see how all of it works specifically. But yeah, so we can run this cell and have a look
at what this actually does for us on some sample text: "that movie was just amazing, so amazing".
We can see we get the output that we were kind of expecting so integer encoded words
down here, and then a bunch of zeros just for all the padding. Now while we're at it,
I decided why not? Why don't we make a decode function so that if we have any movie review
like this, that's in the integer form, we can decode that into the text value. So the
way we're going to do that is start by reversing the word index that we just created. Now the
reason for that is the word index we looked at, which is this, right, goes from word to
integer. But we actually now want to go from integer to word, so that we can actually translate
a sentence, right. So what I've done is made this decode_integers function. We've set the
padding key to zero, which means that if we see zero, that really just means, you know, nothing's
there. We're going to create a text string, which we're going to add to, and then we say:
for num in integers (integers is our input, which will be a list that looks something
like this, or an array, whatever you want to call it), if the number does
not equal PAD, so essentially if the number is not zero, right, it's not padding, then
what we'll do is add the lookup of reverse_word_index at num, so whatever word that number represents,
onto this new string, plus a space. And then we just return text[:-1], which
means return everything except the last space that we would have added.
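And the decode side, as a sketch (it assumes the word_index lookup from the encode function above):

    reverse_word_index = {value: key for (key, value) in word_index.items()}

    def decode_integers(integers):
        PAD = 0
        text = ""
        for num in integers:
            if num != PAD:                        # skip the padding zeros
                text += reverse_word_index[num] + " "
        return text[:-1]                          # drop the trailing space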
And then if I print decode_integers on this encoded thing that we had before, which looks
like this, we can see it gets decoded into the string "that movie was just amazing so amazing".
Sorry, not encoded, decoded, because this was the encoded form. So that's how that works. Okay, so now it's
time to actually make a prediction. So I've written a function here that will make a prediction
on some piece of text as the movie review for us. And I'll just walk us through quickly
how this works, then I'll show us the actual output from our model, you know, making predictions
like this. So what we say is, we'll take some parameter text, which will be our movie review.
And we're going to encode that text using the ENCODE text function we've created above.
So just this one right here, that essentially takes our sequence of words, we get the pre
processing, so turn that into a sequence, remove all the spaces, whatnot, you know,
get the words, then we turn those into the integers, we have that we return that. So
here we have our proper preprocessed text. Then what we do is we create a blank NumPy
array that is just a bunch of zeros, in the shape (1, 250). Now, the
reason I'm putting it in that shape is because the shape that our model expects is
(something, 250), which means some number of entries, and then 250 integers representing
each word, right, because 250 is the length of a movie review, that's what we've told the model.
So that's the length of the review. Then what we do is we put pred[0], so that's
what's up here, equals the encoded text. So we essentially insert our one entry into
this array we've created. Then what we do is say model.predict on that array,
and just return and print result[0]. Now, that's pretty much all there is to it.
I mean, that's how it works. The reason we're doing result[0] is because, again, the model
is optimized to predict on multiple things, which means I would have to pass, you know,
a list of encoded texts, which is kind of what I've done by building this prediction array
here, which means it's going to return to me an array of arrays. So if I want the first
prediction, I need to index zero, because that will give me the prediction for our first
and only entry. Alright, so I hope that makes sense.
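A rough sketch of that predict function (the variable name pred is my own; the notebook may call it something else):

    import numpy as np

    def predict(text):
        encoded_text = encode_text(text)
        pred = np.zeros((1, 250))      # the model expects a batch shaped (number of entries, 250)
        pred[0] = encoded_text         # place our single encoded review into the batch
        result = model.predict(pred)
        print(result[0])               # result is a batch of predictions, so take the first one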
Now I have a positive review and a negative review written in, and we're just going to compare the analysis on both of them.
So that movie was so awesome, I really loved it, and would watch it again, because it was
amazingly great. And then that movie sucked, I hated it and wouldn't watch it again, was
one of the worst things I've ever watched. So let's look at this. Now, we can see the
first one gets predicted at 72% positive, whereas the other one is 23% positive. So
essentially, what that means is that, you know, the lower the number, the more negative
we're predicting it is, and the higher the number, the more positive we're predicting it is. If
we wanted to not just print out this value, and instead what we wanted to do was print
out, you know, positive or negative, we could just make a little if statement, it says if
this number is greater than 0.5, say positive, otherwise say not say negative, right. And
I just want to show you that changing these reviews ever so slightly actually makes a
big difference. So if I remove the word "awesome", so "that movie was so...", and then I run this,
you can see that, oh wow, this actually increases and goes up to 84%. Right? So the presence
of certain words and certain locations actually makes a big difference. And especially when
we have a shorter length review, right? If we have a longer length review, it won't make
that big of a difference. But even the removal of a few words here... And let's see. So
removing the word "awesome" changed it by almost like 10%, right? Now, if I remove "so", let's
see if that makes a bigger difference. It makes very little difference, because the model has
learned, right, that the word "so" doesn't really make a huge impact
on the type of review. Whereas if I remove the word "I", let's see if that makes a big impact.
Probably not, right, now it goes back up to 84. So that's cool, and that's something to
play with: removing certain words and seeing how much impact they actually carry. And
even if I just add the word "great", like "would great watch it again", just in the middle
of the sentence, it doesn't have to make any sense, let's look at this here. Oh, boom, we increase
a little bit, right? And let's say I add "this movie you really suck". Let's see
if that makes a difference. No, that just reduces it a tiny bit. So anyways, cool,
something to play with. Now let's move on to the next example. So now we're on to
our last and final example, which is going to be creating a recurrent neural network
play generator. This is going to be the first kind of neural network we've done that's
actually going to be creating something for us. But essentially, what we're going to do
is make a model that's capable of predicting the next character in a sequence. So we're
going to give it some sequence as an input. And what it's going to do is just simply predict
the most likely next character. Now there's quite a bit that's going to go into this,
but the way we're going to use this to predict a play is we're going to train the model on
a bunch of sequences of texts from the play Romeo and Juliet. And then we're going to
have it so that we'll ask the model: we'll give it some starting prompt, some string to
start with, and that'll be the first thing we pass to it. It will predict for us what
the most likely next character for that sequence is, and we'll take the output from the model
and feed it as the input again to the model and keep predicting sequence of characters.
So keep predicting the next character from the previous output as many times as we want
to generate an entire play. So we're gonna have this neural network that's capable of
predicting one letter at a time, actually end up generating an entire play for us by
running it multiple times on the previous output from the last iteration. Now, that's
kind of the problem. That's what we're trying to solve. So let's go ahead and get into it
and talk about what's involved in doing this. So the first thing we're going to do, obviously,
is our imports. So from keras.preprocessing import sequence, import keras, and we need TensorFlow,
NumPy, and os. So we'll load that in. And now what we're gonna do is download the file,
so the data set for Romeo and Juliet, which we can get by using this line here. So Keras
has this utils thing which will allow us to get a file and save it as whatever we want.
In this case, we're gonna save it as shakespeare.txt, and we're going to get that from
this link. Now, I believe this is just some, like, shared storage that we have access to from
Keras, so we'll load that in here. And then this will simply give us the path on this
machine, because remember, this is Google Collaboratory, to this text file. Now, if
you want, you can actually load in your own text data. So we don't necessarily need to
use the Shakespeare play, we can use anything we want. In fact, an example that I'll show
later is using the Bee Movie script. But the way you do that is run this block of code
here. And you'll see that it pops up this thing for choose files, just choose a file
from your local computer. And then what that will do
is just save this on Google Collaboratory. And then that will allow you to actually use
that. So make sure that's a txt file that you're loading in there. But regardless, that
should work. And then from there, you'll be good to go. So if you, you know, you don't
need to do that, you can just run this block of code here, if you want to load in the Shakespeare.
txt, but otherwise, you can load in your own file. Now, after we do that, we want to do
is actually open this file. So remember, that was just saving the path to it. So we'll open
that file in "rb" mode, which is read-bytes mode, I believe. And then we're going to say
.read(), and we're going to read that in as an entire string, and we're going to decode
that into UTF-8 format. And then we're just printing the length of the text, or the
amount of characters in the text. So if we do that, we can see we have the length of
the text is 1.1 million characters approximately. And then we can have a look at the first 250
characters by doing this. So we can see that this is kind of what the play looks like:
we have whoever's speaking, a colon, then some line, whoever's speaking, a colon, some line, and
there are all these line breaks, so backslash n's, which are telling us, you know, go to
the next line, right. So that's going to be important, because we're going to hope that
our neural network will be able to predict things like line breaks and spaces, and even
this kind of format as we teach it more and get further in. But now it's time to talk
about encoding. So obviously, all of this text is in text form, it's not pre processed
for us, which means we need to pre process it and encode it as integers before we can
move forward. Now fortunately, for us, this problem is actually a little bit easier than
the problem we discussed earlier with encoding words, because what we're going to do is simply
encode each character in the text with an integer. Now, you can imagine why this makes
this easier, because there really is a finite set of characters, whereas there's kind of an
indefinite, or, you know, I guess, infinite amount of words that could be created. So
we're not really going to run into the problem where, you know, two characters are encoded with
such different integers that it makes it difficult
for the model to understand, because, I mean, we can look at what the value of vocab is here,
we're only going to have so many characters in the text. And for characters, it just doesn't
matter as much, because, you know, an R isn't, like, super meaningful compared to an A. So
we can kind of encode in a simple format, which is what we're going to do. So essentially,
we need to figure out how many unique characters are in our vocabulary. So to do that, we're
going to say vocab equals sorted(set(text)). This will give us all of the unique characters
in the text, sorted. And then what we're going to do is create a mapping from unique characters
to indices. So essentially, we're gonna say u: i for i, u in enumerate(vocab).
What this will do is give us, essentially, zero for whatever the first character is, one for whatever
the next one is, two for whatever the next one is, for every single letter or character in our vocabulary,
which will allow us to create this mapping. And then what we'll do is just turn this initial
vocabulary into an array, so we can just use the index at which a letter
appears as the reverse mapping, so going from index to letter, rather than letter to index,
which is what the first mapping is doing here. Next, I've just written a function that takes some
text and converts it to its int representation, just to make things a little
bit easier for us as we get later on in the tutorial. So we're just going to say np dot
array of in this case, and we're just going to convert every single character in our text
into its integer representation by just referencing that character, and putting that in a list
here, and then obviously, converting that to NumPy array. So then, if we wanted to have
a look at how this works, we can say text as int equals text to int text. So remember,
text is that entire loaded file that we had above here. So we're just going to convert
that to its integer representation entirely using this function. And now we can look at
how this works down here. So we can see that the text first citizen, which is the first
13 letters, is encoded by this corresponding sequence of integers. And obviously, each character has its own
encoding. And you can go through and kind of figure out what they are based on the ones
that are repeated, right. So that is how that works. Now, I figured while we were at it,
we might as well write a function that goes the other way, so int_to_text. The reason I'm trying
to convert this to a NumPy array first is just because we're going to be passing in
different objects potentially in here. So if it's not already a NumPy array, it needs
to be a NumPy array, which is kind of what this is doing. Otherwise, we're just going
to pass on that we don't need to convert it to a NumPy array, if it already has one, we
can just join all of the characters from this list into here. So that's essentially what
this is doing for us: it's just joining the characters into text. And then we can see, if we call
int_to_text on text_as_int[:13], that translates it back for us to "First Citizen". I mean, you can
look more into this function if you want, but it's not that complicated.
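Put together, the character-encoding helpers look roughly like this (a sketch of the cells described above):

    import numpy as np

    vocab = sorted(set(text))                        # every unique character in the play

    char2idx = {u: i for i, u in enumerate(vocab)}   # character -> integer
    idx2char = np.array(vocab)                       # integer   -> character

    def text_to_int(text):
        return np.array([char2idx[c] for c in text])

    def int_to_text(ints):
        try:
            ints = ints.numpy()                      # accept tensors as well as plain arrays
        except AttributeError:
            pass
        return "".join(idx2char[ints])

    text_as_int = text_to_int(text)
    print(text[:13], "---->", text_as_int[:13])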
Okay, so now that we have all this text encoded as integers, what we need to do is create some training
examples, it's not really feasible to just pass the entire, you know, 1.1 million characters
to our model at once for training; we need to split that up into something that's
meaningful. So what we're actually going to be doing is creating training examples where
the training input, right, so the input value, is going to be some
sequence of some length (we'll pick the sequence length; in this case we're actually going
to pick 100), and then the output, or the expected output, so I guess, like, the label for that
training example, is going to be the exact same sequence shifted right by one character.
So essentially, I put a good example here: our input will be something like "hell", right,
and our output will be "ello". So what it's going to do is predict this last
character, essentially. And these are what our training samples are going to look like.
So the entire beginning sequence, and then the output sequence should be that beginning
sequence minus the first letter, but tack on what the last letter should be. So that
this way, we can look at some input sequence and then predict that output sequence that
you know, plus a character, right. Okay, so that's how that works. So now we're going
to do is define a sequence length of 100, we're going to say the amount of examples
per epoch is going to be the length of the text divided by the sequence length plus one.
The reason we're doing this is because for every training example, we need to create
a sequence input that's 100 characters long. And we need to create a sequence output that's
100 characters long, which means that we need to have 101 characters that we use for every
training example, right? Hopefully, that would make sense. So what this next line here is
going to do is convert our entire string data set into characters. And it's actually going
to allow us to have a stream of characters, which means that it's going to essentially
contain 1.1 million characters inside of this TF dot data set object from tensor slices.
That's what that's doing. Next, so let's run this and make sure this works. All right,
what we're going to do is say sequences is equal to char_dataset.batch, where the sequence
length plus one is the length of each batch, so in this case 101, and then drop_remainder
means: let's say that we have, you know, 105 characters in our text. Well, since we need
sequences of length 101, we'll just drop the last four characters of our text, because we
can't even put those into a full batch. So that's what this is doing for us: it's going to take
our entire character data set here that we've created, batch it into lengths of 101,
and then just drop the remainder. So that's what sequences does. Now,
split input target. What this is going to do essentially is just create those training
examples that we needed. So taking this these sequences of 101 length, and converting them
into the input and target text, and I'll show you how they work in a second, we can do this
convert the sequences to that by just mapping them to this function. So that's what this
function does. So if we say sequences dot map, and we put this function here, that means
every single sequence will have this operation applied to it. And that will be stored inside
this data set object, or, I guess you'd say object, but we'll also just say that it's
gonna be, you know, the dataset variable, right. So if we want to look at an example of how this
works, we can kind of see, so it just prints an example. The input is "First Citizen: Before we
proceed any further, hear me speak. All: Speak, speak. First Citizen: You...", and in the
output, notice the first character is gone, it starts at "i", and the last character is actually
just a space here, whereas here it didn't have a space. So you can see there's no space
here, there is a space there; that's kind of what I'm trying to highlight for you. In the next
example we get "are all resolved rather to die than to famish", or whatever it goes to
here, right. And then you can see here we omit that first character, and the next letter is
actually added on at the end, right? That's added in there. So that's how that works.
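A sketch of the dataset pipeline described here (again, it mirrors the description rather than the exact notebook cell):

    import tensorflow as tf

    seq_length = 100
    examples_per_epoch = len(text) // (seq_length + 1)

    # Stream of single characters -> batches of 101 characters each.
    char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)
    sequences = char_dataset.batch(seq_length + 1, drop_remainder=True)

    def split_input_target(chunk):
        input_text = chunk[:-1]      # "hell" - everything except the last character
        target_text = chunk[1:]      # "ello" - everything except the first character
        return input_text, target_text

    dataset = sequences.map(split_input_target)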
Okay, so next, we need to make training batches. So we're gonna say the batch size equals 64. The vocabulary size
is the length of the vocabulary, which if you remember, all the way back up to the top
of the code was the set, or the sorted set of the text, which essentially told us how
many unique characters are in there, the embedding dimension is 256, the RNN units is 1024. And
the buffer size is 10,000. What we're going to do now is create a data set that shuffles;
we're going to switch around all these sequences so they don't get shown in their original order,
which we actually don't want, and I'm going to batch them by the batch size. So if we
haven't kind of gone over what batching and all this does before, I mean, you can read
these comments, they're straight from the TensorFlow documentation: what we want to do is feed
our model data in batches of 64 at a time. So what we're going to do is shuffle all the
data, batch it into that size, and then, again, drop the remainder if there aren't enough
entries for a full batch, which is what we'll do. We're also going to define the embedding dimension, which is
essentially how big we want every single vector that represents our words to be in the embedding
layer, and then the RNN units; I won't really discuss what that is right now. That's
essentially how many... it's hard to really... I'm just gonna omit describing that for right
now, because I don't want to butcher the explanation. It's not that important. Anyways, okay, so
now we're gonna go down to building the model. So we've kind of set these parameters up here,
remember what those are, we've batched. And we've shuffled the data set. And again, that's
how this works, you can print it out if you want to see what a batch actually looks like.
But essentially, it's just 64 entries of those sequences, right. So 64 different training
examples is what a batch of that is. Alright, so now we go down here, and we're gonna say
build_model; we're actually making a function that is going to return to us a built model. The
reason for this is because, right now, we're going to pass the model batches of size 64
for training, right. But what we're going to do later is save this model, and then we're
going to pass it batches of one piece of, you know, whatever data, so that
we can actually make a prediction on just one piece of data. Because for right now,
what it's going to do is take a batch size of 64; it's gonna take 64 training examples
and return to us 64 outputs. That's what this model is going to be built to do the way we
build it now, to start. But later on, we're going to rebuild the model using the same
parameters that we've saved and trained for the model, but change it to just be a batch size
of one, so that, that way, we can get one prediction for one input sequence, right? So that's why
I'm creating this build model function. Now in here, it's going to have the vocabulary
sizes first argument, the embedding dimension, which remember was 256 as a second argument,
but also these are the parameters up here, right? And then we're going to define the batch
input shape as, you know, batch size and None. What this None means is we don't know how long the sequences
are going to be in each batch. All we know is that we're going to have 64 entries in
each batch, and then, of those 64 entries, so training examples, right, we don't know
how long each one will be, although in our case we're going to use ones that are length
100. But when we actually use the model to make predictions, we don't know how long the
sequence is going to be that we input, so we leave this as None. Next, we'll make an LSTM layer,
which is long short-term memory, with rnn_units, which is 1024, which, again, I don't really
want to explain, but you can look it up if you want. return_sequences means return the intermediate
state at every step. The reason we're doing this is because we want to look at what the
model is seeing at the intermediate steps, and not just the final stage. So if you leave
this as false, and you don't set this to true, what happens is this LSTM just returns one
output that tells us what the model kind of found at the very last time step. But we
actually want the output at every single time step for this specific model, and that's why
we're setting this to true. stateful, I'm not going to talk about that one right now; that's
something you can look up if you want. And then recurrent_initializer is just what these
values are going to start at in the LSTM. We're just picking this because this is what
TensorFlow has kind of said is a good default to pick; I won't go into more depth about
that, again, things that you can look up more if you want. Finally, we have a dense layer,
which is going to contain vocabulary-size amount of nodes. The reason we're doing this is
because we want the final layer to have the amount of nodes in it equal to the amount
of characters in the vocabulary. This way, every single one of those nodes can represent
the probability that that character comes next. So all of those node values summed
together should give us the value of one. And that's going to allow us to look at that
last layer as a predictive layer, where it's telling us the probability that these characters
come next; we've discussed how that works previously with other neural networks.
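As a sketch, the build_model function described here looks something like the following; details such as the recurrent initializer follow the TensorFlow text-generation tutorial this section is based on, and the constants are the ones defined earlier:

    import tensorflow as tf

    VOCAB_SIZE = len(vocab)      # 65 unique characters for this text
    EMBEDDING_DIM = 256
    RNN_UNITS = 1024
    BATCH_SIZE = 64

    def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
        return tf.keras.Sequential([
            tf.keras.layers.Embedding(vocab_size, embedding_dim,
                                      batch_input_shape=[batch_size, None]),
            tf.keras.layers.LSTM(rnn_units,
                                 return_sequences=True,   # keep the output at every time step
                                 stateful=True,
                                 recurrent_initializer="glorot_uniform"),
            tf.keras.layers.Dense(vocab_size),            # one node per character in the vocabulary
        ])

    model = build_model(VOCAB_SIZE, EMBEDDING_DIM, RNN_UNITS, BATCH_SIZE)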
So let's run this now. Name 'embedding_dim' is not defined, which I believe means I have
not run that cell yet. So now we run that, and we should be good. So if we look at the model
summary, we can see we have our initial embedding layer, we have our LSTM, and then we have
our dense layer at the end. Now notice 64 is the batch size, right? That's the initial
shape, None is the length of the sequence, which we don't know. And then this is going
to be just the output dimension, or sorry, this is the amount of
values in each vector, right, so we start with 256, then we have the 1024 units in
the LSTM, and then 65 stands for the amount of nodes, because that is the length of the
vocabulary. Alright, so combined, that's how many trainable parameters we get, you can
see each of them for each layer. And now it's time to move on to the next section. Okay,
so now we're moving on to the next step of the tutorial, which is creating a loss function
to compile our model with. Now I'll talk about why we need to do this in a second. But I
first want to explore the output shape of our model. So remember, the input to our model
is something that is of length 64, because we're going to have batches of 64 training
examples, right? So every time we feed our model, we're going to give it 64 training
examples. Now, what those training examples are, are sequences of length 100, that's what
I want you to remember, we're passing 64 entries, that are all of length 100, into the model
as its training data, right. But sometimes, and when we make predictions with the model,
later on, we'll be passing it just one entry that is of some variable length, right. And
that's why we've created this build model function. So we can build this model using
the parameters that we've saved later on, once we train the model, and it can expect
a different input shape, right? Because when we're training it, it's gonna be given a different
shape than we're actually testing it with. Now, what I want to do is explore the output
of this model, though, at the current point in time. So we've created a model that accepts
a batch of 64 training examples that are length 100. So let's just look at what the output
is from the final layer; give this a second to run, we get (64, 100, 65). And that represents
the batch size, the sequence length, and the vocabulary size. Now the reason for this is
we have to remember that when we create a dense layer as our last layer that has 65
nodes, every prediction is going to contain 65 numbers. And that's going to be the probability
of every one of those characters occurring, right. That's what that last layer does
for us. So obviously, our last dimension is going to be 65, for the vocabulary size; this
is the sequence length, and that's the batch size. I just want to make sure this is really clear
before we keep going. Otherwise, this can get very confusing very quickly. So what I
want to do now is actually look at the length of the example batch predictions, and just
print them out and look at what they actually are. So example batch predictions is what
happens when I use my model, before it's trained, on some random input example; I actually
pulled the first batch from my data set. So I can use my model before it's trained, with
random weights and biases, by simply writing model and then putting brackets like this and
passing in some example that I want a prediction for. So that's what I'm going to do: I'm
going to give it the first batch, and it even shows me the shape of this batch, (64, 100).
I'm going to pass that to the model, and it's going to give us a prediction for that. In
fact, it's going to give us a prediction for every single element in the batch; every
single training example in the batch we're going to get a prediction for. So let's look at what those
predictions are. So this is what we get, we get a length 64 tensor, right. And then inside
of here, we get a list inside of a list or an array inside of an array with all these
different predictions. We'll stop there for this part of the explanation.
But you can see we're getting 64 different predictions, because there's 64 elements in
the batch. Now, let's look at one prediction. So let's look at the very first prediction
for say, the first element in the batch, right. So let's do that here. And we see now that
we get a length 100 tensor. And this is what it looks like; there's still another
layer inside. And in fact, we can see that there's another nested layer here, right,
another nested array inside of this array. So the reason for this is because at every
single time step, which means at each position in the sequence. Because remember, a recurrent
neural network is fed one element of the sequence at a time. In this case, our sequences are
length 100, and at every time step, we're actually saving that output as a prediction and
passing it back. So we can see that for one training example (sorry, not one batch, one
training example), we get 100 outputs. And these outputs are in some shape we'll talk about
in a second. So that's something to remember:
that for every single training example, we get as many outputs as the length of that training
example, because that's the way this model works. And then finally, we look at the prediction
at just the very first time step. So this is 100 different time steps.
So let's look at the first time step and see what that prediction is. And we can see that
now we get a tensor of length 65. And this is telling us the probability of every single
character occurring next at the first time step. So that's what I wanted to walk through:
showing you what's actually output from the model the way it currently works.
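As a rough sketch of what was just described (assuming the shuffled, batched dataset from earlier is called data; adjust the name to whatever the notebook uses):

```python
# Run one untrained batch through the model and inspect the shapes described above.
for input_example_batch, target_example_batch in data.take(1):
    example_batch_predictions = model(input_example_batch)
    print(example_batch_predictions.shape)   # (64, 100, 65): batch, sequence length, vocab size

pred = example_batch_predictions[0]   # predictions for the first training example: (100, 65)
print(len(pred))                      # 100 time steps
print(len(pred[0]))                   # 65 values at the first time step, one per character
```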
And that's why we need to actually make our own loss function to be able to determine
how good our model is performing when it outputs something that looks like this, because
there is no ready-made loss function in TensorFlow that can just look at a three-dimensional
nested array of probabilities over the vocabulary size and tell us how different the two
things are. So we need to set up our own loss function. Now, if we want to determine the
predicted character from this array, what we can do is sample the categorical distribution
with this call, and that will tell us the predicted character. So what I mean is, let's just look at this,
and then we'll explain it. So since our model is working on random weights and biases right
now (we haven't trained it yet), this is all of the predicted characters it gave. So at the
first time step it predicted 'h', then it predicted a hyphen, then 'h', then 'g', then 'u',
and so on and so forth; you get the point, right. So what we're doing to get this value is
sampling the prediction. So this is just the first time step, actually, we're sampling the
prediction. Actually, no, sorry, we're sampling every time step, my bad there. We're gonna
say sampled_indices equals np.reshape (we're just changing the shape of it), and then we
say predicted_characters equals int_to_text(sampled_indices).
It's hard to explain all this if you guys don't have a bit of a statistics background, but
let me talk about why we're sampling, and not just taking the argmax of this array. You
would think that what we'd do is just take the one with the highest probability out of
here, and that would be the index of the next predicted character. There are some issues
with doing that, because if we do that, then we almost get stuck in a loop where we just
keep accepting the most likely character. So what we'll do is pick a character based on
this probability distribution. Again, this is called sampling the distribution; you can
look that up if you don't know what that
means. But sampling is just like trying to pick a character based on a probability distribution,
it doesn't guarantee that the character with the highest probability is going to be picked,
it just uses those probabilities to pick it. I hope that makes sense. I know that was like
a really rambly definition, but that's the best I can do. So here, we reshape the array and
convert all the integers back into characters to see the actual text. That's what these two
lines are doing. And then I'm just showing you the predicted characters: the character here
is what was predicted at time step zero to be the next character, and so on.
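Here's a hedged sketch of that sampling step (int_to_text is the helper written earlier; numpy and tensorflow are assumed to already be imported as np and tf):

```python
# Sample one character index per time step instead of taking the argmax.
pred = example_batch_predictions[0]                            # (100, 65) predictions for one example
sampled_indices = tf.random.categorical(pred, num_samples=1)   # (100, 1) sampled indices
sampled_indices = np.reshape(sampled_indices, (1, -1))[0]      # flatten to a plain list of 100 ints
predicted_chars = int_to_text(sampled_indices)                 # convert integers back to characters
print(predicted_chars)   # what the untrained model "predicts" at each time step
```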
Okay, so now we can create a loss function that handles this for us. So this is the loss
function; Keras has a built-in one that we can utilize, which is what we're doing. What this
is going to do is take all the labels and all of the probability distributions, which is
what these are, and compute a loss on those: how different or how similar those two things
are. Remember, the goal of our algorithm and the neural network is to reduce the loss,
right? Okay, so next, we're going to compile the model, which we'll do here. We're going to
compile the model with the Adam optimizer and the loss function we defined here as the
loss. And now we're going to set up some checkpoints. I'm not going to talk about how these
work; you can just read through this if you want.
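A sketch of the loss, compile, and checkpoint setup being described (this follows TensorFlow's text generation tutorial; the directory name and prefix are just examples, and data is assumed to be the batched dataset from earlier):

```python
import os

def loss(labels, logits):
    # Keras' built-in sparse categorical crossentropy; from_logits=True because
    # the final Dense layer outputs raw scores, not probabilities.
    return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)

model.compile(optimizer='adam', loss=loss)

# Save the model's weights at the end of every epoch.
checkpoint_dir = './training_checkpoints'                       # example location
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix, save_weights_only=True)

history = model.fit(data, epochs=2, callbacks=[checkpoint_callback])
```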
And then we're going to train the model. Remember to turn on your GPU hardware accelerator
under Runtime > Change runtime type > GPU, because if you do not, this is going to be very
slow. But once you do that, you can train the model; I've already trained it. If we go
through this training, we can see it's gonna say train for 172 steps, and it's gonna take
about 30 seconds per epoch, maybe a little less. The more epochs you run this for, the
better it will get; this is a case where we're not likely to overfit, so we could run this
for, say, 100 epochs if we wanted to. For our case, let's start by training this on, say,
two epochs, just to see how it does, and then we'll train it on something like 10, 20, 40,
or 50 and compare the results. But you'll notice the more epochs, the better it's going to
get. For our case, we'll start with two and then work our way up. So while that trains, I'll actually explain the next aspect of this
without running the code. So essentially, what we need to do after we've trained the model
and initialized the weights and biases is rebuild it using a new batch size of one.
Remember, the initial batch size was 64, which means we'd have to pass it 64 inputs or
sequences for it to work properly. But now what I'm going to do is rebuild the model and
change it to a batch size of one, so that we can pass it a single sequence of whatever
length we want, and it will work. So if we run this, we've rebuilt the model with batch
size one; that's the only thing we've changed. And now what I can do is load the weights by
saying model.load_weights(tf.train.latest_checkpoint(checkpoint_dir)), and then build the
model using the tensor shape (1, None). I know it sounds strange, but this is how we
rebuild the model: (1, None) is just saying expect one input entry, and None means we don't
know what the next dimension's length will be.
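That rebuild-and-reload step looks roughly like this (using the same build_model values and checkpoint_dir as above):

```python
# Rebuild the same architecture with batch size 1, then load the trained weights.
model = build_model(vocab_size=65, embedding_dim=256, rnn_units=1024, batch_size=1)
model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))
model.build(tf.TensorShape([1, None]))   # expect one entry of unknown length
```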
But here, checkpoint_dir is where on our computer we've defined we're going to save these
TensorFlow checkpoints, and this is the prefix we're going to save each checkpoint with:
the checkpoint directory and then a name containing the epoch, where epoch stands for
whatever epoch we're on. So we'll save a checkpoint at epoch one, a checkpoint at epoch
two, and so on. To get the latest checkpoint, we do this. And if we wanted to load any
intermediate checkpoint, say checkpoint 10, which is what I've defined here, we can use
this block of code down here, where I've just hardwired the checkpoint that I'm loading by
saying tf.train.load_checkpoint, whereas this one just gets the most recent, which should
be checkpoint two for me. And then what we're going to do is generate the
text. So this function, which I'll dig into in a second, I just want to run and show you
how it works, because I feel like we've done a lot of work for not very many results right
now. I'm just gonna type in the string Romeo and show you that when I do this, we give it a
second, and it will actually generate an output sequence like this. So we have "Romeo" at
the start of our sequence, followed by things like "Lady Capulet" and a bunch of
almost-words, so it's like pseudo-English; most of it is made of kind of proper words. But
again, this is because we trained it on just two epochs. I'll talk about how we built this
in a second. But if you wanted a better output for this part, then you would train this on
more epochs. So now
let's talk about how I actually generated that output. So we rebuilt the model to accept a
batch size of one, which means that I can pass it a sequence of any length. And in fact,
what I start by doing is passing in the sequence that I typed here, which was Romeo. Then
what that does is run this function generate_text; I just took this from TensorFlow's
website, like I'm taking almost all of this code. Then we say the number of characters to
generate is 800. The input evaluation: we now need to pre-process this text again so that
this works properly. We could use my little function, or we can just write this line of
code here, which does what the function I wrote does for us: char2idx[s] for s in
start_string, where start_string is what we typed, in this case Romeo. Then what we're
going to do is expand the dimensions, so essentially turn a plain list of those numbers
into a double list, just a nested list, because that's what it's expecting as the
input: one batch, one entry. Then what we do is set up the string that we want to store,
because we want to print this out at the end, right? We'll put it in this text_generated
list. Temperature equals 1.0: what this allows us to do, if we change this value, well, you
can read the comment here, right: low temperature results in more predictable text, higher
temperature results in more surprising text. So this is just a parameter to mess with if
you want; you don't necessarily need it, and I've just left mine at one for now. We're
gonna start by resetting the states of the model. This is because when we rebuilt the
model, it will have stored the last state it remembered from training. So we need to clear
that before we pass new input
text to it. And we say for i in range(num_generate), which means how many characters we
want to generate, which is 800. Here, what we're going to do is say predictions equals
model(input_eval), where input_eval starts as the encoded start string. And then we say
predictions equals tf.squeeze(predictions, 0); what this does is take our predictions,
which are in a nested list, and remove that exterior dimension, so we just have the
predictions we want without an extra dimension to index into again. Then, using a
categorical distribution to predict the character returned by the model (that's what the
comment says here), we'll divide by the temperature (if it's one, that's not going to do
anything), and we'll say predicted_id equals the sample of whatever the output was from the
model, which is what this is doing. And then we take that output, the predicted ID, and add
it to the input evaluation. Then we say text_generated.append, converting what are integers
now back into a string, and return all of this. Now, I know this seems
like a lot; again, this is just given to us by TensorFlow to create this part, and you can
read through the comments yourself if you want to understand it more, but I think that was
a decent explanation of what this is doing.
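For reference, here's a sketch of that generation loop, adapted from TensorFlow's text generation tutorial. It assumes the char2idx mapping from earlier plus an idx2char lookup going the other way (the notebook's int_to_text helper plays the same role):

```python
def generate_text(model, start_string, num_generate=800, temperature=1.0):
    # Encode the start string and add a batch dimension (one batch, one entry).
    input_eval = [char2idx[s] for s in start_string]
    input_eval = tf.expand_dims(input_eval, 0)

    text_generated = []
    model.reset_states()   # clear any state left over from training

    for i in range(num_generate):
        predictions = model(input_eval)
        predictions = tf.squeeze(predictions, 0)    # drop the outer batch dimension
        predictions = predictions / temperature     # higher temperature = more surprising text
        # Sample the distribution at the last time step to pick the next character.
        predicted_id = tf.random.categorical(predictions, num_samples=1)[-1, 0].numpy()
        # Feed the predicted character back in as the next input.
        input_eval = tf.expand_dims([predicted_id], 0)
        text_generated.append(idx2char[predicted_id])

    return start_string + ''.join(text_generated)

print(generate_text(model, start_string="Romeo"))
```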
So yeah, that is how we can generate sequences using a recurrent neural network. Now, what I'm going to do
is go to my other window here, where I've actually typed all the code just in full and
do a quick summary of everything that we've done, just because there was a lot that went
on. And then from there, I'm actually going to train this on the Bee Movie script and show
you kind of how that works in comparison to the Romeo and Juliet one. Okay, so what I'm in
now is just the exact same notebook we had before, but I've pretty much copied all the code
in here; it's the exact same code we had before, just without all that other text in
between. So I can do a short summary of what we did, as well as show you how this worked
when I trained it on the Bee Movie script. I did mention I was going to show you that, and
I'm not lying: you can see I've got the Bee Movie .txt file loaded in here. And in fact,
I'm gonna just show you the script first, so you can see what it looks like. So this is
what the Bee Movie script looks like. You can see it's just a long script of text; I
downloaded it for free off the internet. And it's actually not as long as the Romeo and
Juliet play. So we're not
going to get as good of results from our model. But it should hopefully be okay. So we just
start and I'm just gonna do a brief summary. And then I'll show you the results from the
Bee Movie script, just so that people who are confused maybe have something that wraps it
up. Here, we're doing our imports; I don't think I need to explain that. This part up here
is just loading in your file; again, I don't think I need to explain that. Then we're
actually going to read the file, so open it from our directory and decode it into UTF-8.
We're going to create a vocabulary and encode all of the text that's inside of this file.
Then what we're going to do is turn all of that text into the encoded version, and we're
writing a function here that goes the other way around, so from int to text, not from text
to int. We're going to define the sequence length that we want to train with, which will be
a sequence length of 100. You can decrease this value if you want (go 50, go 20, it doesn't
really matter, it's up to you); it's just going to determine how many training examples
you're going to have, right, given the sequence length. Next, what
we're going to do is create a character dataset using from_tensor_slices on text_as_int.
What this is going to do is just convert our entire text, which is now an integer array,
into a bunch of... not slices, what am I saying? It's just going to split that entire array
into individual characters; that's pretty much what it's doing. And then we're gonna say
sequences equals char_dataset.batch, which now is going to take all those characters and
batch them into lengths of 101. What we're going to do then is split all of that into the
training examples, so like we did before, right, 'hell' and then 'ello'. We're going to map
this function to sequences, which means we're going to apply it to every single sequence
and store that in our dataset.
Then, we're going to define the parameters for our initial network. We're going to shuffle
the dataset and batch it into 64 training examples. And then we're going to make the
function that builds the model, which we already discussed; we're going to actually build
the model starting with the batch size of 64. We're going to create our loss function,
compile the model, set our checkpoints for saving, and then train the model, making sure
that we pass checkpoint_callback as the checkpoint callback for the model, which means it's
going to save, every epoch, the weights the model had computed at that epoch. So after we do
that, our model is trained. So we've trained the model (you can see I trained this on 50
epochs for the Bee Movie script), and then what we're gonna do is rebuild the model with a
batch size of one, so we can pass it one example and get a prediction. We're going to load
the most recent weights into our model from the checkpoint directory that we defined above.
And then what we're going to do is build the model and tell it to expect the shape
(1, None) as its initial input. Now, None just means we don't know what that value is going
to be, but we know we're gonna have one entry. Alright, so now we have this generate
text method, or function, here, which I've already gone through how it works. And then we
can see that if I type in an input string (so we type, let's say, Hello, and hit enter),
we'll watch, and we can see that the Bee-Movie-trained model comes up with its output here.
Now, unfortunately, the Bee Movie script does not work as well as Romeo and Juliet. That's
just because Romeo and Juliet is a much longer piece of text, and it's formatted a lot
nicer and is a lot more predictable. But yeah, you kind of get the idea here. And it's kind
of cool to see how this performs
on different data. So I would highly recommend that you guys find some training data that
you could give this other than just the Romeo and Juliet or maybe even try another play
or something and see what you can get out of it. Also, a quick side note: to make your
model better, increase the number of epochs here. Ideally, you want this loss to be as low
as possible; you can see mine was still moving down at epoch 50. You will reach a point
where more epochs won't make a difference, although with models like this, the more epochs,
typically the better, because it's difficult for it to overfit. All you really want it to
do is learn how the language works and then be able to replicate that back to you. So
that's kind of the idea here. And
with that being said, I'm going to say that this section is probably done. Now, I know
this was a long, probably confusing section for a lot of you. But this is you know what
happens when you start getting into some more complex things in machine learning, it's very
difficult to kind of grasp and understand all these concepts in an hour of me just explaining
them. What I try to do in these videos is introduce you to the syntax, show you how to get
a working kind of prototype, and hopefully give you enough knowledge to the point where, if
you're confused by something that I said, you can go and look it up and figure out the more
important details for yourself, because I really can't go into all the extremes in these
videos. So anyways,
that has been this section. I hope you guys enjoyed doing this, I thought this was pretty
cool. And in the next section, we're gonna be talking about reinforcement learning. Hello,
everyone, and welcome to the next module in this course on reinforcement learning. So
what we're gonna be doing in this module is talking about another technique in machine
learning called reinforcement learning. Now, if you remember, at the very beginning of
this course, which I know for you guys is probably like six hours ago at this point, we did
briefly discuss what reinforcement learning was. I'll go through a recap here just to make
sure everyone's clear on it. But essentially, reinforcement learning is the strategy in
machine learning where, rather than feeding a ton of data and a ton of examples to our
model, we let the model (or, in this case, what we're going to call the agent) actually
come up with these examples itself. And we do this by letting an agent explore an
environment. Now essentially,
the concept here is just like humans, the way that we learn to do something, say like
play a game is by actually doing it, we get put in the environment, we try to do it. And
then you know, we'll make mistakes, we'll encounter different things, we'll see what
goes correctly. And based on those experiences we learned and we figure out the correct things
to do. A very basic example is, say we play a game, and when we went left, we fell off a
cliff or something, right. Next time we play that game and we get to that point, we're
probably not going to go left, because we're going to remember that that was bad, and hence
we've learned from our mistakes. So that's kind of the idea here with reinforcement
learning. I'm gonna go through exactly how this works and give some better examples and
some math behind one of the implementations we're going to use. But I just want to make
this clear that there's a lot of different types of reinforcement learning. In this example,
we're just going to be talking about something called Q-learning, and I'm going to keep
this module shorter compared to the other ones, because this field of AI and machine
learning is pretty complex and can get pretty difficult pretty quickly. So it's maybe a
more advanced topic for some of you guys. Alright, so anyways, now we need to define
some terminology before I can even start really explaining the technique we're going to use
and how this works. So we have something called an environment, an agent, a state, an
action, and a reward. Now I'm hoping that some of you guys will remember these from the
very beginning, but the environment is essentially what we're trying to solve or where
we're trying to do it. So in reinforcement learning, we have
this notion of an agent. And the agent is what's going to explore the environment. So
if we're thinking about reinforcement learning when it comes to, say, training an AI to
play a game, well, in that instance, say we're talking about Mario: the agent would be
Mario; that is the thing that's moving around and exploring our environment. And the
environment would be the level in which we're playing. So, in another example, maybe an example
we're going to use below, we're actually going to be kind of in almost a maze. So the environment
is going to be the maze, and the agent is going to be the character or the entity or
whatever you want to call it that's exploring that maze. So it's pretty, it's usually pretty
intuitive to come up with what the environment and the agent are. Although in some more complex
examples, it might not always be clear. But just understand that reinforcement learning
deals with an agent, something exploring an environment, and a very common application of
reinforcement learning is in training AIs on how to play games. And it's actually very
interesting what they've been able to do in that field recently. Okay, so we have
environments and agents; hopefully that makes sense. The next thing to talk about is state. So essentially,
the state is where you are in the environment. So obviously, inside of the environment, we
can have many different states. And a state could also be associated with the, you know,
Agent itself. So we're gonna say the agent is in a specific state, whenever it is in
some part of the environment. Now, in the case of our game, the state that an agent
would be in would be their position in the level. Say, if they're at X, Y coordinates like
(10, 20), they would be in state (10, 20). That's kind of how we think about states. Now,
obviously, state can be applied in some different instances as well; say we're playing
maybe a turn-based game... actually, that's not really a great example. I'm trying to think
of something where the state wouldn't necessarily be a position; maybe if you're playing a
game where you have health or something like that, part of the state might be the health of
the character. This can get complicated depending on what you're trying to do. But just
understand the notion that for most of our examples, state is simply going to be location,
although it really is just telling us information about where the agent is and its status
in the environment. So next, we have this notion
of an action. So in reinforcement learning, our agent is exploring the environment, it's
trying to figure out the best way or how to accomplish some kind of goal in the environment.
And the way that it interacts with the environment is with something called actions. Now,
an action could be, say, pressing the left arrow key (moving to the left in the
environment), moving to the right, or something like jumping, and an action can actually be
not doing anything at all. So when we say the agent performed an action, that could really
mean that the action in that time step was that it didn't do anything; that was its action.
So that's kind of the idea of an action. In the example of our Mario one, which I keep
going back to, an action would be something like jumping, and typically actions will change
the state of our entity or agent, although they might not necessarily do that. In fact, we
will observe with a lot of the different actions that we could actually be in the same
state after performing that action. Alright, so now we're on to the last
part, which is actually the most important to understand, and this is reward. So reward is
what our agent is trying to maximize while it is in the environment. The goal of
reinforcement learning is to have this agent navigate this environment, go through a bunch
of its different states, and determine which actions maximize the reward at every given
state. So essentially, the goal of our agent is to maximize its reward.
But what is a reward? Well, after every action that's taken, the agent will receive a reward.
Now, this reward is something that us as the programmer need to come up with. The reason
we need to do this is because we need to tell the agent when it's performing well, and when
it's performing poorly. And just like we had a loss function in the neural networks we were
using before, this is almost like our loss function: the higher this number is, the more
reward the agent gets, the better; the lower the reward, the worse it's doing. So that's
how we monitor and assess the performance of our agents, by looking at the average amount
of reward that they're able to achieve. And their goal, really, is almost an optimization
problem where they're trying to maximize this reward. So what we're going to do in reinforcement
learning is have this agent exploring the environment, going through these different
states and performing these different actions trying to maximize its reward. And obviously,
if we're trying to get the agent to, say, finish a level or complete the game, then the
maximum reward will be achieved once it's completed the level or completed the game. And if
it does things that we don't like, say dying or jumping in the wrong spot, we could give it
a negative reward to try to influence it not to do that. And
our goal, you know when we train these agents is for them to get the most reward. And we
hope that they're going to learn the optimal route through a level or through some environment
that will maximize that reward for them. Okay, so now I'm going to talk about a technique
called Q-learning, which is actually just an algorithm that we're going to use to implement
this idea of reinforcement learning. We're not going to get into anything too crazy in this
last module, because this is meant to be more of an introduction to the
field of reinforcement learning than anything else. But Q-learning is the most basic way to
implement reinforcement learning, at least that I have discovered. And essentially what
Q-learning is (and I don't actually know why they call it Q, although I should probably
know that) is creating some kind of table- or matrix-like data structure that's going to
contain, as the rows, every single state, and as the columns, every single action that
could be taken in those states. So for an example here (and we'll do one on the whiteboard
later on if we can get there), we can see that this is kind of my Q-table. And what I'm
saying is that we have A1, A2, A3, A4 as all the possible actions that can be performed in
any given state, and we have three states, denoted by the fact that we have three rows. And
the numbers in this table, this Q-matrix or Q-table, whatever
you want to call it, the numbers that are present here represent what the predicted
reward will be, given that we take an action, whatever this action is, in this state. So
I'm not sure if this is making sense to you guys. But essentially, if we're saying that row
zero is state zero, then for action two, A2, this value tells us what reward we should
expect to get if we take this action while we're in this state. That's what it's trying to
tell us; that's what it means. Same thing here in state two: we can see that the optimal
action to take would be action two, because that has the highest reward for this state. And
that's the table we're going to try to generate with this technique called Q-learning: a
table that can tell us, given any state, what the predicted reward will be for any action
that we take. And we're going to generate this table by exploring the environment many
different times, and updating these values according to what the agent sees in the
environment and the rewards it receives for any given action in any given state. We'll talk
about how we're going to update that later, but this is the basic premise. So that is kind
of Q-learning. We're gonna hop on the whiteboard now and do a more in-depth example, and
then we're going to talk about how we actually learn this Q-table that I just discussed.
Okay, so I've drawn a pretty basic example right now that I'm going to try to use to
illustrate the idea of Q-learning and talk about some problems with it, and how we can
combat those as we learn more about how Q-learning works. But the idea here is that we
currently have three states (and why is that happening up at the top, I don't know).
Anyways, the
idea is, we have three states s one, s two and s three. And at each state, we have two
possible actions that can be taken, we can either stay in this state or we can move.
Now what I've done is kind of just written some integers here that represent the reward
that we're going to get, or that the agent is going to get, if it takes that action in a
given state. So if we take the action of moving here in S1, then we will receive a reward
of one, because that's what we've written here as the reward we get from moving, whereas if
we stay, we'll get a reward of three. Same concept here: if we stay, we get two; if we
move, we get one. And I think you understand the point.
So the goal of our agent, remember, is to maximize its reward in the environment. And what
we're going to call the environment is this right here: the environment essentially defines
the number of states, the number of actions, and the way that the agent can interact with
those states and actions. So in this case, the agent can interact with the states by taking
actions that change its state, right. So that's what we're getting
at with this. Now, what I want to do is show you how we use this Q-table, or learn this
Q-table, to come up with the model, like the machine learning model that we're going to
use. So essentially, what we would want to have here is a kind of pattern in this table
that allows our agent to receive the maximum reward.
So in this case, we're going to say that our agent will start at state s one. And obviously,
whenever we're doing this reinforcement learning, we need to have some kind of start state
that the agent will start in. This could be a random state, it could change, but it does
start in some state. So in this case, we're going to say it starts at S1. Now when we're
in s one, the agent has two things that it can do. It can stay in the current state and
receive a reward of three, or it can move and receive a reward of one. Right, and if we get
to S2, in this state, what can we do? We can stay, which means we receive a reward of two,
or we can move, which means we get a reward of one. And same thing for S3: if we stay, we
get a reward of four, and if we move, we get a reward of one. Now, right now, if
we had just run this and had the agent start in each unique state, this is what the Q-table
we'd get would look like. Because after looking at this just once starting in each state
(or I guess twice, because it would have to try each action), let's say we had the agent
start in each state twice. So it started in S1 twice, it started in S2 twice, and it
started in S3 twice, and every time it started there, it tried one of the different
actions. So when it started in S1, it tried moving once and it tried staying once. We would
have a Q-table that looks like this, because what would happen is we would update the
values in our Q-table to represent the reward we received when we took that action from
that state. So we can see here, when we were in state S1 and we decided to stay, what we
did is write a three inside of the stay column, because that is how much reward we received
when we stayed, right? Same thing for state two: when we stayed in state two, we received a
reward of two. Same thing for the four. Now, this is okay, right? This tells us kind of the optimal
move to make in any state to receive the maximum reward. But what if we introduce the idea
that we want our agent to receive the maximum total reward possible, right? So if it's in
state one, ideally we'd like it to move to state two, and then move to state three, and
then just stay in state three, because it will receive the most amount of reward. Well,
with the current table that we've developed, if we just follow this and look at the table,
we say: okay, if we want to use this Q-learning table now to move an agent around our
level, what we'll do is say, okay, what state is it in? If it's in state two, we'll stay,
because that's the highest reward that we have in this table. If that's the approach we
use, then we can see that if our agent starts in state one or state two, it would get stuck
in what we call a local minimum, because it's not able to realize from this state that it
can move further and receive a much greater reward. Right. And that's kind of the concept we're going to
talk about as we implement and discuss further how Q-learning works. But hopefully it gives
you a little bit of insight into what we do with this table. Essentially, we update these
table values as we explore the environment: when we explore this environment and we start
in a state, and we take an action to another state, we observe the reward we got from going
there and the state we changed to, right. So we observe the fact that in state one, when we
move to state two, we receive a reward of one. And what we do is take that observation and
use it to update this Q-table. And the goal is that at the end of all of these observations
(and it could be millions of them), we have a Q-table that tells us the optimal
action to take in any single state. So we're actually hard-coding this kind of mapping that
essentially tells us: given any state, all you have to do is look it up in this table, look
at all of the actions that could be taken, and just take the action that's supposed to give
the maximum reward. And if we were to follow that on this example, we can see we get stuck
in the local minimum,
which is why we're going to introduce a lot of other concepts. So in our reinforcement
learning model and Q-learning, we have to implement the concept of being able to explore
the environment not based only on previous experiences, right? Because if we just tell our
model, okay, we're going to start in the start state and just start navigating around, and
we update our Q-table immediately and put this three here for stay, we can almost guarantee
that since this three is here, when our model is training (if it's using this Q-table to
determine what state to move to next while it's training and determining what to do), it's
just always gonna stay, which means we'll never get a chance to even see what we could have
gotten at S3. So we need to introduce some concept of taking random actions, and being able
to explore the environment more freely, before starting to look at these Q-values and using
them for the training. So
I'm actually going to go back to my slides now to make sure I don't get lost, because
I think I was starting to ramble a little bit there. So we're now going to talk about
learning the Q-table. Essentially, I showed you how we use that Q-table: given some state,
we just look that state up in the Q-table, determine the maximum reward we could get by
taking some action, and then take that action. And that's how we would use the Q-table
later on, when we're actually using the model. But when we're learning the Q-table, that's
not necessarily what we want to do. We don't want to explore the environment by just taking
the maximum reward we've seen so far and always going that direction; we need to make sure
we're exploring in a different way and learning the correct values for the Q-table. So
essentially, our agent learns by exploring the environment and observing the outcome/reward
from each action it takes in a given state, which we've already said. But how does it know
what action to take in each state when it's learning? That's the question I need to answer
for you now. Well, there's two ways of doing this: our agent can use the current Q-table to find the best action, which is kind of
what I just discussed (looking at the Q-table, looking at the state, and just taking the
highest reward), or it can randomly pick a valid action. And our goal, when we create this
Q-learning algorithm, is going to be to have a really good balance of these two, where
sometimes we use the Q-table to find the best action, and sometimes we take a random
action. So that is one thing. But now I'm just going to talk about the formula for how we
actually update Q-values. So obviously, what's gonna end up happening in our Q-learning is
we're gonna have an agent that's in the learning stage, exploring the environment, with all
these actions and all these rewards and all these observations happening. And it's going to
be moving around the environment by following one of these two principles: randomly picking
a valid action, or using the current Q-table to find the best action. And when it gets into
a new state and moves from state to state, it's going to keep updating this Q-table,
telling it: this is what I've learned about the environment, I think this is a better move,
we're going to update this value. But how does it do that in a way that's going to make
sense? Because we can't just put in the maximum value we got from moving; otherwise, we're
going to run into that issue I just talked about, where we get stuck in that local maximum
(right, I'm not sure if I called it a minimum before, but anyways, it's a local maximum),
where we see this high reward, but that's preventing us, if we keep taking that action,
from reaching a potentially higher reward in a different state. So the formula
that we actually use to update the Q-table is this:
Q[state, action] = Q[state, action] + alpha * (reward + gamma * max(Q[new_state, :]) - Q[state, action]),
where Q[state, action] is just referencing first the row for the state and then the action
as the column. So what the heck does this mean? What are these constants? What is all this?
We're going to talk about the constants in a minute, but I want to explain this formula.
Actually, okay, well, I guess we'll go through the constants first; it's hard to go through
a complicated math formula without them. So alpha stands for the learning rate, and gamma
stands for the discount factor.
So: alpha, learning rate; gamma, discount factor. Now, what is the learning rate? Well,
this is a little blurb on what it is, but essentially, the learning rate ensures that we
don't update our Q-table too much on every observation. So before (right, when I was
showing you, like this, if we can go back to my Windows Ink... it's not working, I guess
I'm just not patient enough), before, when I was showing you, all I did when I took an
action was look at the reward I got from taking that action, and I just put that in my
Q-table, right? Now, obviously, that is not an optimal approach, because it means that in
the instance where we hit state one, well, I'm not going to be able to get to this reward
of four, because I'm going to throw that three in here, and I'm just going to keep taking
that action. We need to hopefully make this
move action actually have a higher value than stay. So that next time we're in state one,
we consider the fact that we can move to state two, and then move to state three to optimize
a reward. So how do we do that? Well, the learning rate is one thing that helps us
accomplish this behavior. Essentially, what it is telling us (and this is usually a decimal
value, right) is how much we're allowed to update every single Q-value by on every single
action, or every single observation. If we just used the approach from before, then we
would only need a number of observations equal to the amount of states times the amount of
actions to completely fill in the Q-table. So in our case, if we had, like, three states
and three actions, in nine iterations we'd be able to fill the entire Q-table. The learning
rate means that it's going to update a little bit slower, and essentially change each value
in the Q-table very slightly. So you can
see that what we're doing is taking the current value in the Q-table, so whatever is
already there, and then adding some value to it. And this value that we add is either going
to be positive or negative, essentially telling us whether we should lean towards or away
from this action. Now, the way this value is calculated, right (obviously our alpha
multiplies the whole bracket), is we have the reward, plus, in this case, gamma, which is
the discount factor (and I'll talk about how that works in a second), times the maximum
Q-value of the new state we moved into. What this means is: find the maximum reward that we
could expect to receive in the new state by taking any action, and multiply that by what we
call the discount factor. What this part of the formula is trying to do is exactly what
I've been talking about: try to look forward and say, okay, so I know
if I take this action in this state, I receive this amount of reward. But I need to factor
in the reward I could receive in the next state, so that I can determine the best place
to move to. That's kind of what this max and this gamma are trying to do for us. So this
discount factor, whatever you want to call it, is trying to factor a little bit of what we
could get from the next state into this equation, so that hopefully our agent can learn a
little bit more about the transition states: states or actions that maybe don't give us an
immediate reward, but lead to a larger reward in the future. That's what this gamma and max
are trying to do. Then what we do is subtract from this the current Q[state, action]; this
is just to make sure that we're adding the difference between what we get from this and
what the current value is, and not blowing these values up. I mean, you can look into more
of the math here and plug in some values later, and you'll see how this works.
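To make that concrete, here's a tiny made-up example of plugging numbers into the update (these values are just for illustration, not taken from the notebook):

```python
import numpy as np

Q = np.zeros((3, 2))       # 3 states, 2 actions (stay, move)
alpha, gamma = 0.8, 0.9    # learning rate and discount factor

# Suppose we're in state 0, we take action 1 (move), get a reward of 1,
# and land in state 1, whose best current Q-value we pretend is 2.0.
state, action, reward, new_state = 0, 1, 1, 1
Q[new_state, 0] = 2.0

Q[state, action] = Q[state, action] + alpha * (reward + gamma * np.max(Q[new_state]) - Q[state, action])
print(Q[state, action])    # 0 + 0.8 * (1 + 0.9 * 2.0 - 0) = 2.24
```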
But this is the basic format. I feel like I explained that in depth enough. Okay. So
now that we've done that, and we've updated this, we've learned kind of how we update
the cells and how this works. I could go back to the whiteboard and draw it out. But I feel
like that makes enough sense: we're going to look at what the next state is and factor that
into our calculation; we have this learning rate, which tells us essentially how much we
can update each cell value by; and we have this, what do you call it, discount factor,
which essentially tries to define the balance between finding really good rewards in our
current state and finding rewards in future states. The higher this value is, the more
we're going to look towards the future; the lower it is, the more we're going to focus
completely on our current reward, right. And obviously,
that makes sense, because we're going to add the maximum value. And if we're multiplying
that by a lower number, that means we're going to consider that less than if that was greater.
Awesome. Okay. So now that we kind of understand that, I want to move on to a Q learning example.
And what we're going to do for this example, is actually use something called the open
AI Gym; I just need to throw my drawing tablet away right there so that we can get started.
But OpenAI Gym is a really interesting kind of module (I don't even really know the way to
describe it, almost a tool) that was developed by OpenAI, as you might guess from the name,
which was founded by Elon Musk and others. So they've made this kind of, I don't really
know the word to describe it, I almost want to say tool, that allows programmers to work
with these really cool gym environments and train reinforcement learning models. You'll see
how this works in a second, but essentially there's a ton of graphical environments with
very easy interfaces to use (like moving characters around in them) that you're allowed to
experiment with completely for free as a programmer, to try to make some cool reinforcement
learning models. That's what OpenAI Gym is. And you can look at it; I mean, we'll click on
it here, actually, to see what it is. You can see Gym has all these different Atari
environments, and it's just a way to train reinforcement learning models.
Alright, so now we're gonna start by just importing gym. If you're in Collaboratory,
there's nothing you need to do here; if you're in your own environment, you're going to
have to pip install gym. And then what we're going to do is make this 'FrozenLake-v0' gym
environment. Essentially, what this does is just set up the environment that we're going to
use. Now, I'll talk more about what this environment is later, but I want to talk about how
gym works, because we are going to be using it throughout. So OpenAI Gym is meant for
reinforcement learning, and essentially every environment has an observation space and an
action space. Now, the observation space describes our environment, right, and it will tell
us the amount of states that exist in this environment. In our case, we're going to be
using kind of a maze-like thing, which I'll show you in a second, so you understand why we
get the values we do. The action space tells us how many actions we can take, when we do
the .n, at any given state. So if we print these out, we get 16 and 4: the observation
space, in other words the number of states, is 16, and the amount of actions we can take in
every single state is four. Now, in this case, these actions can be left, down, up, and
right. But yes, now env.reset().
So essentially, we have some commands that allow us to move around the environment, which
are actually down here. If we want to reset the environment and start back in the beginning
state, we do env.reset(); you can see this actually returns to us the starting state, which
obviously is going to be zero. We also have the ability to take a random action, or select
a random action from the action space. What this line right here says is: of the action
space, so all the actions we could take, pick a random one and return it. So if you do
that (actually, let's just print the action and see what it is), you'll see we get some
number like 0, 1, 2, or 3; it just gives us a random action that is valid from the action space.
Alright, next, what we have is this env.step(action). What this does is take whatever
action we have, which in this case is three, and perform it in the environment. So it tells
our agent to take this action in the environment, and it returns to us a bunch of
information. The first thing is the observation, which essentially means what state we move
into next, so I could call this new_state. reward is what reward we receive by taking that
action, so there'll be some value, right? In our case, the reward is either one or zero,
but that's not that important to understand. And then we have a bool, done, which tells us:
did we lose the game, or did we win the game, yes or no. So if this is true, it means we
need to reset the environment, because our agent either lost or won and is no longer in a
valid state in the environment. info gives us a little bit of extra information; it's not
showing me anything here, and we're not going to use info throughout this, but I figured
I'd let you know about it now.
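Putting those gym commands together, a minimal sketch looks like this:

```python
import gym

env = gym.make('FrozenLake-v0')         # build the frozen lake environment

print(env.observation_space.n)          # 16 states
print(env.action_space.n)               # 4 actions per state

state = env.reset()                     # go back to the starting state (0)
action = env.action_space.sample()      # pick a random valid action

# Take the action: get the new state, the reward, whether the episode ended, and extra info.
new_state, reward, done, info = env.step(action)
env.render()                            # draw the environment (slow if used while training)
```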
Next, env.render(), which I'll actually run for you and show you, renders a graphical user
interface that shows you the environment. Now, if you use this while you're training, so
you actually watch the agent do the training, which is what you can do with this, it slows
things down drastically, probably by 10 or 20 times, because it actually needs to draw
frozen lake example looks like. You can see that the highlighted square is where our agent
is. And in this case, we have four different blocks. We have s, f, h, and G. So S stands
for start, F stands for a frozen, is this a frozen lake. And the goal is to
navigate to the goal without falling in one of the holes, which is represented by H. And
this here tells us the action that we just took now I guess the starting action is up
because that's zero, I believe. But yes, so if we run this a bunch of times, we'll see
this updating. Unfortunately, this doesn't work very well in Google Collaboratory, the
gooeys. But if you did this in your own command line, and you'd like did some different steps
and rounding it all out, you would see this working properly. Okay, so now we're on to
talking about the frozen lake environment, which is kind of what I just did. So now we're
just going to move to the example where we actually implement cue learning to essentially
solve the problem, how can we train an AI to navigate this environment and get to the
start to the goal? How can we do that? Well, we're gonna use q learning. So let's start.
So the first thing we need to do is import Jim, import NumPy and then create some constants
here. So we'll do that we're gonna say the amount of states is equal to the line I showed
you before. So Nv dot observation space dot n actions is equal to n v dot action space
n. And then we're going to say Q is equal to NP dot zeroes, states and actions. So something
I guess I forgot to mention is when we initialize the Q table, we just initialize all blank
values or zero values, because obviously, at the beginning of our learning, our model,
or our agent doesn't know anything about the environment yet, so we just leave those all
blank, which means we're going to more likely be taking random actions at the beginning
of our training, trying to explore the environment space more. And then as we get further on
and learn more about the environment, those actions will likely be more calculated based
on the cue table values. So we print this out, we can see this is the array that we
get, we've had to beat, build a 16 by four, I guess not array, well, I guess this technically
is an array, and we'll call it matrix 16 by four, so every single row represents a state,
and every single column represents an action that could be taken in that state. Alright,
so we're going to find some constants here, which we talked about before. So we have the
gamma, the learning rate, the maximum of steps and the number of episodes. So the number
of episodes is actually how many episodes you want to train your agent on. So how many
times do you want it to run around and explore the environment? That's what episode stands
for? Max steps essentially says, okay, so if we're in the environment, and we're kind
of navigating and moving around, we haven't died yet. How many steps are we going to let
the agent take before we cut it off, because what could happen is we could just bounce
in between two different states indefinitely. So we need to make sure we have a max steps
so that at some point, if the agent is just doing the same thing, we can, you know, and
that or if it's like going in circles, we can end that and start again, with different
you know, q values. Alright, so episode Yeah, we already talked about that learning rate,
we know what that is gamma, we know what that is, mess with these values as we go through.
And you'll see the difference that makes in our training, actually include a graph down
below. So we'll talk about that to kind of show us the outcome of our training but learning
rate, the higher This is, the faster I believe that it learns, yes. So a high learning
rate means that each update will introduce larger change to the current state. So yeah,
so that makes sense based on the equation as well. Just want to make sure that I wasn't
going crazy there. So let's run this constant block to make sure. And now we're going to
talk about picking an action. So remember how I said and I actually wrote them down
here, there's essentially two things we can do at every, what do we call it, step, right?
We can randomly pick a valid action, or we can use the current Q-table to find the best
action. So how do we actually implement that with our OpenAI Gym environment? Well, I just
wanted to write a little code block here to show you the exact code that will do this for
us. We're gonna introduce this new concept, or this new (I can almost call it a constant)
called epsilon. I think I spelt this wrong... epsilon, yeah, that should be how you spell
it. So the epsilon value essentially tells us the percentage chance that we're going to
pick a random action. Here, we're gonna use a 90% epsilon, which essentially means that
every time we take an action, there's gonna be a 90% chance it's random and a 10% chance
that we look at the Q-table to make that action. Now, we'll reduce this epsilon value as we
train, so that our model starts by exploring as much as it possibly can in the environment
by just taking random actions, and then after we have enough observations and we've
explored the environment enough, we'll slowly decrease the epsilon, so that it hopefully
finds a more optimal route for things to do. Now, the way we do this is we save NP dot
Now, the way we do this is we say np.random.uniform(0, 1), which essentially means pick a random value between zero and one, and check whether it's less than epsilon. If it is, then action = env.action_space.sample(), so take a random action and store what that action is. Otherwise, we're going to take the argmax of the state's row in the Q-table. What that means is: find the maximum value in the row of the Q-table corresponding to our current state, and tell us which column it's in, so that we know which action to take. So if the maximum value in our state's row is in column one, we take action one; that's what this is saying. That's using the Q-table to pick the best action. Alright, so we don't need to run this, because I just wrote it to show you.
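Here's a small standalone sketch of that action-picking logic; it assumes the `env`, `Q`, and `state` variables from the earlier blocks already exist.

```python
epsilon = 0.9  # 90% chance of taking a random action to start with

if np.random.uniform(0, 1) < epsilon:
    # explore: take a random valid action
    action = env.action_space.sample()
else:
    # exploit: take the action with the highest Q-value for the current state
    action = np.argmax(Q[state, :])
```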
Now, how do we update the Q-values? Well, this is just following the equation that I showed above. This is the line of code that does it; I just wanted to write it out so you guys can see exactly what each piece is doing and kind of explore it for yourself. But essentially you get the point: you have your learning rate, the reward, gamma, and then you take the max. np.max does the same thing as the max function in Python; it takes the maximum value (not the argmax) from the next state's row, the new state we moved into, and then we subtract the current Q[state, action] value.
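As a sketch, that single update line looks like this; it assumes `reward` and `next_state` come from env.step(action) and that the constants from earlier are defined.

```python
# Q-learning update rule: move Q[state, action] toward the observed reward
# plus the discounted best Q-value of the state we landed in
Q[state, action] = Q[state, action] + LEARNING_RATE * (
    reward + GAMMA * np.max(Q[next_state, :]) - Q[state, action])
```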
Alright, so putting it all together: now we're actually going to show how we can train and create this Q-table, and then use that Q-table. This is pretty much all the code; we've already written at least this block here, which is why I put it in its own block. It's just all the constants. I've included a RENDER constant to tell us whether we want to draw the environment or not; in this case I'm going to leave it False, but you can make it True if you want. For episodes, I've left that at 1500. If you want to make your model better, typically you train it on more episodes, but that's up to you.
Now we're going to get into the big chunk of code, which I'll talk through. We're going to have a rewards list, which is just going to store all of the rewards we see, so I can graph that for you guys later. Then we say for episode in range(EPISODES), which just tells us: for every episode, do the steps I'm about to describe, up to the number of episodes, which is our training length essentially. We reset the state, obviously, which makes sense, so state = env.reset(), which gives us the starting state. Then we say for _ in range(MAX_STEPS), which means okay, we're going to explore the environment for up to the maximum number of steps; we do have a done flag here, which will actually break the loop if we reach the goal or the episode otherwise ends, which we'll talk about further. The first thing we do is say if RENDER, you know, render the environment; that's pretty straightforward. Otherwise, let's take an action. For each time step we need to take an action. Epsilon, I think, is spelled correctly here; I believe that's right. We say action = env.action_space.sample(), or the argmax of the Q-table row, which is the code we've already looked at. Then we say next_state, reward, done, _ = env.step(action). We've put an underscore here because we don't really care about the info value, so I'm not going to store it, but we do care about what the next state will be, the reward from that action, and whether we were done or not. So we take that action, that's what env.step does, and then what we do is say Q[state, action] and just update the Q-value using the formula we've talked about. This is the formula; you can look at it more in depth if you want, but based on whatever the reward is, you know, that's how we're going to update those Q-values.
And after a lot of training, we should have some decent Q-values in there. Alright, so then we set the current state to be the next state, so that when we run this time step again our agent is in the next state and can keep exploring the environment in this current iteration, if that makes sense. Then we say if done: so essentially, if the episode is over, whether the agent fell in a hole or reached the goal, we're going to append whatever reward it got from its last step into the rewards list up here. It's worth noting that the way the rewards work here is you get a reward of 1 if you reach the goal and 0 otherwise, including when you fall in a hole; I'm pretty sure that's the way it works at least, but that's something that's important to know. Then what we're going to do is reduce the epsilon by just a fraction of an amount, you know, 0.001, so that we slowly decrease the epsilon in the correct direction, and then we break because the episode is over. Finally we print the Q-table and then print the average reward.
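Put together, the training loop described above looks roughly like this; it's a condensed sketch that reuses the names from the earlier blocks and assumes the older gym step API that returns (state, reward, done, info).

```python
RENDER = False   # set True to draw the environment each step (much slower)
epsilon = 0.9

rewards = []
for episode in range(EPISODES):
    state = env.reset()                  # start each episode at the beginning
    for _ in range(MAX_STEPS):
        if RENDER:
            env.render()

        # explore (random action) or exploit (best action from the Q-table)
        if np.random.uniform(0, 1) < epsilon:
            action = env.action_space.sample()
        else:
            action = np.argmax(Q[state, :])

        next_state, reward, done, _ = env.step(action)

        # Q-learning update rule
        Q[state, action] = Q[state, action] + LEARNING_RATE * (
            reward + GAMMA * np.max(Q[next_state, :]) - Q[state, action])

        state = next_state

        if done:
            rewards.append(reward)
            epsilon -= 0.001             # slowly shift from exploring to exploiting
            break

print(Q)
print(f"Average reward: {sum(rewards) / len(rewards)}")
```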
Now, this takes a second to train; like, you know, a few seconds really. That one's pretty fast because I've set it at 1500 episodes, but if you want, you can set this up to, say, 10,000, wait another few minutes or whatever, and then see how much better you can do. We can see that after that I received an average reward of 0.288866667. This is actually what the Q-table values look like: all these decimal values after all these updates, which I just decided to print out. I mainly want to show you the average reward so that we can compare it to what we get from testing or from this graph. So now I'm just going to graph this, and we'll see what the graph looks like. You don't have to really understand this code if you don't want to, but it's just graphing the average reward over every 100 episodes from the beginning to the end. Essentially, I've calculated the average of every 100 episodes and then graphed it here.
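A sketch of that graphing code might look like the following; it assumes matplotlib is available and that `rewards` holds one reward per training episode, as in the loop above.

```python
import matplotlib.pyplot as plt

# average reward for each block of 100 episodes
avg_rewards = [
    sum(rewards[i:i + 100]) / len(rewards[i:i + 100])
    for i in range(0, len(rewards), 100)
]

plt.plot(avg_rewards)
plt.xlabel("episodes (hundreds)")
plt.ylabel("average reward")
plt.show()
```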
We can see that we start off very poorly in terms of reward, because the epsilon value is quite high, which means we're taking random actions pretty much all the time. If we're taking a bunch of random actions, chances are we're probably going to die a lot and get rewards of zero quite frequently. Then after we get to about 600 episodes (six on the axis actually represents 600, because this is in hundreds), we start to slowly improve, and then we go on a steep increase here, when we start taking actions based on the Q-table more frequently because the epsilon is decreasing. After that we kind of level off, and this does show a slight decline, but I guarantee you, if we ran this for, you know, 15,000 episodes, it would just bob up and down. That's because even though we have decreased the epsilon, there is still a chance that we take a random action and, you know, get a reward of zero. So that is pretty much it for this Q-learning example.
And I mean, that's pretty straightforward. To actually use the Q-table, if you wanted to, say, watch the agent move around the environment, I'm going to leave that to you guys, because if you can follow what I've done in here and understand it, it's actually quite easy to use the Q-table. Think of it as a final bit of trust in you guys that you can figure out how to do that. The hint is essentially: do exactly what I've done in here, except don't update the Q-table values, just use the Q-table values you already have. And that's, you know, pretty much all there is to Q-learning.
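If you get stuck on that exercise, one possible starting point is a sketch like this; it simply reuses the trained Q-table and renders the agent, without doing any updates.

```python
# Watch the trained agent: always pick the best known action, never update Q
state = env.reset()
for _ in range(MAX_STEPS):
    env.render()
    action = np.argmax(Q[state, :])
    state, reward, done, _ = env.step(action)
    if done:
        break
```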
So this has been the reinforcement learning module for this TensorFlow course, which is actually the last module in this series. I hope you guys have enjoyed it up until this point. Just to emphasize again, this was really just an introduction to reinforcement learning; this technique and this problem itself are not very interesting, not the best way to do things, and not the most powerful. They're just there to get you thinking about how reinforcement learning works. And if you'd like to look into it more, there are a ton of different resources and, you know, things you can look at in terms of reinforcement learning. So that being said, that has been this module, and now we're going to move into the conclusion, where we'll talk about some next steps and some more things that you guys can look at to improve your machine learning skills.
So finally, after about seven hours of course content, we have reached the conclusion of this course. Now, what I'm going to do in this last brief section is just explain to you where you can go for some next steps and some further learning with TensorFlow and machine learning and artificial intelligence in general. What I'm going to recommend to you guys is that we look at the TensorFlow
website, because they have some amazing guides and resources on here. And in fact, a lot
of the examples that we used in our notebooks were based on, or exactly the same as, the original TensorFlow guides. And that's because the code that they have is just very good; they're very clear, easy-to-understand examples. And in terms of learning, I find that these guides are great for people that want to get in quickly, see the examples, and then go and do some research on their own time to understand why they work. So if you're looking for some
further steps, at this point in time, you have gained a very general and broad knowledge
of machine learning and AI, you have some basic skills in a lot of the different areas.
And hopefully, this has introduced you to a bunch of different concepts and the possibilities
of what you are able to do using modules like TensorFlow. Now, what I'm going to suggest
to all of you is that if you find a specific area of machine learning or AI that you are
very interested in that you would dial in on that area and focus most of your time into
learning that. That is because when you get to a point in machine learning and AI where you really get specific and pick one kind of strand or one kind of area, it gets very interesting very quickly, and you can devote most of your time to getting as deep as possible into that specific topic. And that's something that's really cool, and most people that are experts in the AI or machine learning field typically have one area of specialization.
Now, if you're someone who doesn't care to specialize in an area, or you just want to play
around and see some different things, the TensorFlow website is great to really get
kind of a general introduction to a lot of different areas and be able to kind of use
this code, tweak it a little bit on your own, and implement it into your own projects. And
in fact, the next steps and resources I'm going to be showing you here involve simply going to the TensorFlow website, going to the tutorials page (this is very easy to find, I don't even need to link it; you can just search TensorFlow and you'll find it online), and looking at some more advanced topics that we haven't covered. We've covered a few of the topics and tutorials that are there; I've just kind of modified their version, put it in the notebook, and explained it in words and video content. But if you'd
like to move on to say a next step, or something very cool, something I would recommend is
doing the DeepDream tutorial in the generative section of the TensorFlow website,
being able to make something like this, I think is very cool. And this is an example
where you can tweak this a ton by yourself and get some really cool results. So some
things like this are definitely next steps. There's tons and tons of guides and tutorials
on this website, they make it very easy for anyone to get started. And with these guides,
what I will say is that typically they just give you the code and brief explanations of why things work. You should really be researching and looking up some more, you know, deeper-level explanations of why some of these things work as you go through, if you want to have a firm understanding of why the model performs the
way that it does. So with that being said, I believe I'm going to wrap up the course
now. I'm sure you guys can imagine how much work I put into this, so please do leave a like, subscribe to the channel, leave a comment, and show your support. This, I believe, is the largest open source machine learning course in the world that deals completely with TensorFlow and Python, and I hope that it gave you a lot of knowledge. So please do give me your feedback down below in the comments. With that being said, again, I hope you enjoyed, and hopefully I will see you again in another tutorial, guide, or series.