The following content is
provided under a Creative Commons license.
Your support will help MIT OpenCourseWare continue to offer
high quality educational resources for free.
To make a donation or to view additional materials from
hundreds of MIT courses, visit MIT OpenCourseWare at
ocw.mit.edu.

OK, so remember last
time, on Tuesday, we learned about the
chain rule, and so for example we saw that
if we have a function w that depends
on three variables, x, y, z,
and x, y, z themselves depend on some variable,
t, then you can find a formula for
dw/dt by writing down dw/dt = w_x dx/dt + w_y dy/dt + w_z dz/dt.
And, the meaning of that formula is that, well, the change
in w is caused by changes in x, y, and z; x,
y, and z change at rates dx/dt, dy/dt, dz/dt.
And, this causes the function to change accordingly, because,
well, the partial derivatives tell you how sensitive w is to
changes in each variable. OK, so, we are going to just
rewrite this in a new notation. So, I'm going to rewrite this
in a more concise form as gradient of w dot product with
velocity vector dr/dt. So, the gradient of w is a
vector formed by putting together all of the partial
derivatives. OK, so it's the vector whose
components are the partials. And, of course,
it's a vector that depends on x, y, and z, right?
These guys depend on x, y, z. So, it's actually one vector
for each point, x, y, z.
You can talk about the gradient of w at some point,
x, y, z. So, at each point,
it gives you a vector. That actually is what we will
call later a vector field. We'll get back to that later.
And, dr/dt is just the velocity vector dx/dt,
dy/dt, dz/dt. OK, so the new definition for
today is the definition of the gradient vector.
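As a quick numerical illustration of the rewritten chain rule, dw/dt = gradient w dot dr/dt, here is a minimal sketch. The function w, the path r(t), and the value t0 are hypothetical choices made only for this example; they are not from the lecture.

```python
import math

# Hypothetical example function and path, chosen only for illustration.
def w(x, y, z):
    return x * y + z ** 2

def grad_w(x, y, z):
    # Vector of partial derivatives <w_x, w_y, w_z>
    return (y, x, 2 * z)

def r(t):
    # Position vector <x(t), y(t), z(t)>
    return (math.cos(t), math.sin(t), t)

def velocity(t):
    # dr/dt = <dx/dt, dy/dt, dz/dt>
    return (-math.sin(t), math.cos(t), 1.0)

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

t0 = 0.7
chain_rule = dot(grad_w(*r(t0)), velocity(t0))

# Central finite difference of t -> w(r(t)), as an independent check.
h = 1e-6
finite_diff = (w(*r(t0 + h)) - w(*r(t0 - h))) / (2 * h)

print(chain_rule, finite_diff)
```

The two printed numbers should agree to many digits, which is all the gradient notation is claiming so far: it repackages the chain rule as a dot product.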
And, our goal will be to understand a bit better,
what does this vector mean? What does it measure?
And, what can we do with it? But, you see that in terms of
information content, it's really the same
information that's already in the partial derivatives,
or in the differential. So, yes, and I should say,
of course you can also use the gradient in other things like
approximation formulas and so on.
And so far, it's just notation. It's a way to rewrite things.
But, so here's the first cool property of the gradient.
So, I claim that the gradient vector is perpendicular to the
level surface corresponding to setting the function,
w, equal to a constant. OK, so if I draw a contour plot
of my function, so, actually forget about z
because I want to draw a two variable contour plot.
So, say I have a function of two variables,
x and y, then maybe it has some contour plot.
And, I'm saying if I take the gradient of a function at this
point, (x,y). So, I will have a vector.
Well, if I draw that vector on top of a contour plot,
it's going to end up being perpendicular to the level
curve. Same thing if I have a function
of three variables. Then, I can try to draw its
contour plot. Of course, I can't really do it
because the contour plot would be living in space with x,
y, and z. But, it would be a bunch of
level surfaces, and the gradient vector would be a vector in
space. That vector is perpendicular to
the level surfaces. So, let's try to see that on a
couple of examples. So, let's do a first example.
What's the easiest case? Let's take a linear function of
x, y, and z. So, I will take w equals a1
times x plus a2 times y plus a3 times z.
Well, so, what's the gradient of this function?
Well, the first component will be a1.
That's partial w partial x. Then, a2, that's partial w
partial y, and a3, partial w partial z.
Now, what are the level surfaces of this? Well, if I set w equal to some
constant, c, that means I look at the points where a1x + a2y + a3z
equals c. What kind of surface is that?
It's a plane. And, we know how to find a
normal vector to this plane just by looking at the coefficients.
So, it's a plane with a normal vector exactly this gradient.
And, in fact, in a way, this is the only case
you need to check because of linear approximations.
If you replace a function by its linear approximation,
that means you will replace the level surfaces by their tangent
planes. And then, you'll actually end
up in this situation. But maybe that's not very
convincing. So, let's do a second example.
Let's say we look at the
function w = x^2 + y^2. OK, so now it's a function of
just two variables because that way we'll be able to actually
draw a picture for you. OK, so what are the level sets
of this function? Well, they're going to be
circles, right? w equals c is a circle,
x^2 + y^2 = c. So, I should say,
maybe, sorry, the level curve is a circle.
So, the contour plot looks something like that.
Now, what's the gradient vector? Well, the gradient of this
function, so, partial w partial x is 2x.
And partial w partial y is 2y. So, let's say I take a point,
x comma y, and I try to draw my gradient vector.
So, here at x, y, so, I have to draw the
vector, <2x, 2y>.
What does it look like? Well, it's going in that
direction. It's parallel to the position
vector for this point. It's actually twice the
position vector. So, I guess it goes more or
less like this. What's interesting,
too, is that it is perpendicular to this circle.
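The circle example can be checked numerically. This is a small sketch, assuming a circle of radius 1.5 (an arbitrary choice) and sampling eight points on it; at each point it takes the dot product of the gradient <2x, 2y> with a tangent vector to the circle.

```python
import math

def grad_w(x, y):
    # w = x^2 + y^2, so the gradient is <2x, 2y>
    return (2 * x, 2 * y)

radius = 1.5  # assumed radius of the level curve x^2 + y^2 = radius^2
dots = []
for k in range(8):
    theta = 2 * math.pi * k / 8
    point = (radius * math.cos(theta), radius * math.sin(theta))
    # Derivative of the point with respect to theta: tangent to the circle
    tangent = (-radius * math.sin(theta), radius * math.cos(theta))
    g = grad_w(*point)
    dots.append(g[0] * tangent[0] + g[1] * tangent[1])

print(dots)
```

Every dot product comes out as zero up to rounding, which is the perpendicularity seen in the picture.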
OK, so it's a general feature. Actually, let me show you more
examples, oops, not the one I want.
So, I don't know if you can see it so well.
Well, hopefully you can. So, here I have a contour plot
of a function, and I have a blue vector.
That's the gradient vector at the pink point on the plot.
So, you can see, I can move the pink point,
and the gradient vector, of course, changes because the
gradient depends on x and y. But, what doesn't change is
that it's always perpendicular to the level curves.
Anywhere I am, my gradient stays perpendicular
to the level curve. OK, is that convincing?
Is that visible for people who can't see blue?
OK, so, OK, so we have a lot of evidence, but let's try to prove
the theorem because it will be interesting.
So, first of all, sorry, any questions about the
statement, the example, anything, yes?
Ah, very good question. Does the gradient vector,
why is the gradient vector perpendicular in one direction
rather than the other? So, we'll see the answer to
that in a few minutes. But let me just tell you
immediately, to the side, which side it's pointing to,
it's always pointing towards higher values of a function.
OK, and we'll see that in maybe about half an hour.
So, well, let me say actually points towards higher values of
w. OK, any other questions?
I don't see any questions. OK, so let's try to prove this
theorem, at least this part of the theorem.
We're not going to prove that just yet.
That will come in a while. So, well, maybe we want to
understand first what happens if we move inside the level curve,
OK? So, let's imagine that we are
taking a moving point that stays on the level curve or on the
level surface. And then, we know,
well, what happens is that the function stays constant.
But, we can also know how quickly the function changes
using the chain rule up there. So, maybe the chain rule will
actually be the key to understanding how the gradient
vector and the motion on the level surface relate.
So, let's take a curve, r equals r of t,
that stays inside, well, maybe I should say on the
level surface, w equals c.
So, let's think about what that means.
So, just to get you used to this idea, I'm going to draw a
level surface of a function of three variables.
OK, so it's a surface given by the equation w of x,
y, z equals some constant, c.
And, so now I'm going to have a point on that,
and it's going to move on that surface.
So, I will have some parametric curve that lives on this
surface. So, the question is,
what's going to happen at any given time?
Well, the first observation is that the velocity vector,
what can I say about the velocity vector of this motion?
It's going to be tangent to the level surface,
right? If I move on a surface,
then at any point, my velocity is tangent to the
curve. But, if it's tangent to the
curve, then it's also tangent to the surface because the curve is
inside the surface. So, OK, it's getting a bit
cluttered. Maybe I should draw a bigger
picture. Let me do that right away here.
So, I have my level surface, w equals c.
I have a curve on that, and at some point,
I'm going to have a certain velocity.
So, the claim is that the velocity, v,
equals dr/dt, is tangent to the level,
w equals c, because it's tangent to the curve,
and the curve is inside the level,
OK? Now, what else can we say?
Well, we have, the chain rule will tell us how
the value of w changes. So, by the chain rule,
we have dw/dt. So, the rate of change of the
value of w as I move along this curve is given by the dot
product between the gradient and the velocity vector.
And, so, well, maybe I can rewrite it as gradient w dot
v, and that should be, well, what should it be?
What happens to the value of w as t changes?
Well, it stays constant because we are moving on a curve.
That curve might be complicated, but it stays always
on the level, w equals c.
So, it's zero because w of t equals c, which is a constant.
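Here is a minimal sketch of that argument in numbers. The function w = x^2 + y^2 + z^2 and the particular curve on its level surface w = 1 are hypothetical choices for illustration; any curve that stays on a level surface would do.

```python
import math

def grad_w(x, y, z):
    # Assumed example: w = x^2 + y^2 + z^2, whose level surfaces are spheres
    return (2 * x, 2 * y, 2 * z)

def r(t):
    # A curve that stays on the unit sphere, i.e. on the level surface w = 1
    return (math.cos(t) * math.cos(2 * t),
            math.sin(t) * math.cos(2 * t),
            math.sin(2 * t))

def velocity(t):
    # dr/dt, computed by hand from r(t)
    return (-math.sin(t) * math.cos(2 * t) - 2 * math.cos(t) * math.sin(2 * t),
            math.cos(t) * math.cos(2 * t) - 2 * math.sin(t) * math.sin(2 * t),
            2 * math.cos(2 * t))

def dw_dt(t):
    # Chain rule: gradient of w dotted with the velocity vector
    return sum(g * v for g, v in zip(grad_w(*r(t)), velocity(t)))

rates = [dw_dt(t) for t in (0.0, 0.4, 1.3, 2.9)]
print(rates)
```

Each rate is zero up to rounding: the value of w does not change along the curve, so the gradient is perpendicular to the velocity at every sampled time.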
OK, is that convincing? OK, so now if we have a dot
product that's zero, that tells us that these two
guys are perpendicular. So, the gradient vector
is perpendicular to v. OK, that's a good start.
We know that the gradient is perpendicular to this vector
that's tangent to the level surface.
What about other vectors tangent to the level surface?
Well, in fact, I could use any curve drawn on
the level surface w equals c. So, I could move,
really, any way I wanted on that surface.
In particular, I claim that I could have
chosen my velocity vector to be any vector tangent to the
surface. OK, so let's write this.
So this is true for any curve, or, I'll say for any motion on
the level surface, w equals c.
So that means v can be any vector tangent to the
level surface. See, for example,
OK, let me draw one more picture.
OK, so I have my level surface. So, I'm drawing more and more
levels, and they never quite look the same.
But I have a point. And, at this point,
I have the tangent plane to the level surface.
OK, so this is tangent plane to the level.
Then, if I choose any vector in that tangent plane.
Let's say I choose the one that goes in that direction.
Then, I can actually find a curve that goes in that
direction, and stays on the level.
So, here, that would be a curve that somehow goes from the right
to the left, and of course it has to end up going up or
something like that. OK, so given any vector tangent
to the level, let's call that vector v,
we get that the gradient is perpendicular to v.
So, the gradient is perpendicular to this vector
tangent to this curve, but also to any vector
that I can draw tangent to my surface.
So, what does that mean? Well, that means the gradient
is actually perpendicular to the tangent plane or to the surface
at this point. So, the gradient is
perpendicular. And, well, here,
I've illustrated things with a three-dimensional example,
but really it works the same if you have only two variables.
Then you have a level curve that has a tangent line,
and the gradient is perpendicular to that line.
OK, any questions? No?
OK, so, let's see. That's actually pretty neat
because there is a nice application of this,
which is to try to figure out, now we know,
actually, how to find the tangent plane to anything,
pretty much. OK, so let's see.
So, let's say that, for example,
I want to find the tangent plane to the
surface with equation, let's say, x^2 + y^2 - z^2 = 4 at
the point (2, 1, 1).
Let me write that. So, how do we do that?
Well, one way that we already know,
if we solve this for z, so we can write z equals a
function of x and y, then we know tangent plane
approximation for the graph of a function,
z equals some function of x and y.
But, that doesn't look like it's the best way to do it.
OK, the best way to do it,
vector, is actually to directly say, oh, we know the normal
vector to this plane. The normal vector will just be
the gradient. Oh, I think I have a cool
picture to show. OK, so that's what it looks
like. OK, so here you have the
surface x^2 + y^2 - z^2 equals four. That's called a hyperboloid
because it looks like what you get when you spin a hyperbola
around an axis. And, here's a tangent plane at
the given point. So, it doesn't look very
tangent because it crosses the surface.
But, it's really, if you think about it,
you will see it's really the plane that's approximating the
surface in the best way that you can at this given point.
It is really the tangent plane. So, how do we find this plane?
Well, you can plot it on a computer.
That's not exactly how you would look for it in the first
place. So, the way to do it is that we
compute the gradient. So, a gradient of what?
Well, a gradient of this function.
OK, so I should say, this is the level set,
w equals four, where w equals x^2 + y^2 - z^2.
And so, we know that the gradient of this,
well, what is it? 2x, then 2y,
and then negative 2z. So, at this given point,
I guess we are at x equals two. So, that's four.
And then, y and z are one. So, two, negative two.
OK, and that's going to be the normal vector to the surface or
to the tangent plane. That's one way to define the
tangent plane. All right, it has the same
normal vector as the surface. That's one way to define the
normal vector to the surface, if you prefer.
Being perpendicular to the surface means that you are
perpendicular to its tangent plane.
OK, so the equation is, well, 4x + 2y - 2z equals
something, where something is, well, we should just plug in
that point. We'll get eight plus two minus
two, so it looks like we'll get eight. And, of course,
we could simplify dividing everything by two,
but it's not very important here.
OK, so now if you have a surface given by an evil
equation, and a point on the surface,
well, you know how to find the tangent plane to the surface at
that point. OK, any questions?
No. OK, let me give just another
reason why, another way that we could have seen this.
So, I claim, in fact, we could have done
this without the gradient, or using the gradient in a
somehow disguised way. So, here's another way.
So, the other way to do it would be to start with a
differential, OK?
dw, well, it's pretty much the same content,
but let me write it as a differential,
dw is 2x dx + 2y dy - 2z dz. So, at a given point,
at (2, 1, 1), this is 4 dx + 2 dy - 2 dz.
Now, if we want to change this into an approximation formula,
we can. We know that the change in w is
approximately equal to 4 delta x + 2 delta y - 2 delta z.
OK, so when do we stay on the level surface?
Well, we stay on the level surface when w doesn't change,
so, when this becomes zero, OK?
Now, what does this approximation sign mean?
Well, it means for small changes in x,
y, z, this guy will be close to that guy.
It also means something else. Remember, these approximation
formulas, they are linear approximations.
They mean that we replace the function, actually,
by some closest linear formula that will be nearby.
And so, in particular, if we set this equal to zero
instead of approximately zero, it means we'll actually be
moving on the tangent plane to the level set.
If you want, strict equality in the approximation means that we
replace the function by its tangent approximation. So, [APPLAUSE] OK,
so the level corresponds to delta w equals zero,
and its tangent plane corresponds to four delta x plus
two delta y minus two delta z equals zero.
That's what I'm trying to say, basically.
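To see what the approximation sign is hiding, here is a small sketch. It moves away from (2, 1, 1) in a direction that satisfies 4 delta x + 2 delta y - 2 delta z = 0, so the motion stays in the tangent plane, and measures how much w actually changes. The direction <1, -2, 0> is an arbitrary choice for illustration.

```python
def w(x, y, z):
    return x ** 2 + y ** 2 - z ** 2

base = (2.0, 1.0, 1.0)        # point on the level surface w = 4
direction = (1.0, -2.0, 0.0)  # assumed direction with 4*dx + 2*dy - 2*dz = 0

def change_in_w(eps):
    # Move by eps * direction, staying in the tangent plane, and report
    # how far w drifts from its value at the base point.
    moved = tuple(b + eps * d for b, d in zip(base, direction))
    return w(*moved) - w(*base)

linear_part = 4 * direction[0] + 2 * direction[1] - 2 * direction[2]
drift_small = change_in_w(0.1)
drift_smaller = change_in_w(0.01)

print(linear_part, drift_small, drift_smaller)
```

The linear part is exactly zero, and the actual change in w shrinks by a factor of about 100 when the step shrinks by a factor of 10. That second-order drift is the gap between the tangent plane and the level surface itself.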
And, what's delta x? Well, that means it's the
change in x. So, what's the change in x here?
That means, well, we started with x equals two,
and we moved to some other value, x.
So, that's actually x - 2, right? That's how much x has changed
compared to 2. So, we get four times (x - 2) plus two times (y - 1) minus
two times (z - 1) = 0. That's the equation of a
tangent plane. It's the same equation as the
one over there. These are just two different
methods to get it. OK, so this one explains to you
what's going on in terms of approximation formulas.
This one goes right away, by using the gradient vector.
So, in a way, with this one,
you don't have to think nearly as much.
But, you can use either one. OK, questions?
No? OK, so let's move on to a new
topic, which is another application of the gradient
vector, and that is directional derivatives.
OK, so let's say that we have a
function of two variables, x and y.
Well, we know how to compute partial w over partial x or
partial w over partial y, which measure how w changes if
I move in the direction of the x axis or in the direction of the
y axis. So, what about moving in other
directions? Well, of course,
we've seen other approximation formulas and so on.
But, we can still ask, is there a derivative in every
direction? And that's basically,
yes, that's the directional derivative.
OK, so these are derivatives in the direction of i hat or j hat,
the vectors that go along the x or the y axis.
So, what if we move in another direction, let's say,
the direction of some unit vector, let's call it u.
OK, so if I give you a unit vector, you can ask yourself,
if I move in that direction, how quickly will my function
change? So, let's look at the
straight trajectory. What this should mean is I
start at some value, x, y, and there I have my
vector u. And, I'm going to move in a
straight line in the direction of u.
And, I have the graph of my function -- -- and I'm asking
myself how quickly does the value change when I move on the
graph in that direction? OK, so let's look at a straight
line trajectory. So, we have a position vector,
r, that will depend on some parameter which I will call s,
you'll see why very soon, in such a way that the
derivative dr/ds is this given unit vector u hat.
So, why do I use s for my parameter rather than t?
Well, it's a convention. I'm moving at unit speed along
this line. So that means that actually,
I'm parameterizing things by the distance that I've traveled
along a curve, sorry, along this line.
So, here it's called s in the sense of arc length.
Actually, it's not really an arc because it's a straight
line, so it's the distance along the line.
OK, so because we are parameterizing by distance,
we are just using s as a convention just to distinguish
it from other situations. And, so, now,
the question will be, what is dw/ds?
What's the rate of change of w when I move like that?
Well, of course we know the answer because that's a special
case of the chain rule. So, that's how we will actually
compute it. But, in terms of what it means,
it really means we are asking ourselves,
we start at a point and we change the variables in a
certain direction, which is not necessarily the x
or the y direction, but really any direction.
And then, what's the derivative in that direction?
OK, does that make sense as a concept?
Kind of? I see some faces that are not
completely convinced. So, maybe I should show more
pictures. Well, let me first write down a
bit more and show you something. So I just want to give you the
actual definition. Sorry, first of all in case you
wonder what this is all about, so let's say the components of
our unit vector are two numbers, a and b.
Then, it means we'll move along the line x of s equals some
initial value x0, the point where we actually
want the directional derivative, plus s times a,
or I meant to say x0 plus a times s.
And, y of s equals y0 + bs. And then, we plug that into w.
And then we take the derivative. So, we have a notation for that
which is going to be dw/ds with a subscript in the direction of
u to indicate in which direction we are actually going to move.
And, that's called the directional derivative in
the direction of u. OK, so, let's see what it means
geometrically. So, remember,
we've seen things about partial derivatives,
and we see that the partial derivatives are the slopes of
slices of the graph by vertical planes that are parallel to the
x or the y directions. OK, so, if I have a point,
at any point, I can slice the graph of my
function by two planes, one that's going along the x,
one along the y direction. And then, I can look at the
slices of the graph. Let me see if I can use that
thing. So, we can look at the slices
of the graph that are drawn here.
In fact, we look at the tangent lines to the slices,
and we look at the slope and that gives us the partial
derivatives in case you are on that side and want to see also
the pointer that was here. So, now, similarly,
the directional derivative means, actually,
we'll be slicing our graph by the vertical plane.
It's not really colorful, something more colorful.
We'll be slicing things by a plane that is now in the
direction of this vector, u, and we'll be looking at the
slope of the slice of the graph. So, what that looks like here,
so that's the same applet that you've used on your
problem set in case you are wondering.
So, now, I'm picking a point on the contour plot.
And, at that point, I slice the graph.
So, here I'm starting by slicing in the direction of the
x axis. So, in fact,
what I'm measuring here by the slope of the slice is the
partial in the x direction. It's really partial f partial
x, which is also the directional derivative in the direction of
i. And now, if I rotate the slice,
then I have all of these planes.
So, you see at the bottom left, I have the direction in which
I'm going. There's this,
like, rotating line that tells you in which direction I'm going
to be moving. And for each direction,
I have a plane. And, when I slice by that
plane, I will get, so I have this direction here
going maybe to the southwest. So, that gives me a slice of my
graph by a vertical plane, and the slice has a certain
slope. And, the slope is going to be
the directional derivative in that direction.
OK, I think that's as graphic as I can get.
OK, any questions about that? No?
OK, so let's see how we compute that guy.
So, let me just write again, just in case you want to,
in case you didn't hear me: it's the slope of the slice of the
graph by a vertical plane that contains the given
direction, that's parallel to the
direction, u. So, how do we compute it?
Well, we can use the chain rule. The chain rule implies that
dw/ds is actually the gradient of w dot product with the
velocity vector dr/ds. But, remember we said that we
are going to be moving at unit speed in the direction of u.
So, in fact, that's just gradient w dot
product with the unit vector u. OK, so the formula that we
remember is really dw/ds in the direction of u is gradient w dot
product with u. And, maybe I should also say in
words, this is the component of the gradient in the direction of
u. And, maybe that makes more
sense. So, for example,
the directional derivative in the direction of i hat is the
component along the x axis. That's the same as,
indeed, the partial derivative in the x direction.
Things make sense. dw/ds in the direction of i hat
is, sorry, gradient w dot i hat, which is w_x, or maybe I should
write, partial w over partial x. OK, now, so that's basically
what we need to know to compute these guys.
So now, let's go back to the gradient and see what this tells
us about the gradient. [APPLAUSE]
I see you guys are having fun. OK, OK, let's do a little bit
of geometry here. That should calm you down.
So, we said dw/ds in the direction of u is gradient w dot
u. That's the same as the length
of gradient w times the length of u
(well, that happens to be one because we are taking a unit
vector) times the cosine of the angle between the gradient and
the given unit vector, u. So, I have this angle, theta.
OK, that's another way of saying we are taking the
component of a gradient in the direction of u.
But now, what does that tell us? Well,
let's try to figure out in which directions w changes the
fastest, in which direction it increases
the most or decreases the most, or doesn't actually change.
So, when is this going to be the largest?
If I fix a point, if I set a point,
then the gradient vector at that point is given to me.
But, the question is, in which direction does it
change the most quickly? Well, what I can change is the
direction, and this will be the largest when the cosine is one.
So, this is largest when the cosine of the angle is one.
That means the angle is zero. That means u is actually in the
direction of the gradient. OK, so that's a new way to
think about the direction of a gradient.
The gradient is the direction in which the function increases
the most quickly at that point. So, the direction of gradient w
is the direction of fastest increase of w at the given
point. And, what is the magnitude of gradient w?
Well, it's actually the directional derivative in that
direction. OK, so if I go in that
direction, which gives me the fastest increase,
then the corresponding slope will be the length of the
gradient. And, what's the direction of the
fastest decrease? It's going in the opposite
direction, right? I mean, if you are on a
mountain, and you know that you are facing the mountain,
that's the direction of fastest increase.
The direction of fastest decrease is behind you straight
down. OK, so, the minimal value of
dw/ds is achieved when cosine of theta is minus one.
That means theta equals 180 degrees. That means u is in the
direction of minus the gradient. It points opposite to the
gradient. And, finally,
when do we have dw/ds equals zero?
So, in which direction does the function not change?
Well, we have two answers to that.
One is to just use the formula. So, that's when cosine theta
equals zero. That means theta equals 90 degrees.
That means that u is perpendicular to the gradient.
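The three cases (fastest increase, fastest decrease, no change) can be seen by sweeping the direction u around the circle. This is a minimal sketch assuming a gradient of <3, 4> at the chosen point, a made-up value with length 5, and sampling directions one degree apart.

```python
import math

g = (3.0, 4.0)  # hypothetical gradient vector at the chosen point
norm_g = math.hypot(*g)

def rate(theta_deg):
    # dw/ds in the direction of the unit vector at angle theta_deg
    # measured from the x axis: gradient dotted with <cos, sin>
    t = math.radians(theta_deg)
    return g[0] * math.cos(t) + g[1] * math.sin(t)

angles = range(360)
best = max(angles, key=rate)    # sampled direction of fastest increase
worst = min(angles, key=rate)   # sampled direction of fastest decrease
grad_angle = math.degrees(math.atan2(g[1], g[0]))

print(best, worst, grad_angle)
print(rate(grad_angle), rate(grad_angle + 180), rate(grad_angle + 90))
```

The best sampled direction sits next to the angle of the gradient, the worst one is opposite to it, and the rates along the gradient, against it, and perpendicular to it come out as the length of the gradient, minus that length, and zero.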
The other way to think about it, the direction in which the
value doesn't change is a direction that's tangent to the
level surface. If we are not changing w,
it means we are moving along the level.
And, that's the same thing as being tangent to the
level. So, let me just show that on
the picture here. So, if I actually show you the
gradient, you can't really see it here.
I need to move it a bit. So, the gradient here is
pointing straight up at the point that I have chosen.
Now, if I choose a slice that's perpendicular,
and a direction that's perpendicular to the gradient,
so that's actually tangent to the level curve,
then you see that my slice is flat.
I don't actually have any slope. The directional derivative in a
direction that's perpendicular to the gradient is basically
zero. Now, if I rotate,
then the slope sort of increases, increases,
increases, and it becomes the largest when I'm going in the
direction of the gradient. So, here, I have,
actually, a pretty big slope. And now, if I keep rotating,
then the slope will decrease again.
Then it becomes zero when I'm
perpendicular, and then it becomes negative. It's the most negative when I'm
pointing away from the gradient and then becomes zero again when
I'm back perpendicular. OK, so for example,
if I give you a contour plot, and I ask you to draw the
direction of the gradient vector,
well, at this point, for example,
you would look at the picture. The gradient vector would be
going perpendicular to the level.
And, it would be going towards higher values of the function.
I don't know if you can see the labels, but the thing in the
middle is a minimum. So, it will actually be
pointing in this kind of direction.
OK, so that's it for today.