The following content is
provided under a Creative Commons license.
Your support will help MIT OpenCourseWare continue to offer
high quality educational resources for free.
To make a donation or to view additional materials from
hundreds of MIT courses, visit MIT OpenCourseWare at
ocw.mit.edu.

Let me start by basically
listing the main things we have learned over the past three
weeks or so. And I will add a few
complements of information about that because there are a few
small details that I didn't quite clarify and that I should
probably make a bit clearer, especially what happened at the
very end of yesterday's class. Here is a list of things that
should be on your review sheet for the exam.
The first thing we learned about, the main topic of this
unit is about functions of several variables.
We have learned how to think of functions of two or three
variables in terms of plotting them.
In particular, well, not only the graph but
also the contour plot and how to read a contour plot.
And we have learned how to study variations of these
functions using partial derivatives.
Remember, we have defined the partial of f with respect to
some variable, say, x to be the rate of change
with respect to x when we hold all the other variables
constant. If you have a function of x and
y, this symbol means you differentiate with respect to x
treating y as a constant. And we have learned how to
package partial derivatives into a vector, the gradient vector.
For example, if we have a function of three
variables, the vector whose components are the partial
derivatives. And we have seen how to use the
gradient vector or the partial derivatives to derive various
things such as approximation formulas.
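In symbols, as a sketch in standard notation (the limit form is the usual textbook definition, not a quotation from the board), the partial derivative of a function of x and y and the gradient of a function of three variables can be written as:

```latex
\frac{\partial f}{\partial x}(x_0, y_0)
  = \lim_{\Delta x \to 0} \frac{f(x_0 + \Delta x,\, y_0) - f(x_0, y_0)}{\Delta x},
\qquad
\nabla f = \left\langle \frac{\partial f}{\partial x},\ \frac{\partial f}{\partial y},\ \frac{\partial f}{\partial z} \right\rangle
```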
The change in f, when we change x,
y, z slightly, is approximately equal to,
well, there are several terms. And I can rewrite this in
vector form as the gradient dot product the amount by which the
position vector has changed. Basically, what causes f to
change is that I am changing x, y and z by small amounts and
how sensitive f is to each variable is precisely what the
partial derivatives measure. And, in particular,
this approximation is called the tangent plane approximation
because it tells us, in fact,
it amounts to identifying the graph of the function with its
tangent plane. It means that we assume that
the function depends more or less linearly on x,
y and z. And, if we set these things
equal, what we get is actually, we are replacing the function
by its linear approximation. We are replacing the graph by
its tangent plane. Except, of course,
we haven't seen the graph of a function of three variables
because that would live in 4-dimensional space.
So, when we think of a graph, really, it is a function of two
variables. That also tells us how to find
tangent planes to level surfaces. Recall that the tangent plane
to a surface, given by the equation f of x,
y, z equals c, at a given point can be found
by looking first for its normal vector.
And we know that the normal vector is actually,
well, one normal vector is given by
the gradient of a function because we know that the
gradient is actually pointing perpendicularly to the level
sets towards higher values of a function.
And it gives us the direction of fastest increase of a
function. OK.
Any questions about these topics?
No. OK.
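For the review sheet, the approximation formula and the tangent plane to a level surface can be sketched as follows. The point (x_0, y_0, z_0) and the constant c are placeholder names, and the partial derivatives in the second line are evaluated at that point:

```latex
\Delta f \approx f_x\,\Delta x + f_y\,\Delta y + f_z\,\Delta z = \nabla f \cdot \Delta \vec{r}
\\[6pt]
\text{Tangent plane to } f(x,y,z) = c \text{ at } (x_0, y_0, z_0):\quad
f_x\,(x - x_0) + f_y\,(y - y_0) + f_z\,(z - z_0) = 0
```

The second line just says that the gradient at the point is a normal vector to the plane.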
Let me add, actually, a cultural note to what we have
seen so far about partial derivatives and how to use them,
which is maybe something I should have mentioned a couple
of weeks ago. Why do we like partial
derivatives? Well, one obvious reason is we
can do all these things. But another reason is that,
really, you need partial derivatives to
do physics and to understand much of the world that is around
you because a lot of things actually are governed by what is
called partial differential equations. So here is, if you want, a cultural
remark about what this is good for.
A partial differential equation is an equation that involves the
partial derivatives of a function.
So you have some function that is unknown that depends on a
bunch of variables. And a partial differential
equation is some relation between its partial derivatives.
Let me see. These are equations involving
the partial derivatives -- -- of an unknown function.
Let me give you an example to see how that works.
For example, the heat equation is one
example of a partial differential equation.
It is the equation -- Well, let me write for you the space
version of it. It is the equation partial f
over partial t equals some constant times the sum of the
second partials with respect to x, y and z.
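Written out, with k denoting the constant, the equation just described reads:

```latex
\frac{\partial f}{\partial t}
  = k \left( \frac{\partial^2 f}{\partial x^2}
           + \frac{\partial^2 f}{\partial y^2}
           + \frac{\partial^2 f}{\partial z^2} \right)
```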
So this is an equation where we are trying to solve for a
function f that depends, actually, on four variables,
x, y, z, t. And what should you have in
mind? Well, this equation governs
temperature. If you think that f of x, y, z,
t will be the temperature at a point in space at position x,
y, z and at time t, then this tells you how
temperature changes over time. It tells you that at any given
point, the rate of change of
temperature over time is given by this complicated expression
in the partial derivatives in terms of the space coordinates
x, y, z. If you know, for example,
the initial distribution of temperature in this room,
and if you assume that nothing is generating heat or taking
heat away, so if you don't have any air
conditioning or heating going on,
then it will tell you how the temperature will change over
time and eventually stabilize to some final value.
Yes? Why do we take the partial
derivative twice? Well, that is a question,
I would say, for a physics person.
But in a few weeks we will actually see a derivation of
where this equation comes from and try to justify it.
But, really, that is something you will see
in a physics class. The reason for that is
basically physics of how heat is transported between particles in
fluid, or actually any medium. This constant k actually is
called the heat conductivity. It tells you how well the heat
flows through the material that you are looking at.
Anyway, I am giving it to you just to show you an example of a
real life problem where, in fact, you have to solve one
of these things. Now, how to solve partial
differential equations is not a topic for this class.
It is not even a topic for 18.03 which is called
Differential Equations, without partial,
which means there actually you will learn tools to study and
solve these equations but when there is only one variable
involved. And you will see it is already
quite hard. And, if you want more on that
one, we have many fine classes about partial differential
equations. But one thing at a time.
I wanted to point out to you that very often functions that
you see in real life satisfy many nice relations between the
partial derivatives. That was in case you were
wondering why on the syllabus for today it said partial
differential equations. Now we have officially covered
the topic. That is basically all we need
to know about it. But we will come back to that a
bit later. You will see.
OK. If there are no further
questions, let me continue and go back to my list of topics.
Oh, sorry. I should have written down that
this equation is solved by temperature for point x,
y, z at time t. OK.
And there are, actually, many other interesting partial
differential equations. You will maybe someday learn about the
wave equation that governs how waves propagate in space,
about the diffusion equation, when you have maybe a mixture
of two fluids, how they somehow mix over time
and so on. Basically, to every problem you
might want to consider there is a partial differential equation
to solve. OK. Anyway. Sorry.
Back to my list of topics. One important application we
have seen of partial derivatives is to try to optimize things,
try to solve minimum/maximum problems. Remember that we have
introduced the notion of critical points of a function.
A critical point is when all the partial derivatives are
zero. And then there are various
kinds of critical points. There are maxima and there are
minima, but there are also saddle points.
And we have seen a method using second derivatives -- -- to
decide which kind of critical point we have.
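As a reminder, the second derivative test for a function of two variables is usually stated as below. The letters A, B, C are the conventional shorthand, assumed here, for the second partials evaluated at the critical point (x_0, y_0):

```latex
A = f_{xx}(x_0, y_0), \quad B = f_{xy}(x_0, y_0), \quad C = f_{yy}(x_0, y_0)
\\[6pt]
\begin{cases}
AC - B^2 > 0,\ A > 0: & \text{local minimum} \\
AC - B^2 > 0,\ A < 0: & \text{local maximum} \\
AC - B^2 < 0: & \text{saddle point} \\
AC - B^2 = 0: & \text{the test is inconclusive}
\end{cases}
```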
I should say that is for a function of two variables to try
to decide whether a given critical point is a minimum,
a maximum or a saddle point. And we have also seen that
actually that is not enough to find the minimum or maximum of
a function because the minimum or maximum could occur on the
boundary. Just to give you a small
reminder, when you have a function of one
variable, if you are trying to find the
minimum and the maximum of a function whose graph looks like
this, well, you are going to tell me,
quite obviously, that the maximum is this point
up here. And that is a point where the
first derivative is zero. That is a critical point.
And we used the second derivative to see that this
critical point is a local maximum.
But then, when we are looking for the minimum of a function,
well, it is not at a critical point.
It is actually here at the boundary of the domain,
you know, the range of values that we are going to consider.
Here the minimum is at the boundary.
And the maximum is at a critical point.
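A minimal sketch of this one-variable picture, assuming a made-up function f(x) = 2x - x^2 on the interval from -1 to 2 (neither the function nor the interval is from the lecture). It compares the values of f at the critical point and at the two boundary points:

```python
# Hypothetical example: f(x) = 2x - x^2 on the interval [-1, 2].
def f(x):
    return 2 * x - x * x

def f_prime(x):
    return 2 - 2 * x

a, b = -1.0, 2.0            # boundary of the allowed range (assumed)
critical_points = [1.0]     # solves f'(x) = 0 by hand: 2 - 2x = 0

# Candidates: interior critical points plus the two boundary points.
candidates = [x for x in critical_points if a < x < b] + [a, b]

x_max = max(candidates, key=f)
x_min = min(candidates, key=f)

print("maximum at x =", x_max, "with f =", f(x_max))
print("minimum at x =", x_min, "with f =", f(x_min))
```

In this example the maximum is at the critical point and the minimum is at the left end of the interval, which is the situation in the picture.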
Similarly, when you have a function of several variables,
say of two variables, for example,
then the minimum and the maximum will be achieved either
at a critical point. And then we can use these
methods to find where they are. Or, somewhere on the boundary
of a set of values that are allowed.
It could be that we actually achieve a minimum by making x
and y as small as possible. Maybe letting them go to zero
if they had to be positive or maybe by making them go to
infinity. So, we have to keep our minds
open and look at various possibilities.
We are going to do a problem like that.
We are going to go over a practice problem from the
practice test to clarify this. Another important cultural
application of minimum/maximum problems in two variables that
we have seen in class is the least squares method to find the
best fit line, or the best fit anything,
really, to find, when you have a set of
data points, what is the best linear approximation for these
data points. And here I have some good news
for you. While you should definitely
know what this is about, it will not be on the test. [APPLAUSE]
That doesn't mean that you should forget everything we have
seen about it, OK?
Now what is next on my list of topics?
We have seen differentials. Remember the differential of f,
by definition, would be this kind of quantity.
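For a function of three variables, that quantity is:

```latex
df = f_x\,dx + f_y\,dy + f_z\,dz
```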
At first it looks just like a new way to package partial
derivatives together into some new kind of object.
Now, what is this good for? Well, it is a good way to
remember approximation formulas. It is a good way to also study
how variations in x, y, z relate to variations in f.
In particular, we can divide this by
variations, actually, by dx or by dy or by
dz in any situation that we want,
or by d of some other variable to get chain rules.
The chain rule says, for example,
there are many situations. But, for example,
if x, y and z depend on some other variable,
or maybe even on two other variables, u and v,
then that means that f becomes a function of u and v.
And then we can ask ourselves, how sensitive is f to a value
of u? Well, we can answer that.
The chain rule is something like this.
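For f depending on x, y, z, which in turn depend on u and v, the chain rule with respect to u (with v held constant) can be written as:

```latex
\frac{\partial f}{\partial u}
  = \frac{\partial f}{\partial x}\,\frac{\partial x}{\partial u}
  + \frac{\partial f}{\partial y}\,\frac{\partial y}{\partial u}
  + \frac{\partial f}{\partial z}\,\frac{\partial z}{\partial u}
```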
And let me explain to you again where this comes from.
Basically, what this quantity means is if we change u and keep
v constant, what happens to the value of f?
Well, why would the value of f change in the first place when f
is just a function of x, y, z and not directly of u?
Well, it changes because x, y and z depend on u.
First we have to figure out how quickly x, y and z change when
we change u. Well, how quickly they do that
is precisely partial x over partial u, partial y over
partial u, partial z over partial u.
These are the rates of change of x, y, z when we change u.
And now, when we change x, y and z, that causes f to
change. How much does f change?
Well, partial f over partial x tells us how quickly f changes
if I just change x. I get this.
That is the change in f caused just by the fact that x changes
when u changes. But then y also changes.
y changes at this rate. And that causes f to change at
that rate. And z changes as well,
and that causes f to change at that rate.
And the effects add up together. Does that make sense?
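A minimal numerical sanity check of this chain rule, assuming a made-up example f = xy + z with x = uv, y = u + v, z = u^2 (these formulas are illustrative and not from the lecture). It compares the sum of the three effects with a finite-difference estimate of the rate of change of f with respect to u, keeping v constant:

```python
# Hypothetical example: f(x, y, z) = x*y + z with x = u*v, y = u + v, z = u**2.
def f(x, y, z):
    return x * y + z

def xyz(u, v):
    return u * v, u + v, u * u

def F(u, v):
    # f viewed as a function of u and v
    return f(*xyz(u, v))

def df_du_chain(u, v):
    x, y, z = xyz(u, v)
    f_x, f_y, f_z = y, x, 1.0          # partials of f, computed by hand
    x_u, y_u, z_u = v, 1.0, 2.0 * u    # partials of x, y, z with respect to u
    return f_x * x_u + f_y * y_u + f_z * z_u

def df_du_numeric(u, v, h=1e-6):
    # central difference in u, holding v constant
    return (F(u + h, v) - F(u - h, v)) / (2 * h)

u0, v0 = 1.5, 0.5
chain_value = df_du_chain(u0, v0)
numeric_value = df_du_numeric(u0, v0)
print(chain_value, numeric_value)
```

The two numbers should agree up to the small error of the finite-difference estimate.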
OK. And so, in particular,
we can use the chain rule to do changes of variables.
If we have, say, a function in terms of polar
coordinates r and theta and we would like to switch it to rectangular
coordinates x and y then we can use chain rules to relate the
partial derivatives. And finally,
last but not least, we have seen how to deal with
non-independent variables, when our variables, say x,
y, z, are related by some equation. One way we can deal with this
is to solve for one of the variables and go back to two
independent variables, but we cannot always do that.
Of course, on the exam, you can be sure that I will
make sure that you cannot solve for a variable you want to
remove because that would be too easy.
Then, when we have to look at all of them and
take into account this relation, we have seen two useful
methods. One of them is to find the
minimum or the maximum of a function when the variables are
not independent, and that is the method of
Lagrange multipliers. Remember, to find the minimum
or the maximum of the function f,
subject to the constraint g equals constant,
well, we write down equations that say that the gradient of f
is actually proportional to the gradient of g.
There is a new variable here, lambda, the multiplier.
And so, for example, well, I guess here I had
functions of three variables, so this becomes three
equations. f sub x equals lambda g sub x,
f sub y equals lambda g sub y, and f sub z equals lambda g sub
z. And, when we plug in the
formulas for f and g, well, we are left with three
equations involving the four variables, x,
y, z and lambda. What is wrong?
Well, we don't have actually four independent variables.
We also have this relation, whatever the constraint was
relating x, y and z together. Then we can try to solve this.
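Put together, the system to solve has four equations in the four unknowns x, y, z and lambda (here c stands for the constant in the constraint):

```latex
\nabla f = \lambda\,\nabla g
\quad\Longleftrightarrow\quad
\begin{cases}
f_x = \lambda\, g_x \\
f_y = \lambda\, g_y \\
f_z = \lambda\, g_z
\end{cases}
\qquad\text{together with}\qquad g(x, y, z) = c
```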
And, depending on the situation, it is sometimes easy.
And sometimes it is very hard or even impossible.
But on the test, I haven't decided yet,
but it could well be that the problem about Lagrange
multipliers just asks you to write the equations and not to
solve them. [APPLAUSE]
Well, I don't know yet. I am not promising anything.
But, before you start solving, check whether the problem asks
you to solve them or not. If it doesn't then probably you
shouldn't. Another topic that we saw
just yesterday is constrained partial derivatives.
And I guess I have to re-explain a little bit because
my guess is that things were not extremely clear at the end of
class yesterday. Now we are in the same
situation. We have a function,
let's say, f of x, y, z where variables x,
y and z are not independent but are constrained by some relation
of this form. Some quantity involving x,
y and z is equal to maybe zero or some other constant.
And then, what we want to know, is what is the rate of change
of f with respect to one of the variables,
say, x, y or z when I keep the others constant?
Well, I cannot keep all the others constant because that
would not be compatible with this condition.
I mean that would be the usual or so-called formal partial
derivative of f ignoring the constraint.
To take this into account means that if we vary one variable
while keeping another one fixed then the third one,
since it depends on them, must also change somehow.
And we must take that into account.
Let's say, for example, we want to find -- I am going
to do a different example from yesterday.
So, if you really didn't like that one, you don't have to see
it again. Let's say that we want to find
the partial derivative of f with respect to z keeping y constant.
What does that mean? That means y is constant,
z varies and x somehow is mysteriously a function of y and
z through this equation. And then, of course, because it
depends on y, that means x will vary.
Sorry, depends on y and z and z varies.
Now we are asking ourselves what is the rate of change of f
with respect to z in this situation? And so we have two methods to
do that. Let me start with the one with
differentials that hopefully you kind of understood yesterday,
but if not here is a second chance.
Using differentials means that we will try to express df in
terms of dz in this particular situation.
What do we know about df in general?
Well, we know that df is f sub x dx plus f sub y dy plus f sub
z dz. That is the general statement.
But, of course, we are in a special case.
We are in a special case where first y is constant.
y is constant means that we can set dy to be zero.
This goes away and becomes zero. The second thing is actually we
don't care about x. We would like to get rid of x
because it is this dependent variable.
What we really want to do is express df only in terms of dz.
What we need is to relate dx with dz.
Well, to do that, we need to look at how the
variables are related so we need to look at the constraint g.
Well, how do we do that? We look at the differential of g.
So dg is g sub x dx plus g sub y dy plus g sub z dz.
And that is zero because we are setting g to always stay
constant. So, g doesn't change.
If g doesn't change then we have a relation between dx,
dy and dz. Well, in fact,
we say we are going to look only at the case where y is
constant. y doesn't change and this
becomes zero. Well, now we have a relation
between dx and dz. We know how x depends on z.
And when we know how x depends on z, we can plug that into here
and get how f depends on z. Let's do that. Again, saying that g cannot
change and keeping y constant tells us g sub x dx plus g sub z
dz is zero and we would like to solve for dx in terms of dz.
That tells us dx should be minus g sub z dz divided by g
sub x. If you want,
this is the rate of change of x with respect to z when we keep y
constant. In our new terminology this is
partial x over partial z with y held constant.
This is the rate of change of x with respect to z.
Now, when we know that, we are going to plug that into
this equation. And that will tell us that df
is f sub x times dx. Well, what is dx?
dx is now minus g sub z over g sub x dz plus f sub z dz.
So that will be minus f sub x g sub z over g sub x plus f sub z
times dz. And so this coefficient here is
the rate of change of f with respect to z in the situation we
are considering. This quantity is what we call
partial f over partial z with y held constant.
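The whole computation with differentials, collected in one place (this assumes g_x is not zero, so that we can divide by it):

```latex
\begin{aligned}
df &= f_x\,dx + f_y\,dy + f_z\,dz, \\
dg &= g_x\,dx + g_y\,dy + g_z\,dz = 0, \\
dy = 0 \;&\Rightarrow\; g_x\,dx + g_z\,dz = 0 \;\Rightarrow\; dx = -\frac{g_z}{g_x}\,dz, \\
df &= \left(-f_x\,\frac{g_z}{g_x} + f_z\right) dz
\quad\Rightarrow\quad
\left(\frac{\partial f}{\partial z}\right)_{y} = -f_x\,\frac{g_z}{g_x} + f_z .
\end{aligned}
```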
That is what we wanted to find. Now, let's see another way to
do the same calculation and then you can choose which one you
prefer. The other method is using the
chain rule. We use the chain rule to
understand how f depends on z when y is held constant.
Let me first try the chain rule brutally and then we will try to
analyze what is going on. You can just use the version
that I have up there as a template to see what is going
on, but I am going to explain it more carefully again. That is the most mechanical and
mindless way of writing down the chain rule.
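In symbols, with the subscript y recording what is held constant, this mechanical version of the chain rule can be written as:

```latex
\left(\frac{\partial f}{\partial z}\right)_{y}
  = \frac{\partial f}{\partial x}\left(\frac{\partial x}{\partial z}\right)_{y}
  + \frac{\partial f}{\partial y}\left(\frac{\partial y}{\partial z}\right)_{y}
  + \frac{\partial f}{\partial z}\left(\frac{\partial z}{\partial z}\right)_{y}
```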
I am just saying here that I am varying z, keeping y constant,
and I want to know how f changes.
Well, f might change because x might change,
y might change and z might change.
Now, how quickly does x change? Well, the rate of change of x
in this situation is partial x, partial z with y held constant.
If I change x at this rate then f will change at that rate.
Now, y might change, so the rate of change of y
would be the rate of change of y with respect to z holding y
constant. Wait a second.
If y is held constant then y doesn't change.
So, actually, this guy is zero and you didn't
really have to write that term. But I wrote it just to be
systematic. If y had been somehow able to
change at a certain rate then that would have caused f to
change at that rate. And, of course,
if y is held constant then nothing happens here.
Finally, while z is changing at a certain rate,
this rate is this one and that causes f to change at that rate.
And then we add the effects together.
See, it is nothing but the good-old chain rule.
Just I have put these extra subscripts to tell us what is
held constant and what isn't. Now, of course we can simplify
it a little bit more. Because, here,
how quickly does z change if I am changing z?
Well, the rate of change of z, with respect to itself,
is just one. In fact, the really mysterious
part of this is the one here, which is the rate of change of
x with respect to z. And, to find that,
we have to understand the constraint.
How can we find the rate of change of x with respect to z?
Well, we could use differentials,
like we did here, but we can also keep using the
chain rule. How can I do that?
Well, I can just look at how g would change with respect to z
when y is held constant. I just do the same calculation
with g instead of f. But, before I do it,
let's ask ourselves first what is this equal to.
Well, if g is held constant then, when we vary z keeping y
constant and changing x, well, g still doesn't change.
It is held constant. In fact, that should be zero.
But, if we just say that, we are not going to get to
that. Let's see how we can compute
that using the chain rule. Well, the chain rule tells us g
changes because x, y and z change.
How does it change because of x? Well, partial g over partial x
times the rate of change of x. How does it change because of y?
Well, partial g over partial y times the rate of change of y.
But, of course, if you are smarter than me then
you don't need to actually write this one because y is held
constant. And then there is the rate of
change because z changes. And how quickly z changes here,
of course, is one. Out of this you get,
well, I am tired of writing partial g over partial x.
We can just write zero equals g sub x times partial x over partial z with y
constant plus g sub z. And now we found how x depends
on z. Partial x over partial z with y
held constant is negative g sub z over g sub x.
Now we plug that into that and we get our answer.
It goes all the way up here. And then we get the answer.
I am not going to, well, I guess I can write it
again. There was partial f over
partial x times this guy, minus g sub z over g sub x,
plus partial f over partial z. And you can observe that this
is exactly the same formula that we had over here.
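The steps of the chain rule method, written out (again assuming g_x is not zero):

```latex
\begin{aligned}
0 = \left(\frac{\partial g}{\partial z}\right)_{y}
  &= g_x \left(\frac{\partial x}{\partial z}\right)_{y} + g_z
  \quad\Rightarrow\quad
  \left(\frac{\partial x}{\partial z}\right)_{y} = -\frac{g_z}{g_x}, \\
\left(\frac{\partial f}{\partial z}\right)_{y}
  &= f_x \left(\frac{\partial x}{\partial z}\right)_{y} + f_z
   = -f_x\,\frac{g_z}{g_x} + f_z .
\end{aligned}
```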
In fact, let's compare this to make it side by side.
I claim we did exactly the same thing, just with different
notations. If you take the differential of
f and you divide it by dz in this situation where y is held
constant and so on, you get exactly this chain rule
up there. That chain rule up there is
this guy, df, divided by dz with y held
constant. And the term involving dy was
replaced by zero on both sides because we knew,
actually, that y is held constant.
Now, the real difficulty in both cases comes from dx.
And what we do about dx is we use the constraint.
Here we use it by writing dg equals zero.
Here we write the chain rule for g, which is the same thing,
just divided by dz with y held constant.
This formula or that formula are the same,
just divided by dz with y held constant.
And then, in both cases, we used that to solve for dx.
And then we plugged into the formula of df to express df over
dz, or partial f, partial z with y held constant.
So, the two methods are pretty much the same.
Quick poll. Who prefers this one?
Who prefers that one? OK.
Majority vote seems to be for differentials,
but it doesn't mean that it is better.
Both are fine. You can use whichever one you
want. But you should give both a try.
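A minimal numerical check of the resulting formula, assuming a made-up example f = xy + z^2 with the constraint x^2 + y + z^3 = 6 (neither is from the lecture). This constraint happens to be solvable for x, which is used only to build a finite-difference estimate to compare against. The formula itself never needs that:

```python
import math

C = 6.0  # value of the constraint g = C (chosen so that (2, 1, 1) satisfies it)

# Hypothetical example: f = x*y + z**2, constraint g = x**2 + y + z**3 = C.
def f(x, y, z):
    return x * y + z * z

def x_from_constraint(y, z):
    # solve x**2 + y + z**3 = C for x, taking the positive branch
    return math.sqrt(C - y - z ** 3)

def df_dz_formula(x, y, z):
    # (partial f / partial z) with y held constant = -f_x * g_z / g_x + f_z
    f_x, f_z = y, 2.0 * z
    g_x, g_z = 2.0 * x, 3.0 * z * z
    return -f_x * g_z / g_x + f_z

def df_dz_numeric(y, z, h=1e-6):
    # vary z, keep y fixed, and let x adjust so the constraint still holds
    upper = f(x_from_constraint(y, z + h), y, z + h)
    lower = f(x_from_constraint(y, z - h), y, z - h)
    return (upper - lower) / (2 * h)

x0, y0, z0 = 2.0, 1.0, 1.0
formula_value = df_dz_formula(x0, y0, z0)
numeric_value = df_dz_numeric(y0, z0)
print(formula_value, numeric_value)
```

Both methods from the lecture lead to the same formula, so either one can be checked this way.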
OK. Any questions? Yes?
Yes. Thank you. I forgot to mention it.
Where did that go? I think I erased that part.
We need to know -- -- directional derivatives.
Pretty much the only thing to remember about them is that df
over ds, in the direction of some unit
vector u, is just the gradient f dot
product with u. That is pretty much all we know
about them. Any other topics that I forgot
to list? No.
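The directional derivative formula just mentioned, in symbols, where the hat indicates that u is a unit vector:

```latex
\left.\frac{df}{ds}\right|_{\hat{u}} = \nabla f \cdot \hat{u},
\qquad |\hat{u}| = 1
```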
Yes? Can I erase three boards at a
time? No, I would need three hands to
do that. I think what we should do now
is look quickly at the practice test.
I mean, given the time, you will mostly have to think
about it yourselves. Hopefully you have a copy of
the practice exam. The first problem is a simple
problem. Find the gradient.
Find an approximation formula. Hopefully you know how to do
that. The second problem is one about
reading a contour plot. And so, before I let you go for
the weekend, I want to make sure that you actually know how to
read a contour plot. One thing I should mention is
this problem asks you to estimate partial derivatives by
reading a contour plot. We have not done that,
so that will not actually be on the test.
We will be doing qualitative questions like what is the sign
of a partial derivative. Is it zero, less than zero or
more than zero? You don't need to bring a ruler
to estimate partial derivatives the way that this problem asks
you to. [APPLAUSE]
Let's look at problem 2B. Problem 2B is asking you to
find the point at which h equals 2200,
partial h over partial x equals zero and partial h over partial
y is less than zero. Let's try and see what is going
on here. A point where h equals 2200,
well, that should be probably on the level curve that says
2200. We can actually zoom in.
Here is the level 2200. Now I want partial h over
partial x to be zero. That means if I change x,
keeping y constant, the value of h doesn't change.
Which points on the level curve satisfy that property?
It is the top and the bottom. If you are here, for example,
and you move in the x direction,
well, you see, as you get to there from the
left, the height first increases and
then decreases. It goes through a maximum at that
point. So, at that point,
the partial derivative is zero with respect to x.
And the same here. Now, let's find partial h over
partial y less than zero. That means if we go north we
should go down. Well, which one is it,
top or bottom? Top. Yes.
Here, if you go north, then you go from 2200 down to
2100. This is where the point is.
Now, the problem here was also asking you to estimate partial h
over partial y. And if you were curious how you
would do that, well, you would try to figure
out how long it takes before you reach the next level curve.
To go from here to here, to go from Q to this new point,
say Q prime, the change in y,
well, you would have to read the scale,
which was down here, would be about something like
300. What is the change in height
when you go from Q to Q prime? Well, you go down from 2200 to
2100. That is actually minus 100
exactly. OK?
And so delta h over delta y is about minus one-third,
well, minus 100 over 300 which is minus one-third.
And that is an approximation for partial derivative.
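Written as a formula, using the values read off the plot (the change in y of about 300 is only a rough reading of the scale):

```latex
\frac{\partial h}{\partial y}(Q) \approx \frac{\Delta h}{\Delta y}
  = \frac{2100 - 2200}{300} = -\frac{1}{3}
```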
So, that is how you would do it. Now, let me go back to other
things. If you look at this practice
exam, basically there is a bit of everything and it is kind of
fairly representative of what might happen on Tuesday.
There will be a mix of easy problems and of harder problems.
Expect something about computing gradients,
approximations, rate of change.
Expect a problem about reading a contour plot.
Expect one about a min/max problem,
something about Lagrange multipliers,
something about the chain rule and something about constrained
partial derivatives. I mean pretty much all the
topics are going to be there.