After seeing how we think about ordinary differential
equations in chapter 1, we turn now to an example of a partial differential equation,
the heat equation. To set things up, imagine you have some object
like a piece of metal, and you know how the heat is distributed across it at one moment,
that is, what the temperature of every individual point is. You might think of that temperature here as
being graphed over the body. The question is: how will that distribution
change over time as heat flows from the warmer spots to the cooler ones? The image on the left shows the temperature
of an example plate with color, with the graph of that temperature being shown on the right,
both changing with time. To take a concrete one-dimensional example, say you have
two rods at different temperatures, where that temperature is uniform on each one. You know that when you bring them into contact,
the temperature will tend towards being equal throughout the combined rod, but how exactly? What will the temperature distribution be
at each point in time? As is typical with differential equations,
the idea is that it’s easier to describe how this setup changes from moment to moment
than it is to jump to a description of the full evolution. We write this rule of change in the language
of derivatives, though as you’ll see we’ll need to expand our vocabulary a bit beyond
ordinary derivatives. Don’t worry, we’ll learn how to read these
equations in a minute. Variations of the heat equation show up in
many other parts of math and physics, like Brownian motion, the Black-Scholes equation
from finance, and all sorts of diffusion, so there are many dividends to be had from
a deep understanding of this one setup. In the last video, we looked at ways of building
understanding while acknowledging the truth that most differential equations are too difficult
to actually solve. And indeed, PDEs tend to be even harder than
ODEs, largely because they involve modeling infinitely many values changing in concert. But our main character now is an equation
we actually can solve. In fact, if you’ve ever heard of Fourier
series, you may be interested to know that this is the physical problem which baby-faced
Fourier over here was solving when he stumbled across the corner of math now so replete with
his name. We’ll dig much more deeply into Fourier
series in the next chapter, but I would like to give at least a little hint of the beautiful
connection which is to come. This animation is showing how lots of little
rotating vectors, each rotating at some constant integer frequency, can trace out an arbitrary
shape. To be clear, what’s happening is that these
vectors are being added together, tip to tail, and you might imagine the last one as having
a pencil at its tip, tracing some path as it goes. This tracing usually won’t be a perfect
replica of the target shape (in this animation, a lowercase letter f), but the more circles
you include, the closer it gets. This animation uses only 100 circles, and
I think you’d agree the deviations from the real path are negligible. Tweaking the initial size and angle of each
vector gives enough control to approximate any curve you want. At first, this might just seem like an idle
curiosity, a neat art project but little more. In fact, the math underlying this is the same
as the math describing the physics of heat flow, as you’ll see in due time. But we’re getting ahead of ourselves. Step one is to build up to the heat equation,
and for that let’s be clear on what the function we’re analyzing is, exactly.

The heat equation

To be clear about what this graph represents,
we have a rod in one dimension, and we’re thinking of it as sitting on an x-axis, so
each point of the rod is labeled with a unique number, x. The temperature is some function of that position
number, T(x), shown here as a graph above it. But really, since this value changes over
time, we should think of it as a function with one more input, t, for time, writing it as T(x, t). You could, if you wanted, think of the input
space as a two-dimensional plane, representing space and time, with the temperature being
graphed as a surface above it, each slice across time showing you what the distribution
looks like at a given moment. Or you could simply think of the graph of
the temperature changing over time. Both are equivalent. This surface is not to be confused with what
I was showing earlier, the temperature graph of a two-dimensional body. Be mindful of whether time is being represented
with its own axis, or if it’s being represented with an animation showing literal changes
over time. Last chapter, we looked at some systems where
just a handful of numbers changed over time, like the angle and angular velocity of a pendulum,
describing that change in the language of derivatives. But when we have an entire function changing
with time, the mathematical tools become slightly more intricate. Because we’re thinking of this temperature
as a function with multiple dimensions to its input space, in this case, one for space
and one for time, there are multiple different rates of change at play. There’s the derivative with respect to x:
how rapidly the temperature changes as you move along the rod. You might think of this as the slope of our
surface when you slice it parallel to the x-axis: given a tiny step in the x-direction
and the tiny change to temperature caused by it, what’s the ratio between them? Then there’s the rate of change with time,
which you might think of as the slope of this surface when we slice it in a direction parallel
to the time axis. Each one of these derivatives only tells part
of the story for how the temperature function changes, so we call them “partial derivatives”. To emphasize this point, the notation changes
a little, replacing the letter d with this special curly d, ∂, sometimes called “del”. Personally, I think it’s a little silly
to change the notation for this since it’s essentially the same operation. I’d rather see notation which emphasizes
that the del T terms in these numerators refer to different changes. One refers to a small change to temperature
after a small change in time, the other refers to the change in temperature after a small
step in space. To reiterate a point I made in the calculus
series, I do think it's healthy to initially read derivatives like this as a literal ratio
between a small change to a function's output and the small change to the input that caused
it. Just keep in mind that what this notation
is meant to convey is the limit of that ratio for smaller and smaller nudges to the input,
rather than for some specific finitely small nudge. This goes for partial derivatives just as
it does for ordinary derivatives. The heat equation is written in terms of these partial derivatives. It tells us that the way this function changes with respect to time depends on how it changes with respect to space. More specifically, the rate of change with respect to time is proportional to the second partial derivative with respect to x: ∂T/∂t = α ∂²T/∂x², where α is a positive proportionality constant. At a high level, the intuition is that at
points where the temperature distribution curves, it tends to change in the direction
of that curvature. Since a rule like this is written with partial
derivatives, we call it a partial differential equation. This has the funny result that to an outsider,
the name sounds like a tamer version of ordinary differential equations when, on the contrary,
partial differential equations tend to tell a much richer story than ODEs. The general heat equation applies to bodies
in any number of dimensions, which would mean more inputs to our temperature function, but
it’ll be easiest for us to stay focused on the one-dimensional case of a rod. As it is, graphing this in a way which gives
time its own axis already pushes the visuals into three dimensions. But where does an equation like this come
from? How could you have thought this up yourself? Well, for that, let’s simplify things by
describing a discrete version of this setup, where you have only finitely many points x
in a row. This is sort of like working in a pixelated
universe, where instead of having a continuum of temperatures, we have a finite set of separate
values. The intuition here is simple: For a particular
point, if its two neighbors on either side are, on average, hotter than it is, it will
heat up. If they are cooler on average, it will cool
down. Focus on three neighboring points, x1, x2,
and x3, with corresponding temperatures T1, T2, and T3. What we want to compare is the average of
T1 and T3 with the value of T2. When this difference is greater than 0, T2
will tend to heat up. And the bigger the difference, the faster
it heats up. Likewise, if it’s negative, T2 will cool
down, at a rate proportional to the difference. More formally, the derivative of T2, with
respect to time, is proportional to this difference between the average value of its neighbors
and its own value: dT2/dt = α((T1 + T3)/2 − T2). Alpha, here, is simply a proportionality constant. To write this in a way that will ultimately
explain the second derivative in the heat equation, let me rearrange this right-hand
side in terms of the difference between T3 and T2 and the difference between T2 and T1: (α/2)((T3 − T2) − (T2 − T1)). You can quickly check that these two are the
same. The top has half of T1, and in the bottom,
there are two minuses in front of the T1, so it’s positive, and that half has been
factored out. Likewise, both have half of T3. Then on the bottom, we have a negative T2
effectively written twice, so when you take half, it’s the same as the single -T2 up
top. As I said, the reason to rewrite it is that
it takes a step closer to the language of derivatives. Let’s write these differences as delta-T1 = T2 − T1 and delta-T2 = T3 − T2. It’s the same number, but we’re adding
a new perspective. Instead of comparing the average of the neighbors
to T2, we’re thinking of the difference of the differences. Here, take a moment to gut-check that this
makes sense. If those two differences are the same, then
the average of T1 and T3 is the same as T2, so T2 will not tend to change. If delta-T2 is bigger than delta-T1, meaning
the difference of the differences is positive, notice how the average of T1 and T3 is bigger
than T2, so T2 tends to increase. Likewise, if the difference of the differences
is negative, meaning delta-T2 is smaller than delta-T1, it corresponds to the average of
these neighbors being less than T2. This is known in the lingo as a “second
difference”. If it feels a little weird to think about,
keep in mind that it’s essentially a compact way of writing this idea of how much T2 differs
from the average of its neighbors, just scaled by a constant: delta-T2 − delta-T1 is exactly twice the gap between that average and T2. That factor doesn’t really matter, because
either way we’re writing our equation in terms of some proportionality constant. The upshot is that the rate of change for
the temperature of a point is proportional to the second difference around it. As we go from this finite context to the infinite
continuous case, the analog of a second difference is the second derivative. Instead of looking at the difference between
temperature values at points some fixed distance apart, you consider what happens as you shrink
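the size of that step towards 0. And in calculus, instead of asking about absolute
differences, which would approach 0, you think in terms of the rate of change: in this case,
the rate of change in temperature per unit distance. For a second difference taken with step size h, that means dividing by h², since (T1 − 2T2 + T3)/h² approaches the second derivative as h shrinks. Remember, there are two separate rates of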
change at play: How does the temperature change as time progresses, and how does the temperature
change as you move along the rod. The core intuition remains the same as what
we just looked at for the discrete case: To know how a point differs from its neighbors,
look not just at how the function changes from one point to the next, but at how that
rate of change changes. This is written as del^2 T / del-x^2, the
second partial derivative of our function with respect to x. Notice how this slope increases at points
where the graph curves upwards, meaning the rate of change of the rate of change is positive. Similarly, that slope decreases at points
where the graph curves downward, where the rate of change of the rate of change is negative. Tuck that away as a meaningful intuition for
problems well beyond the heat equation: Second derivatives give a measure of how a value
compares to the average of its neighbors. Hopefully, that gives some satisfying added
color to this equation. It’s pretty intuitive when reading it as
saying curved points tend to flatten out, but I think there’s something even more
satisfying about seeing a partial differential equation arise, almost mechanistically, from thinking
of each point as tending towards the average of its neighbors. Take a moment to compare what this feels like
to the case of ordinary differential equations. For example, if we have multiple bodies in
space, tugging on each other with gravity, we have a handful of changing numbers: The
coordinates for the position and velocity of each body. The rate of change for any one of these values
depends on the values of the other numbers, which we write down as a system of equations. On the left, we have the derivatives of these
values with respect to time, and the right is some combination of all these values. In our partial differential equation, we have
infinitely many values from a continuum, all changing. And again, the way any one of these values
changes depends on the other values. But helpfully, each one only depends on its
immediate neighbors, in some limiting sense of the word neighbor. So here, the relation on the right-hand side
is not some sum or product of the other numbers; it’s also a kind of derivative, just a derivative
with respect to space instead of time. In a sense, this one partial differential
equation is like a system of infinitely many equations, one for each point on the rod. When your object is spread out in more than
one dimension, the equation looks quite similar, but you include the second derivative with
respect to the other spatial directions as well. Adding up all the spatial second derivatives
like this is a common enough operation that it has its own special name, the “Laplacian”,
often written as an upside-down triangle squared, ∇². It’s essentially a multivariable version
of the second derivative, and the intuition for this equation is no different from the
one-dimensional case: The Laplacian can still be thought of as measuring how different a point is from
the average of its neighbors, but now these neighbors aren’t just to the left and right,
they’re all around. I did a couple of simple videos during my
time at Khan Academy on this operator, if you want to check them out. For our purposes, let’s stay focused on
one dimension. If you feel like you understand all this,
pat yourself on the back. Being able to read a PDE is no joke, and it’s
a powerful addition to your vocabulary for describing the world around you. But after all this time spent interpreting
the equations, I say it’s high time we start solving them, don’t you? And trust me, there are few pieces of math
quite as satisfying as what poodle-haired Fourier over here developed to solve this
problem. All this and more in the next chapter. I was originally inspired to cover this particular
topic when I got an early view of Steve Strogatz’s new book “Infinite Powers”. This isn’t a sponsored message or anything
like that, but all cards on the table, I do have two selfish ulterior motives for mentioning
it. The first is that Steve has been a really
strong, perhaps even pivotal, advocate for the channel since its beginnings, and I’ve
had the itch to repay the kindness for quite a while. The second is to make more people love math. That might not sound selfish, but think about
it: When more people love math, the potential audience base for these videos gets bigger. And frankly, there are few better ways to
get people loving the subject than to expose them to Strogatz’s writing. If you have friends who you know would enjoy
the ideas of calculus, but maybe have been intimidated by math in the past, this book
really does an outstanding job communicating the heart of the subject both substantively
and accessibly. Its core theme is the idea of constructing
solutions to complex real-world problems from simple idealized building blocks, which as
you’ll see is exactly what Fourier did here. And for those who already know and love the
subject, you will still find no shortage of fresh insights and enlightening stories. Again, I know that sounds like an ad, but
it’s not. I actually think you’ll enjoy the book.
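For reference, the relations described in words above can be written out in symbols. This is a sketch using the transcript's own setup (three neighboring points x1, x2, x3 with temperatures T1, T2, T3), plus two labels added here for clarity: a uniform spacing h between the points, and the name α_h for the discrete proportionality constant, to keep it separate from the α in the continuous equation.

```latex
\begin{aligned}
\Delta T_1 &= T_2 - T_1, \qquad \Delta T_2 = T_3 - T_2, \\
\frac{dT_2}{dt} &= \alpha_h\left(\frac{T_1 + T_3}{2} - T_2\right)
  = \frac{\alpha_h}{2}\bigl[(T_3 - T_2) - (T_2 - T_1)\bigr]
  = \frac{\alpha_h}{2}\left(\Delta T_2 - \Delta T_1\right), \\
\frac{T_1 - 2T_2 + T_3}{h^2} &\;\longrightarrow\; \frac{\partial^2 T}{\partial x^2}
  \quad \text{as } h = x_2 - x_1 = x_3 - x_2 \to 0, \\
\frac{\partial T}{\partial t} &= \alpha\,\frac{\partial^2 T}{\partial x^2}
  \qquad \text{(a rod)}, \\
\frac{\partial T}{\partial t} &= \alpha\,\nabla^2 T
  = \alpha\left(\frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} + \frac{\partial^2 T}{\partial z^2}\right)
  \qquad \text{(a three-dimensional body)}.
\end{aligned}
```

Since (α_h/2)(T1 − 2T2 + T3) = (α_h h²/2) · (T1 − 2T2 + T3)/h², the discrete rule turns into the continuous one with α = α_h h²/2. For a fixed α, the discrete constant therefore has to grow like 1/h² as the points get closer together.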
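To make the "pixelated universe" rule concrete, here is a minimal Python sketch of the two-rods example, in which each point's temperature drifts toward the average of its two neighbors. It assumes equally spaced points, a simple explicit-Euler time step `dt`, and insulated ends (an end point's missing neighbor is treated as having the end point's own temperature); the transcript does not say what happens at the ends, so that boundary choice is an assumption of this sketch. The helper names `step`, `simulate`, and `second_difference` are hypothetical, made up for illustration, and the last one illustrates the h → 0 limit that turns a second difference into a second derivative.

```python
import math

# Hypothetical sketch of the discrete heat rule described in the text:
# each point's temperature drifts toward the average of its two neighbors.


def step(temps, alpha, dt):
    """One explicit-Euler step of dT_i/dt = alpha * ((T_{i-1} + T_{i+1}) / 2 - T_i).

    Assumption (not from the text): the rod's ends are insulated, so an end
    point's missing neighbor is treated as having the end point's own temperature.
    With alpha * dt <= 1, each new value is a weighted average of old values.
    """
    n = len(temps)
    new_temps = []
    for i, t in enumerate(temps):
        left = temps[i - 1] if i > 0 else t
        right = temps[i + 1] if i < n - 1 else t
        neighbor_avg = (left + right) / 2
        new_temps.append(t + dt * alpha * (neighbor_avg - t))
    return new_temps


def simulate(temps, alpha=1.0, dt=0.1, steps=20000):
    """Apply `step` repeatedly and return the final temperatures."""
    for _ in range(steps):
        temps = step(temps, alpha, dt)
    return temps


def second_difference(f, x, h):
    """(f(x - h) - 2 f(x) + f(x + h)) / h^2, which approaches f''(x) as h -> 0."""
    return (f(x - h) - 2 * f(x) + f(x + h)) / h**2


if __name__ == "__main__":
    # Two rods brought into contact: left half at 100 degrees, right half at 0.
    rod = [100.0] * 10 + [0.0] * 10
    final = simulate(rod)
    print([round(t, 3) for t in final])  # every point ends up near 50

    # The second difference of sin at x = 1 is close to sin''(1) = -sin(1).
    print(second_difference(math.sin, 1.0, 1e-3), -math.sin(1.0))
```

Because heat leaving one point in this sketch enters its neighbor, the total of the temperatures stays fixed while the profile flattens toward the common value. A different boundary treatment, such as holding the ends at fixed temperatures, would change where the rod settles.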