In a moment I'm going to tell you about a certain
really nice puzzle involving the shadow of a cube. But before we get to that, I should say that the
point of this video is not exactly the puzzle per se; it's about two distinct problem-solving
styles that are reflected in two different ways that we can tackle this problem. In fact,
let's anthropomorphize those two styles by imagining two students, Alice and
Bob, who each embody one of the approaches. Bob will be the kind of student who really
loves calculation. Whenever he can dig into the details and get a very
concrete view of the situation in front of him, that's when he's most pleased.
Alice, on the other hand, is more inclined to put off the computations, not because she
doesn't know how to do them or doesn't want to, per se, but because she prefers to get a nice high-level
general overview of the kind of problem she's dealing with, the general shape that it has,
before she digs into the computations themselves. She's most pleased if she understands not just
the specific question sitting in front of her, but also the broadest possible
way that you could generalize it, and especially if the more general
view lends itself to swifter, more elegant computations once she does
actually sit down to carry them out. Now the puzzle that both of them are going to
be faced with is to find the average area of the shadow of a cube. So if I have a cube kind of
sitting here hovering in space, there are a few things that influence the area of its shadow. One
obvious one would be the size of the cube: smaller cube, smaller shadow. But also, if it's sitting
at different orientations, those orientations correspond to different particular shadows with
different areas. And when I say find the average here, what I mean is the average over all possible
orientations for a particular size of the cube. The astute among you might point out that it
also matters a lot where the light source is. If the light source were very low, close to the
cube itself, then the shadow would end up larger, and if the light source were positioned laterally
off to the side, this can distort the shadow and give it a very different shape. Accounting
for that light position stands to be highly interesting in its own right, but the puzzle is
hard enough as it is, so at least initially let's do the easiest thing we can and say that the light
is directly above the cube and really far away. Effectively infinitely far so that all we're
considering is a flat projection, in the sense that any point with coordinates (x, y, z)
in space gets projected to (x, y, 0). So just to get our bearings, the easiest
situation to think about would be if the cube is straight up with two of its faces parallel to the
ground. In that case this flat projection shadow is simply a square, and if we say the side lengths
of the cube are s, then the area of that shadow is s squared. And by the way, anytime that I have
a label up on these animations like the one down here I'll be assuming that the relevant cube
has a side length of 1. Another special case among all the orientations that's fun to think
about is if the long diagonal is parallel to the direction of the light. In that case the
shadow actually looks like a regular hexagon, and if you use some of the methods that we will
develop in a few minutes, you can compute that the area of that shadow is exactly the square root of
three times the area of one of the square faces. But of course more often the actual shadow will
not be as regular as a square or a hexagon; it's some harder-to-think-about shape based on some
harder-to-think-about orientation for this cube. Earlier, I casually threw out this phrase
of averaging over all possible orientations, but you could rightly ask what exactly is
that supposed to mean. I think a lot of us have an intuitive feel for what we want
it to mean, at least in the sense of what experiment you would do to verify it. You might
imagine tossing this cube in the air like a die, freezing it at some arbitrary point, recording
the area of the shadow from that position, and then repeating. If you do this many many
times over and over you can take the mean of your sample. The number that we want to get at,
the true average here, should be whatever that experimental mean approaches as you do more
and more tosses approaching infinitely many. Even still, the sticklers among you could complain
that this doesn't really answer the question, because it leaves open the issue of how we're defining
a "random" toss. The proper way to answer this, if we want it to be more formal, would be to first
describe the space of all possible orientations, which mathematicians have actually
given a fancy name. They call it SO(3), typically defined in terms of a certain family
of 3-by-3 matrices. And the question we want to answer is "What probability distribution are we
putting on this entire space?" It's only when such a probability distribution is well-defined that
we can answer a question involving an average. If you are a stickler for that kind of thing, I
want you to hold off on that question until the end of the video. You'll be surprised at how far
we can get with the more heuristic experimental idea of just repeating a bunch of random tosses
without really defining the distribution. Once we see Alice and Bob's solutions,
it's actually very interesting to ask how exactly each one of them defined
this distribution along their way. And remember, this is not meant to be
a lesson about cube shadows, per se, but a lesson about problem-solving told through
the lens of two different mindsets that we might bring to the puzzle. And as with any lesson
on problem-solving, the goal here is not to get to the answer as quickly as we can, but
hopefully for you to feel like you found the answer yourself. So if ever there's a point
when you feel like you might have an idea, give yourself the freedom to
pause and try to think it through. As a first step, and this is really independent
of any particular problem-solving style, anytime you find a hard question, a
good thing that you can do is ask "What's the simplest possible non-trivial variant
of the problem that you can try to solve?" In our case what you might say is, okay, let's
forget about averaging over all the orientations. That's a tricky thing to think about. And let's
even forget about all the different faces of the cube, because they overlap and that's also
tricky to think about. Just for one particular face and one particular orientation,
can we compute the area of this shadow? Once more, if you want to get your bearings with
some special cases, the easiest is when that face is parallel to the ground, in which case the
area of the shadow is the same as the area of the face. And on the other hand if we were to
tilt that face 90 degrees, then its shadow will be a straight line and it has an area of zero. So
Bob looks at this and he wants an actual formula for that shadow, and the way he might think about
it is to consider the normal vector perpendicular off of that face. What seems relevant is the angle
that that normal vector makes with the vertical, with the direction where the light is
coming from, which we might call theta. Now from the two special cases we just looked at,
we know that when theta is equal to 0, the area of that shadow is the same as the area of the shape
itself, which is s squared if the square has side length s. And if theta is equal to 90 degrees,
then the area of that shadow is zero. And it's probably not too hard to guess that trigonometry
will be somehow relevant, so anyone comfortable with their trig functions could probably
hazard a guess as to what the right formula is. But Bob is more detail-oriented than that. He
wants to properly prove what that area should be rather than just making a guess based on the
endpoints. The way you might think about it could be something like this. If we consider
the plane that passes through the vertical as well as our normal vector, and then we consider
all the different slices of our shape that are in that plane, or parallel to that plane, then
we can focus our attention on a two-dimensional variant of the problem. If we just look at one
of those slices, which has a normal vector at an angle theta away from the vertical, its shadow
might look something like this. And if we draw a vertical line up to the left here, we have
ourselves a right triangle. And from here we can do a little bit of angle chasing, where we
follow around what that angle theta implies about the rest of the diagram. This shows that the
lower-right angle in this triangle is precisely theta. So when we want to understand the size of this
shadow in comparison to the original size of the piece, we can think about the cosine of that
angle theta, which remember is the adjacent over the hypotenuse. It's literally the ratio between
the size of the shadow and the size of the slice. So the factor by which the slice gets squished
down in this direction is exactly cosine of theta. And if we broaden our view to the entire square,
all the slices in that direction get scaled by the same factor. But in the other direction,
the one perpendicular to that slice, there is no stretching or squishing because the face is
not at all tilted in that direction. So overall the two-dimensional shadow of our two-dimensional
face should also be scaled down by this same factor of cosine of theta. It lines up with what you
might intuitively guess given the case where the angle is 0 degrees and the case where it's 90
degrees, but it's reassuring to see why it's true. Actually, as stated so far, this is not
quite correct. There is a small problem with the formula that we've written. In the
case where theta is bigger than 90 degrees, the cosine would actually come out to be
negative, but of course we don't want to consider the shadow to have negative area. At
least not in a problem like this. So there are two different ways you could fix this. You
could say we only ever want to consider the normal vector that is pointing up, that has a
positive z component. Or more simply we could say just take the absolute value of that
cosine, and that gives us a valid formula. So Bob's happy because he has a precise
formula describing the area of the shadow, but Alice starts to think about it a little bit
differently. She says, okay, we've got some shape, and then we apply a rotation that sort of situates
it in 3D space in some way, and then we apply a flat projection that shoves that back into
two-dimensional space. And what stands out to her is that both of these are linear transformations.
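As a concrete check on this idea, here is a minimal sketch, assuming NumPy is available; the helper names (`rotation_about_axis`, `shadow_scale_factor`, `polygon_area`) and the sample pentagon are hypothetical choices made for illustration. The face is taken to sit in the xy-plane with normal (0, 0, 1), a rotation matrix R places it in space, and the flat projection keeps only x and y. The composite of embed, rotate, and project is a 2-by-2 linear map, and the sketch compares the absolute value of its determinant with Bob's |cos theta| and with the actual ratio of shadow area to shape area.

```python
import numpy as np

def rotation_about_axis(axis, angle):
    """Rotation matrix for `angle` radians about `axis` (Rodrigues' formula)."""
    k = np.asarray(axis, dtype=float)
    k = k / np.linalg.norm(k)
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)

# Flat projection (x, y, z) -> (x, y), written as a 2x3 matrix.
PROJECT = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0]])

# Embedding of the face's own 2D coordinates into the xy-plane of 3D space.
EMBED = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [0.0, 0.0]])

def shadow_scale_factor(R):
    """Area scale factor of embed -> rotate -> project, as |det| of a 2x2 map."""
    composite = PROJECT @ R @ EMBED  # a 2x2 linear map
    return abs(np.linalg.det(composite))

def polygon_area(points):
    """Shoelace formula for a polygon given as an (n, 2) array of vertices."""
    x, y = points[:, 0], points[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

R = rotation_about_axis([1.0, 2.0, 0.5], 0.9)
normal = R @ np.array([0.0, 0.0, 1.0])
cos_theta = normal[2]  # cosine of the angle between the rotated normal and the vertical

# Any planar shape works; this hypothetical pentagon stands in for the cat outline.
shape = np.array([[0, 0], [2, 0], [3, 1], [1, 3], [-1, 1]], dtype=float)
shadow = (PROJECT @ R @ EMBED @ shape.T).T

print("|det| of composite map:", shadow_scale_factor(R))
print("|cos(theta)|:          ", abs(cos_theta))
print("shadow / shape area:   ", polygon_area(shadow) / polygon_area(shape))
```

The three numbers agree for a reason: the composite map is just the upper-left 2-by-2 block of R, and for any rotation matrix the determinant of that block equals the entry R[2, 2], which is the z-component of the rotated normal, that is, cos theta.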
Being linear, each of them could in principle be described by a matrix, and the
overall transformation would look like the product of those two matrices. What Alice knows from
one of her favorite subjects, linear algebra, is that if you take some shape and consider its
area, then apply some linear transformation, the area of the output is some constant
times the original area of the shape. More specifically, we have a name for that constant:
it's the absolute value of the determinant of the transformation. If you're not so comfortable with linear algebra,
we could give a much more intuitive description and say if you uniformly stretch the original
shape in some direction, the output will also uniformly get stretched in some direction. So the
area of each of them should scale in proportion to each other. Now, in principle Alice could
compute this determinant. But it's not really her style to do that, at least not to do so
immediately. Instead, what she writes down is the fact that this proportionality constant between
our original shape and its shadow does not depend on the original shape. We could be talking about
the shadow of this cat outline, or anything else, and its size doesn't really matter; the
only thing affecting that proportionality constant is what transformation we're applying, which in
this context means we could write it down as some factor that depends on the rotation being applied
to the shape. In the back of our minds, because of Bob's calculation, we know what that factor looks
like: it's the absolute value of the cosine of the angle between the normal vector and
the vertical. But Alice right now is just saying, "yeah, yeah, I can think about that eventually
when I want to." But she knows we're about to average over all the different orientations
anyway, so she holds out some hope that any specific formula about a specific orientation
might get washed away in that average. Now it's easy to look at this and say, "Okay,
well Alice isn't really doing anything then!" Of course the area of the shadow is
proportional to the area of the original shape, they're both two-dimensional quantities, they
should both scale like two-dimensional things. But keep in mind this would not at all be
true if we were dealing with the harder case that has a closer light source. In
that case the projection is not linear. So for example if I rotate this cat so that its
tail ends up quite close to the light source, then if I stretch the original
shape uniformly in the x direction, say by a factor of 1.5, it might have a very
disproportionate effect on the ultimate shadow, because the tail gets very disproportionately
blown up as it gets really close to the light. Again, Alice is keeping an eye out for what
properties of the problem are actually relevant, because that helps her know how much she
can generalize things. Does the fact that we're thinking about a square face and not
some other shape matter? No, not really. Does the fact that the transformation
is linear matter? Yes, absolutely. Alice can also apply a similar way of
thinking about the average shadow for any shape like this. Say we have some sequence
of rotations that we apply to our square face, and let's call them R1, R2, R3, and so on. Then
the area of the shadow in each one of those cases looks like some factor times the area of the
square, and that factor depends on the rotation. So if we take an empirical average
for that shadow across the sample of rotations we're looking at right now,
the way it looks is to add up all of those shadow areas and then divide
by the total number that we have. Now, because of the linearity, this area of the
original square can cleanly factor out of all of that, and it ends up on the left. This isn't the
exact average that we're looking for, it's just an empirical mean of a sample of rotations. But
in principle what we're looking for is what this approaches as the size of our sample approaches
infinity. And all the parts that depend on the size of the sample sit cleanly away from the area
itself. So whatever this approaches, in the limit it's just going to be some number. It might be
a royal pain to compute, we're not sure about that yet, but the thing that Alice notes is that
it's independent of the size and the shape of the particular 2d thing that we're looking at. It's a
universal proportionality constant, and her hope is that that universality somehow lends itself
to a more elegant way to deduce what it must be. Now Bob would be eager to compute
this constant here and now, and in a few minutes I'll show you how he
does it. But before that, I do want to stay in Alice's world for a little bit more, because
this is where things start to really get fun. In her desire to understand the overall structure
of the question before diving into the details, she's curious now about how the area of the
shadow of the cube relates to the area of its individual faces. If we can say something about
the average area of a particular face, does that tell us anything about the average area of the
cube as a whole? For example, a simple thing we could say is that the cube's shadow area is definitely less
than the sum of the shadow areas across all the faces, because there's a meaningful amount of overlap
between those shadows. But it's not entirely clear how to think about that overlap, because if we
focus our attention just on two particular faces, in some orientations they don't overlap at
all, but in other orientations they do have some overlap. The specific shape and area of that
overlap seems a little bit tricky to think about, much less how on earth we would average that
across all of the different orientations. But Alice has about three clever
insights throughout this whole problem, and this is the first of them. She says,
actually, if we think about the whole cube, not just a pair of faces, we can conclude that
the area of the cube's shadow for a given orientation is exactly one-half the sum of
the shadow areas of all of its faces. Intuitively, you can maybe guess that half of them
are bathed in the light, and half of them are not. But here's the way that she justifies it.
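Before following her argument, the claim can be spot-checked numerically. The sketch below assumes NumPy and a unit cube centered at the origin; the helpers (`convex_hull_area`, `cube_shadow_area`, `half_sum_of_face_shadows`, `random_rotation`) are hypothetical names written for this illustration. The cube's shadow is computed as the convex hull of its eight projected vertices, while each face's shadow area comes from the |cos theta| formula, i.e. the absolute z-component of that face's rotated unit normal. The rotation sampler uses a QR decomposition of a Gaussian matrix, a standard trick swapped in here purely for convenience.

```python
import itertools
import numpy as np

def convex_hull_area(points):
    """Area of the convex hull of 2D points (monotone chain, then shoelace)."""
    pts = sorted(map(tuple, points))

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    hull = lower[:-1] + upper[:-1]  # hull vertices in counterclockwise order

    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2

# The eight vertices of a unit cube centered at the origin.
CUBE_VERTICES = np.array(list(itertools.product([-0.5, 0.5], repeat=3)))

# Outward unit normals of the six faces (each face has area 1).
FACE_NORMALS = np.vstack([np.eye(3), -np.eye(3)])

def cube_shadow_area(R):
    """Area of the flat projection (x, y, z) -> (x, y) of the rotated cube."""
    rotated = CUBE_VERTICES @ R.T
    return convex_hull_area(rotated[:, :2])

def half_sum_of_face_shadows(R):
    """One half of the summed face-shadow areas, each equal to |cos(theta)|."""
    rotated_normals = FACE_NORMALS @ R.T
    return 0.5 * np.abs(rotated_normals[:, 2]).sum()

def random_rotation(rng):
    """Uniformly random rotation: QR of a Gaussian matrix, signs fixed, det = +1."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q @ np.diag(np.sign(np.diag(r)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

# Orientation with the long diagonal (1, 1, 1) pointing straight at the light.
DIAGONAL_UP = np.vstack([
    np.array([1.0, -1.0, 0.0]) / np.sqrt(2),
    np.array([1.0, 1.0, -2.0]) / np.sqrt(6),
    np.array([1.0, 1.0, 1.0]) / np.sqrt(3),
])

rng = np.random.default_rng(0)
for _ in range(3):
    R = random_rotation(rng)
    print(cube_shadow_area(R), half_sum_of_face_shadows(R))
print(cube_shadow_area(np.eye(3)), cube_shadow_area(DIAGONAL_UP))
```

The same functions also reproduce the two special cases from earlier: the face-on cube casts a shadow of area 1, and with the long diagonal pointing at the light the hexagonal shadow has area equal to the square root of 3.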
She says that for a particular ray of light that goes from the sky and eventually hits a
point in the shadow, that ray passes through the surface of the cube at exactly two points. There's one moment
when it enters, and one moment when it exits, so every point in that shadow corresponds to
exactly two faces above it. Well, okay, that's not exactly true. If that beam of light happened to go
through the edge of one of the squares, there's a little bit of ambiguity about how many faces it's
passing through. But those account for zero area inside the shadow, so we're safe to ignore them if the
thing we're trying to do is compute the area. If Alice is pressed and she needs to justify
why exactly this is true, which is important for understanding how the problem might
generalize, she can appeal to the idea of convexity. Convexity is one of those properties
where a lot of us have an intuitive sense for what it should mean. You know, it's shapes
that just bulge out, they never dent inward. But mathematicians have a pretty clever way of
formalizing it that's helpful for actual proofs. They say that a set is "convex" if the line segment
that connects any two points inside that set is entirely contained within the set itself. So a
square is convex because no matter where you put two points inside that square, the line connecting
them is entirely contained inside the square. But something like the symbol pi is not
convex. I can easily find two different points so that the line connecting them has to peek
outside of the set itself. None of the letters in the word "convex" are themselves convex. You
can find two points so that the line connecting them has to pass outside of the set. It's a really
clever way to formalize this idea of a shape that only bulges out, because any time that it dents
inward, you can find these counterexample lines. For our cube, because it's convex, between the
first point of entry and the last point of exit the ray has to stay entirely inside the cube, by the
definition of convexity. But if we were dealing with some other non-convex shape, like a donut,
you could find a ray of light that enters, exits, then enters and exits again. So you wouldn't
have a clean two-to-one cover from the face shadows. If you were to cover its surface in a bunch of faces,
the sum of their shadow areas would not be precisely two times
the area of the shadow itself. So that's the first key insight, the face
shadows double-cover the cube shadow. And the next one is a little bit more
symbolic, so let's start things off by abbreviating our notation a little to make room on
the screen. Instead of writing Area(Shadow(Cube)), I'm just going to write S(Cube). And similarly
instead of Area(Shadow(a particular face)), I'm just going to write S(F_j), where that subscript
j indicates which face I'm talking about. But of course, we should really be talking about
the shadow of a particular rotation applied to the cube, so I might write this as S of some rotation
applied to the cube. And likewise on the right, it's the area of the shadow of that same
rotation applied to a given one of the faces. With the more compact notation at hand, let's
think about the average of this shadow area across many different rotations, some sample of R1, R2,
R3, and so on. Again, that average just involves adding up all of those shadow areas and then
dividing them by n, and in principle if we were to look at this for larger and larger samples,
letting n approach infinity, that would give us the average area of the shadow of the cube. Some
of you might be thinking, "Yes, we know this, you've said this already." But it's worth
writing it out so that we can understand why expressing the shadow area for a particular
rotation of the cube as a sum across all of its faces (or one half times that sum, at least)
is useful. What is that going to do for us? Well, let's just write it out, where for
each one of these rotations of the cube we could break down that shadow as a sum across that
same rotation applied across all of the faces. And when it's written as a grid like this, we can get
to Alice's second insight, which is to shift the way that we're thinking about the sum from going
row-by-row to instead going column-by-column. For example if we focused our attention just
on the first column, what it's telling us is to add up the area of the shadow of the first
face across many different orientations. So if we were to take that sum and divide it by the
size of our sample, that gives us an empirical average for the area of the shadow of this
face. So if we take larger and larger samples, letting that size go to infinity, this will
approach the average shadow area for a square. Likewise, the second column
can be thought of as telling us the average area for the second face of the
cube, which should of course be the same number. And same deal for any other column, it's telling
us the average area for a particular face. So that gives us a very different way
of thinking about our whole expression. Instead of saying add up the shadow areas of the
cube at all the different orientations, we could say just add up the average
shadows for the six different faces and multiply the total by one half. The term
on the left here is thinking about adding up rows first, and the term on the right is
thinking about adding up columns first. In short, the average of
the sum of the face shadows is the same as the sum of the averages of the
face shadows. Maybe that swap seems simple, maybe it doesn't, but I can tell you
that there is actually a little bit more than meets the eye to the step that
we just took. But we'll get to that later. And remember, we know that the
average area for a particular face looks like some universal proportionality
constant times the area of that face, so if we're adding this up across all the faces
of the cube, we could think of this as equaling some constant times the surface area of
the cube. And that's pretty interesting, the average area for the shadow of this cube is
going to be proportional to its surface area. But at the same time, you might complain, "Well,
Alice is just pushing around a bunch of symbols here, because none of this matters if we don't
know what that proportionality constant is!" I mean, it almost seems obvious. Like, of course
the average shadow area should be proportional to the surface area; they're both two-dimensional
quantities, so they should scale in lockstep with each other. It's not obvious. After all,
for a closer light source it simply wouldn't be true. And also this business where we added
up the grid column-by-column versus row-by-row is a little more nuanced than it might
look at first, there's a subtle hidden assumption underlying all of this which carries
a special significance when we choose to revisit the question of what probability distribution is
being taken across the space of all orientations. But more than anything, the reason that it's
not obvious is that the significance of this result right here is not merely that these two
values are proportional. It's that an analogous fact will hold true for any convex solid, and
crucially, the actual content of what Alice has built up so far is that it'll be the same
proportionality constant across all of them. Now if you really mull over that, some of you
may be able to predict the way that Alice is able to finish things off from here. It's really
delightful, it's honestly my main reason for covering this topic. But before we get into it,
I think it's easy to under-appreciate her result unless we dig into the details of
what it is that she manages to avoid. So let's take a moment to turn our
attention back to Bob's world, because while Alice has been doing all of this,
he's been busy doing some computations. In fact, what he's been working on is finding exactly
what Alice has yet to figure out, which is how to take the formula that he found for the area of a
square's shadow and take the natural next step of trying to find the average of that square's
shadow, averaged over all possible orientations. The way Bob starts, if he's thinking about all the
different possible orientations for this square, is to ask what are all the different normal
vectors that that square can have in all these orientations. Because everything about
its shadow comes down to that normal vector. It's not too hard to see that all those possible
normal vectors trace out the surface of a sphere. If we assume it's a unit normal vector, it's a
sphere with radius 1. And furthermore, Bob figures that each point of the sphere should be just as
likely to occur as any other, our probabilities should be uniform in that way. There's no
reason to prefer one direction over another. But in the context of continuous probabilities
it's not very helpful to talk about the likelihood of a particular individual point, because in the
uncountable infinity of points on the sphere, that would be zero and unhelpful. So instead
the more precise way to phrase this uniformity would be to say the probability that our normal
vector lands in any given patch of area on the sphere should be proportional to that area itself.
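One way to make this uniformity concrete, sketched here under the assumption that NumPy is available: normalizing vectors of three independent standard normal samples is a standard way to draw points uniformly on the unit sphere. The hypothetical helper `band_probability` gives the exact share of the sphere's area lying in a band of polar angle between theta1 and theta2, using the known band area 2π(cos theta1 − cos theta2) on a unit sphere, and the sketch compares that share with the fraction of sampled normals that actually land in the band.

```python
import numpy as np

def uniform_unit_vectors(n, rng):
    """Draw n points uniformly on the unit sphere by normalizing Gaussian vectors."""
    v = rng.normal(size=(n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def band_probability(theta1, theta2):
    """Share of the unit sphere's area whose polar angle lies in [theta1, theta2]."""
    band_area = 2 * np.pi * (np.cos(theta1) - np.cos(theta2))
    return band_area / (4 * np.pi)

rng = np.random.default_rng(0)
normals = uniform_unit_vectors(200_000, rng)
# Angle between each normal and the vertical; clip guards against rounding past +/-1.
theta = np.arccos(np.clip(normals[:, 2], -1.0, 1.0))

for t1, t2 in [(0.0, 0.3), (1.0, 1.4), (2.5, np.pi)]:
    observed = np.mean((theta >= t1) & (theta <= t2))
    expected = band_probability(t1, t2)
    print(f"band [{t1:.2f}, {t2:.2f}]: observed {observed:.4f}, expected {expected:.4f}")
```

Bands of equal angular width near the poles get far fewer samples than bands near the equator, exactly in proportion to their area, which is what "uniform on the sphere" means here.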
More specifically, that probability should equal the area of that little patch divided by the total surface area
of the sphere. If that's true no matter what patch of area we're considering, that's what we
mean by a uniform distribution on the sphere. Now to be clear, points on the sphere are not
the same thing as orientations in 3d space, because even if you know what normal
vector the square is going to have, that leaves us with another degree of freedom. The
square could be rotated about that normal vector. But Bob doesn't actually have to care about that
extra degree of freedom, because in all of those cases the area of the shadow is the same. It's
only dependent on the cosine of the angle between that normal vector and the vertical, which is
kind of neat. All those shadows are genuinely different shapes, they're not the same, but
the area of each of them will be the same. What this means is that when Bob wants
this average shadow area over all possible orientations, all he really needs to know is the
average value of this absolute value of cosine of theta for all different possible normal vectors,
all different possible points on the sphere. So how do you compute an average like this?
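Before any calculus, a quick simulation gives a target to aim for. This is a sketch assuming NumPy, and `average_abs_cos` is a hypothetical name. It draws unit normals uniformly on the sphere (by normalizing Gaussian vectors, a standard trick), uses the fact that the z-component of a unit normal is exactly cos theta, and averages its absolute value. For a square of area 1, that average is the average shadow area.

```python
import numpy as np

def average_abs_cos(n_samples, seed=0):
    """Monte Carlo estimate of E|cos(theta)| for a uniformly random unit normal."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=(n_samples, 3))
    v /= np.linalg.norm(v, axis=1, keepdims=True)  # now uniform on the unit sphere
    cos_theta = v[:, 2]  # z-component = cosine of the angle with the vertical
    return float(np.abs(cos_theta).mean())

for n in [100, 10_000, 1_000_000]:
    print(n, average_abs_cos(n))
```

The estimates drift toward a single number as the sample grows, but a simulation can only ever approximate it; pinning the value down exactly is what the rest of Bob's computation is for.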
Well, if we lived in some kind of discrete, pixelated world, where there's only a finite
number of possible angles theta that that normal vector could have, the average would be
pretty straightforward. What you do is find the probability of landing on any particular value of
theta, which will tell us something like how much of the sphere normal vectors with that angle
make up, and then you multiply it by the thing we want to take the average of, this formula for the
area of the shadow. And then you would add that up over all of the different possible values of theta
ranging from 0 up to 180 degrees, or pi radians. But of course in reality there is a
continuum of possible values of theta, an uncountable infinity, and the probability of
landing on any particular value of theta will actually be zero, and so a sum like this
unfortunately doesn't really make any sense. Or if it does make sense, adding up infinitely
many zeros should just give us a zero. The short answer for what we do instead is that
we compute an integral. And I'll level with you, the hard part here is I'm not entirely sure what
background I should be assuming from those of you watching right now. Maybe it's the case
that you're quite comfortable with calculus, and you don't need me to belabor the point here.
Maybe it's the case that you're not familiar with calculus, and I shouldn't just be throwing down
integrals like that. Or maybe you...you know, you took a calculus class a while ago
but you need a little bit of a refresher. I'm gonna go with the option of setting
this up as if it's a calculus lesson, because to be honest even when
you are quite comfortable with integrals, setting them up can be
kind of an error-prone process, and calling back to the underlying definition is a
good way to sort of check yourself in the process. If we lived in a time before calculus
existed and integrals weren't a thing and we wanted to approximate an answer to this
question, one way we could go about it is to take a sample of values for theta that ranges
from 0 up to 180 degrees. We might think of them as evenly spaced, with some sort of
difference between each one, some delta-theta. And it's still the case that it would be unhelpful
to ask about the probability of a particular value of theta occurring, even if it's one in our
sample. That probability would still be zero, and it would be unhelpful. But what is helpful
to ask is the probability of falling between two adjacent values from our sample, in this little
band of latitude with a width of delta theta. Based on our assumption that the distribution
along the sphere should be uniform, that probability comes down to knowing the area
of this band. More specifically, the chances that a randomly chosen vector lands in that band
should be that area divided by the total surface area of the sphere. To figure out that area,
let's first think of the radius of that band, which, if the radius of our sphere is 1, is
definitely going to be smaller than 1. And in fact, if we draw the appropriate little right
triangle here, you can see that that little radius, let's just say at the top of the band, should be
the sine of our angle, the sine of theta. This means that the circumference of the band should
be 2 pi times the sine of that angle. And then the area of the band should be that circumference
times its thickness, that little delta theta. Or rather, the area of our band is approximately this
quantity. What's important is that for a finer sample of many more values of theta the accuracy
of that approximation would get better and better. Now remember, the reason we wanted this area is
to know the probability of falling into that band, which is this area divided by the surface area
of the sphere, which we know to be 4 pi times its radius squared. That's a value that you could
also compute with an integral similar to the one that we're setting up now, but for now we can take
it as a given, as a standard well-known formula. And this probability itself is just a stepping
stone in the direction of what we actually want, which is the average area
for the shadow of a square. To get that we'll multiply this probability times
the corresponding shadow area, which is this absolute value of cosine theta expression
we've seen many times up to this point. And our estimate for this average would
now come down to adding up this expression across all of the different bands, all of the
different samples of theta that we've taken. This right here, by the way, is when
Bob is just totally in his element. We've got a lot of exact formulas
describing something very concrete, actually digging in on our way to a real answer.
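The band-by-band sum described above can be carried out directly. Here is a sketch in plain Python for a square of area 1 (the function name `band_sum_estimate` is hypothetical): it takes evenly spaced samples of theta from 0 up to pi, approximates each band's probability as 2 pi sin(theta) delta-theta divided by 4 pi, weights it by the shadow factor |cos(theta)|, and adds everything up. Running it with more and more bands shows the estimates settling down, which is exactly the kind of limit an integral formalizes.

```python
import math

def band_sum_estimate(num_bands):
    """Sum over latitude bands of P(normal lands in band) * |cos(theta)|."""
    d_theta = math.pi / num_bands
    total = 0.0
    for k in range(num_bands):
        theta = k * d_theta  # sample value at the top edge of the k-th band
        band_area = 2 * math.pi * math.sin(theta) * d_theta  # circumference * thickness
        probability = band_area / (4 * math.pi)  # divide by the sphere's surface area
        total += probability * abs(math.cos(theta))  # weight by the shadow factor
    return total

for n in [4, 16, 256, 4096]:
    print(n, band_sum_estimate(n))
```

Sampling theta at the bottom edge or the middle of each band instead would change the early estimates but not the value they converge to, which is why the choice of sample point stops mattering in the limit.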
And again, if it feels like a lot of detail, I want you to appreciate that fact so that you
can appreciate just how magical it is when Alice manages to somehow avoid all of this. Anyway,
looking back at our expression, let's clean things up a little bit, like factoring out all
of the terms that don't depend on theta itself. And we can simplify that 2 pi divided by 4 pi to
simply be one half. And to make it a little more analogous to calculus with integrals, let me
just swap the main terms inside the sum here. What we now have, this sum that's going
to approximate the answer to our question, is almost what an integral is. Instead of writing
the sigma for sum, we write the integral symbol, this kind of elongated Leibnizian S, with bounds showing us
that we're going from zero to pi. And instead of describing the step size as delta theta, a
concrete finite amount, we instead describe it as "d" theta, which I like to think of as signaling
the fact that some kind of limit is being taken. What that integral means, by definition, is
whatever the sum on the bottom approaches for finer and finer subdivisions, more dense
samples that we might take for theta itself. And at this point, for those of you who do know
calculus, I'll just write down the details of how you would actually carry this out as you might see
it written down in Bob's notebook. It's the usual anti-derivative stuff, but the one key step
is to bring in a certain trig identity. In the end, what Bob finds after doing
this is the surprisingly clean fact that the average area for a square's shadow
is precisely one-half the area of that square. This is the mystery constant
which Alice doesn't yet know. If Bob were to look over her shoulder and see
the work that she's done, he could finish out the problem right now. He plugs in the constant
that he just found and he knows the final answer. And now, finally, with all of this as backdrop, what is it that Alice does to
carry out the final solution? I introduced her as someone who really
likes to generalize the results she finds. And usually those generalizations end up as
interesting footnotes that aren't really material for solving particular problems. But this is a
case where the generalization itself draws her to a quantitative result. Remember, the substance of
what she's found so far is that if you look at any convex solid, then the average area for its shadow
is going to be proportional to its surface area. And critically, it'll be the same proportionality
constant across all of these solids. So all Alice needs to do is find just a single convex solid out
there where she already knows the average area of its shadow. And some of you may see where this is
going: the most symmetric solid available to us is a sphere. No matter what the orientation of that
sphere, its shadow, the flat projection shadow, is always a circle with an area of pi r squared.
So in particular that's its average shadow area. And the surface area of a sphere, like I
mentioned before, is exactly 4 pi r squared. By the way, I did make a video talking
all about that surface area formula, and how Archimedes proved it thousands of years
before calculus existed. So you don't need integrals to find it. The magic of what Alice
has done is that she can take this seemingly specific fact, that the shadow of a sphere has
an area exactly one-fourth its surface area, and use it to conclude a much more general
fact, that for any convex solid out there its shadow and surface area
are related in the same way, in a certain sense. So with that she can go and
fill in the details of the particular question about a cube and say that its average shadow area
will be one-fourth times its surface area of 6s², which comes to 1.5s². But the much more memorable fact that
she'll go to sleep thinking about is how it didn't really matter that
we were talking about a cube at all. Now, that's all very pretty, but some of you might
complain that this isn't really a valid argument, because spheres don't have flat faces. When I said
Alice's argument generalizes to any convex solid, if we actually look at the argument itself,
it definitely depends on the use of a finite number of flat faces. For example, if
we were applying it to a dodecahedron, you would start by saying that the area of
a particular shadow of that dodecahedron is exactly one half times the sum of
the areas of the shadows of all its faces. Once again, you could use a certain
ray-of-light-mixed-with-convexity argument to draw that conclusion. And remember the benefit
of expressing that shadow area as a sum is that when we want to average over a bunch of different
rotations, we can describe that sum as a big grid, where we can then go column-by-column and consider
the average area for the shadow of each face. And also, a critical fact was the conclusion from
much earlier that the average shadow for any 2d object (a flat 2d object, which is important)
will equal some universal proportionality constant times its area. The significance was that
that constant didn't depend on the shape itself, it could have been a square, or a cat, or the
pentagonal faces of our dodecahedron, whatever. So after hastily carrying this over to a sphere
that doesn't have a finite number of flat faces, you would be right to complain. But luckily,
it's a pretty easy detail to fill in. What you can do is imagine a sequence of different
polyhedra that successively approximate a sphere, in the sense that their faces hug tighter and
tighter around the genuine surface of the sphere. For each one of those approximations, we can draw
the same conclusion that its average shadow is going to be proportional to its surface area, with
this universal proportionality constant. So then, if we say "okay, let's take the limit of the
ratio between the average shadow area at each step and the surface area at each step..." Well, since
that ratio is never changing, it's always equal to this constant, then in the limit it's also going
to equal that constant. But on the other hand, by their definition, in the limit their average
shadow area should be that of a circle which is pi r squared, and the limit of the surface areas
would be the surface area of the sphere, 4 pi r squared. So we do genuinely get the conclusion
that intuition would suggest, but as is so common with Alice's argument here, we do have to be a
little delicate in how we justify that intuition. It's easy for this contrast of Alice and
Bob to come across like a value judgment, as if I'm saying "Look how
clever Alice has managed to be! She insightfully avoided all those
computations that Bob had to do." But that would be a very misguided conclusion. I think there's an important way that
popularizations of math differ from the feeling of actually doing math. There's this bias
towards showing the slick proofs, the arguments with some clever key insight that lets you avoid
doing calculations. I could just be projecting, since I'm very guilty of this, but what I can tell
you sitting on the other side of the screen here is that it feels a lot more attractive to make a
video about Alice's approach than Bob's. For one thing, in Alice's approach the line of reasoning
is fun. It has these nice aha moments. But also, crucially, the way that you explain it
is more or less the same for a very wide range of mathematical backgrounds. It's much less
enticing to do a video about Bob's approach, not because the computations are all
that bad. I mean they're honestly not. But the pragmatic reality is that
the appropriate pace to explain it looks very different depending on the different
mathematical backgrounds in the audience. So you watching this right now
clearly consume math videos online, and I think in doing so it's
worth being aware of this bias. If the aim is to have a genuine lesson on problem
solving, too much focus on the slick proofs runs the risk of being disingenuous. For example let's
say we were to step up to challenge mode here and ask about the case with a closer light source.
To my knowledge there is not a similarly slick solution to Alice's here, where you can
just relate to a single shape like a sphere. The much more productive warm-up to have done
would have been the calculus of Bob's approach. And if you look at the history of this
problem, it was proved by Cauchy in 1832. And if we paw through his handwritten notes, they look a lot more similar to Bob's work than
Alice's work. Right here at the top of page 11, you can see what is essentially the same
integral that you and I set up in the middle. On the other hand, the whole framing
of the paper is to find a general fact, not something specific like the case of
a cube, so if we were asking the question which of these two mindsets correlates
with the act of discovering new math, the right answer would almost certainly have
to be a blend of both. But I would suggest that many people don't assign enough weight to the
part of that blend where you're eager to dive into calculations. And I think there's some risk
that the videos I make might contribute to that. In the podcast that I did with the
mathematician Alex Kontorovich, he talked about the often underappreciated importance of
just drilling on computations to build intuition, whether you're a student engaging with a new
class, or a practicing research mathematician engaging with a new field of study. A listener
actually wrote in to highlight what an impression that particular section made. They're a Ph.D.
student, and described themselves as being worried that their mathematical abilities were starting to
fade, which they attributed to becoming older and less sharp. But hearing a practicing mathematician
talk about the importance of doing hundreds of concrete examples in order to learn something
new, evidently that changed their perspective. In their own words, recognizing this completely
reshaped their outlook and their results. And if you look at the famous mathematicians
through history, you know, Newton, Euler, Gauss, all of them, they all have this seemingly
infinite patience for doing tedious calculations. The irony of being biased to show insights
that let us avoid calculations is that the way people often train up the intuitions
to find those insights in the first place is by doing piles and piles of calculations. All that said, something would definitely
be missing without the Alice mindset here. I mean, think about it: how sad would it
be if we solved this problem for a cube, and we never stepped outside of the trees
to see the forest and understand that this is a super general fact, it
applies to a huge family of shapes. And if you consider that math is not just about
answering the questions that are posed to you, but about introducing new ideas and constructs,
one side note about Alice's approach here is that it suggests a fun way to quantify the
idea of convexity. Rather than just having a yes/no answer, is it convex or is it not, we could
put a number to it by saying: Consider the average area of the shadow of some solid, multiply
that by four, divide by the surface area, and if that number is 1 you've got a
convex solid. But if it's less than 1, it's non-convex, and how close it is to 1
tells you how close it is to being convex. Also, one of the nice things about the Alice
solution here is that it helps explain why it is that mathematicians have what can sometimes
look like a bizarre infatuation with generality, and with abstraction. The more examples that
you see where generalizing and abstracting actually helps you to solve a specific case, well
the more you start to adopt the same infatuation. And as a final thought, for the stalwart viewers
among you who've stuck through it this far, there is still one unanswered question
about the very premise of our puzzle. What exactly does it mean to choose a random
orientation? Now if that feels like a silly question, like, of course we know what it should
mean, I would encourage you to watch a video that I just did with Numberphile on a conundrum
from probability known as "Bertrand's Paradox". After you watch it, and if you appreciate some of
the nuance at play here, homework for you is to reflect on where exactly Alice and Bob implicitly
answered this question. The case with Bob is relatively straightforward, but the point at which
Alice locks down some specific distribution on the space of all orientations...well it's not
at all obvious, it's actually very subtle.
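The puzzle takes "a random orientation" for granted. One common choice, and only an assumption here, is the uniform (Haar) distribution on 3D rotations, which can be sampled by normalizing four independent Gaussians into a unit quaternion. The Python sketch below uses that choice to estimate average shadow areas by Monte Carlo, computing each shadow as one half the sum of the face-shadow areas, as in Alice's argument. All function names are illustrative. The result is a numerical sanity check of the one-fourth-of-surface-area relationship under this particular distribution, not a proof, and not a claim about which distribution Alice or Bob implicitly used.

```python
import math
import random


def random_unit_quaternion(rng):
    """Sample a rotation uniformly (Haar measure) as a unit quaternion (w, x, y, z).

    Normalizing four independent standard Gaussians gives a uniform point on
    the 3-sphere, which corresponds to a uniformly random 3D rotation.
    """
    while True:
        q = [rng.gauss(0.0, 1.0) for _ in range(4)]
        norm = math.sqrt(sum(c * c for c in q))
        if norm > 1e-12:  # guard against the (practically impossible) zero vector
            return tuple(c / norm for c in q)


def light_axis_in_body_frame(q):
    """Third row of the rotation matrix R for the unit quaternion q.

    With the light shining along z, the z-component of a rotated face normal
    R @ n equals this row dotted with n, so the row is the light direction
    written in the solid's own (unrotated) coordinates.
    """
    w, x, y, z = q
    return (2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y))


def shadow_area(faces, u):
    """Flat (orthographic) shadow of a convex polyhedron along unit direction u.

    Each face's own shadow has area A * |n . u|. For a convex solid, every
    light ray through the shadow crosses the surface exactly twice, so the
    total shadow is half the sum of the face shadows.
    """
    return 0.5 * sum(area * abs(sum(ni * ui for ni, ui in zip(n, u)))
                     for n, area in faces)


def box_faces(a, b, c):
    """(unit normal, area) pairs for an a x b x c box; a cube has a == b == c."""
    areas = (b * c, a * c, a * b)  # faces perpendicular to the x, y, z axes
    faces = []
    for axis in range(3):
        for sign in (1.0, -1.0):
            n = [0.0, 0.0, 0.0]
            n[axis] = sign
            faces.append((tuple(n), areas[axis]))
    return faces


def average_shadow(faces, samples=100_000, seed=0):
    """Monte Carlo estimate of the mean shadow area over random orientations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        u = light_axis_in_body_frame(random_unit_quaternion(rng))
        total += shadow_area(faces, u)
    return total / samples
```

Calling `average_shadow(box_faces(1.0, 1.0, 1.0))` should land near 1.5, one-fourth of the unit cube's surface area of 6, and a lopsided 1 × 2 × 3 box should land near 5.5, one-fourth of its surface area of 22, which is the same constant Alice predicts for every convex solid.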
If we're mainly talking about how to actually solve problems, then I think that it's good for Grant to have pointed out the biases that these kinds of videos can have towards slick solutions. He's a smart guy and I think the last thing he wants to be is misleading about math, so slightly self-critical videos like this are great not just for the specific content but also as an example.
Usually how I work is I just do whatever I can in order to get a solution, and it all ends up pretty messy. Kinda like going through a forest without having directions, basically a random walk. Lots of unnecessary steps/computations/ideas etc. But then upon reflection I notice some of these redundancies and find ways to shorten the path by connecting two dots in a more direct way. Thinking and sitting on it like this for long enough usually ends with a nice, slick proof where the underlying idea that I initially drew on (but that was maybe hidden) is showcased. Sometimes the solution spontaneously flips to a totally different method only revealed by this simplification process. In a way, if Bob pays attention to what he's doing, by knowing what his computations are saying, then he can find a homotopy from his solution to Alice's.
There's also more that goes into problem solving than an individual sitting down and doing computations or playing with ideas alone in a dark room (or on a test), which is another bias that popularizing videos can have. There is way more collaboration and discussion that happens. Why are Alice and Bob working on the problem alone? Why are they not working together, each bringing their different perspectives to the table? We tend to construct mathematics as being done by very smart, slightly crazy men alone in their rooms (even this video helps with this). Newton gets all the credit for calculus, but a lot of what we would call calculus was known by his time (especially for algebraic curves), including a version of the Fundamental Theorem of Calculus by Barrow, Newton's advisor. Newton and Leibniz both found ways of using infinitesimals to generalize these results and use them in broader contexts. But infinitesimals/fluxions weren't even meaningful things; they were just things that, computationally, were and were not zero at different times, and it would take hundreds of years to develop meaningful tools for them. It's not like Newton went to the countryside to avoid the plague for two years and came out with the Principia; he collaborated during this time through letters, and it wasn't until long after his prominence that he published the Principia.
In reality, math is created from the diversity of thought and perspectives that many people bring to the table: the Alices, the Bobs, the Carols, etc., working together rather than in isolation. We generally can't solve problems by ourselves; we need each other. Whether it be remembering an idea a friend showed you one time, or straight-up talking on a board together, problem solving is a community effort that does not happen solely in an individual's brain. I typically sort out my problems by explaining them to others, and this process/feedback helps turn it slick or illuminate the key idea needed to finish it. A move away from "crazy, genius, hero mathematicians" to "a healthy community of similarly-interested collaborators with different ways of thinking" would be good to see.
Raise your hand if you're a "Frustrated Bob": likes calculations, starts that way, then ends up with a mess of tangled integrals that you can't solve and rage quits.
Grant's reflection is truly exemplary.
I'm very happy that this is the direction the video went: not claiming they're different styles, but being honest about needing both of them (and probably a little more of Bob).
Far too many educators believe the 'learning styles' idea (where different students have preferred learning styles they're better in), so I'm glad we didn't have some "find out whether you're an Alice or a Bob" kind of thing :)
I just watched this… So, people of r/math, is there a nice slick generalization to when the light source isn’t infinitely far away?
mathologer also has a video featuring the shadow cube problem
part 1 - part 2
3b1b needs a Nobel prize for the wonderful videos he puts on YouTube.
I really like the Alice approach here. As soon as he reached the conclusion that the constant should be the same for every convex shape, at around 20 minutes, I figured out the solution.
I also really like his conclusion. I think this is a general problem with how math is presented, but it's also necessary to be this way. The reason we can understand so much math in school so quickly is because it's already neatly packed into generalisations and specialisations for the students. In hindsight a lot of this stuff seems easy and there is so much math you learn that it can't be expected of every student to go through every step by themselves. I think it's important to realise that you can still learn a lot about how math works by going straight to the neat solutions.
On the other hand, this creates a picture of how math research works that is just not realistic. The reality is that a lot of trial and error happens that never shows up in the published papers. In turn, students at university are very frustrated when they have to do a lot of work to figure stuff out. I wish there were more time at school to teach that part of doing math is overcoming that frustration and continuing to try. I know that I am grossly oversimplifying things here, but I think being able to take that initial frustration is one of the most important traits for becoming a mathematician.
wake up babe new 3b1b video