The following content is
provided under a Creative Commons license. Your support will help
MIT OpenCourseWare continue to offer high quality
educational resources for free. To make a donation or
view additional materials from hundreds of MIT courses,
visit MIT OpenCourseWare at ocw.mit.edu. PROFESSOR: Let's get started. Welcome back to 6.046. Today, we start
an exciting series of algorithms for graphs. We've done a lot
of data structures. We're starting to get
back into algorithms with dynamic
programming last week. And today and the
next few lectures, we're going to see lots of
cool algorithms about graphs. First a bit of
recall-- we're starting with shortest
paths, which you've seen in 6.006 in the context of
single-source shortest paths. So typically, like you do a
Google Maps query, you think of, I want to go from A to B. But what you solved in 6.006
was a harder problem, which is I give you a point A-- here
it's called S for the source. I give you a source vertex. And capital V is a
set of all vertices, capital E is a set of all
edges, remember graph notation. Let's say it's a directed graph. You've got edge
weights, like the time it takes to traverse each road. And you want to
know how long it takes me to get from
S to V for all V. So this is from one given
point to everywhere. Today, we're going
to solve a harder problem, which is all-pairs. I want to go from all A to all
B. But what you saw in 6.006 was single-source, where I just
give you one of the vertices, and I want to know how
to get to everywhere. The reason you saw this version
and not the A to B version is because the best way
we know to solve A to B is to solve this problem. So at least from a
theory standpoint, we don't know how to
beat Dijkstra's algorithm and Bellman-Ford's algorithm
for the A to B problem. So you get a little bit
more than what you asked for sort of for the same price. So let me remind you in a
few different scenarios what algorithms we have,
and how long they take. So the scenarios of interest
are the unweighted case, a non-negative weighted
case, the general case, arbitrary weights,
positive and negative, and DAGs, acyclic graphs. These are some
interesting special cases. And you should have seen in 6.006
algorithms for each of them. Let's see if you remember. So what's a good algorithm
for single-source shortest paths in an unweighted graph? BFS, good, Breadth-first
Search, that takes how long? V plus E, good. That's-- for graphs, V plus
E is considered linear time. That's how long it takes
to represent the input. So you've got to look at the
input in most algorithms, and BFS is
optimal against that. But we're going to start
getting worse as we-- well, for these two situations. So for non-negative edge
weights, what do you use? Dijkstra. Ah, everyone's
awake this morning. That's impressive. And that takes how long? This is a tricky question. V log V plus E, wow, nice. So this answer kind of depends
on which heap structure you use, but this
is the best we know. If you use a Fibonacci heap,
which we don't actually cover but it's in
the textbook, you achieve log V for extract-min
and constant amortized for each decrease-key
operation. So sorry, the log V is
for extract-min, and the constant is for decrease-key. And so this is the
best we know how to do with a Dijkstra-type approach. If you use other heaps,
you get slightly worse, maybe you get a log factor here. But this is good. This is almost as good as V plus
E. For moderately dense graphs, if E is bigger than V log
V, then these are the same. But if your graph is
sparse, like E is order V, then you lose a log factor. But hey, it's just a
log factor, not too bad. We're going to get worse. So for general weights,
what do you use? Bellman-Ford. OK. Which takes how long? VE, that's the usual statement. Technically, you should
assume E is at least V for this bound to hold. But that's the way
to think of it. So this is not nearly as good. This is a lot slower. If you think of-- we can
think of two situations. One is when E is theta V, so a
very sparse graph like a tree or planar graph or something. And we could think of
when E is quadratic. That's the dense case. So here we get, whatever,
V and V squared for BFS. For non-negative edge weights we
get V log V in the sparse case. And we get V squared
in the dense case. And for Bellman-Ford, we get
V squared in the sparse case, and V cubed in the dense case. So this is like a V
factor, a linear factor larger than non-negative
edge weights-- makes a huge difference. And finally for acyclic
graphs, what do you do? STUDENT: Dynamic programming. PROFESSOR: Dynamic programming
is one answer, yeah. That works. In some sense all
of these algorithms are-- especially Bellman-Ford
is a dynamic program. We'll see that little bit. Another interpretation? Topological sort, and
then Bellman-Ford, yeah-- say, one round
of Bellman-Ford. So Bellman-Ford actually
works really well if you know the order you
should relax the edges. And in an acyclic graph,
you can do a topological sort, meaning you visit
all the vertices, so that whenever you visit
the right endpoint of an edge, you've already visited
the left endpoint. If you do Bellman-Ford
in that order, then you only have to do one
pass and you're done. Whereas normally, here,
you had to do it V times. So the total cost of
this is just linear. Good thing to remember,
especially on quizzes and so on. If your graph is acyclic,
you can achieve linear time.
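As a small illustration, here is a minimal Python sketch of that linear-time DAG case: topologically sort, then do a single relaxation pass in that order. The adjacency-list format and function name are just illustrative choices, not anything fixed by the lecture.

```python
import math

def dag_shortest_paths(adj, s):
    """Single-source shortest paths in a DAG.

    adj: dict mapping every vertex to a list of (neighbor, weight) pairs
         (use an empty list for vertices with no outgoing edges).
    s: source vertex.
    Runs in O(V + E): one relaxation pass in topological order.
    """
    # Topological sort by DFS finishing times.
    order, visited = [], set()
    def dfs(u):
        visited.add(u)
        for v, _ in adj[u]:
            if v not in visited:
                dfs(v)
        order.append(u)
    for u in adj:
        if u not in visited:
            dfs(u)
    order.reverse()

    d = {u: math.inf for u in adj}
    d[s] = 0
    for u in order:                       # visit vertices in topological order
        for v, w in adj[u]:
            if d[u] + w < d[v]:           # relaxation step
                d[v] = d[u] + w           # one pass suffices because the graph is acyclic
    return d
```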
But in the general case, Bellman-Ford is your answer
for single source. Now, these are the
best algorithms we know for each of these cases. So I'm not going to
improve them today. You saw the state
of the art in 6.006. But for all-pairs shortest
paths, we can in some sense do better, sort of. So let me just quickly
define the problem, and then tell you all
of the results we know. And also, the results
we're going to cover today. I didn't remind you of
the delta definition. I want to go over this briefly. So delta of s comma
v is the weight of the shortest path from S to
V. The weight is well-defined. Even though there may
be many shortest paths, there's one best weight. But there's some special cases. It could be infinity,
if there's no path. That's sort of by definition. Say, well, it's infinite cost
to get there-- if there's no path, then we say there's
infinite weight. And it could be minus
infinity in the presence of negative weight cycles. So let's say, if there's
a negative weight cycle on the way, if you
could reach a negative weight cycle from s, and then still
get to V from there, then the best way to get there
is to go to that cycle loop around infinitely many
times, and then go to V. OK, so the algorithms
you saw probably didn't actually compute
correctly in this case. They just said,
negative weight cycle-- I don't know what to do. But it's actually not that hard. With a little bit
more effort, you can figure out where the
negative infinities are. We're not going to rely
on that, but I'm just throwing it out there to make
this a well-defined definition. Once you have the
shortest path weights, you can also store
parent pointers, get the shortest path
tree, then you can actually find shortest paths. But again, we're not
going to talk about that here. We'll focus on computing delta,
but with the usual techniques you saw in 6.006, you could
also reconstruct paths. So for all-pairs shortest
paths, we have a similar set-up. We have a directed graph, V,E.
And we have an edge weight function w-- in general,
could have negative weights. And our goal is to find delta of
u comma v for all u and v. OK. Single-source shortest
paths is the sort of thing that you might want to do
a few-- just given a graph, and you want to find a shortest
path from A to B. I said, this is the best way we know
how to do A to B, essentially. But all-pairs
shortest paths is what you might want to do if
you're pre-processing. If you're Google
Maps, and you want to be able to very quickly
support shortest path queries between major
cities, then you may want to first compute
all-pair shortest paths for all major cities, because
road networks don't change very much at the large scale. This is ignoring
traffic and so on. Pre-compute this, and then
given a query of two vertices coming in-- probably
get a million queries a second-- you could
very quickly know what the answer is. And this is the basis for
real world shortest paths. Typically, you don't compute
shortest paths from A to B every single time. You use waypoints along the way. And you have pre-computed
all-pair shortest paths between waypoints. So that's the motivation. Yeah, I guess in some
sense, internet routing is another situation
where at any moment you may need to
know the shortest path to get to-- the
fewest hop path say to get to an internet site. You know the IP address. You need to know where to go. You don't need to
know the whole path. You need to know the next step. But in some sense,
you're computing all-pair shortest paths. That's a more dynamic situation. OK. So here are the results we know
for all-pair shortest paths. I think I'm going to cheat,
and reuse this board. So same situations,
except I won't think about acyclic graphs here. They're a little
less interesting. Actually now, I'm
curious, but I didn't intend to talk about acyclic. And so the obvious thing to
do to solve all-pairs shortest paths is just run the single
source algorithm V times, once from each source. So I could do V times
Breadth-first Search, V times Dijkstra, V times Bellman-Ford. And now, I just need
to update my bounds. OK so VE becomes
V squared plus VE. If you're a little bit clever,
or you assume E is at least V, that becomes VE. If I run Dijkstra
V times, I'm going to get V squared log
V plus V times E. And if I run Bellman-Ford
V times, I get V squared E. OK. And over here,
everything's just going to increase by a V factor. So a little more intuitive is
to think about the sparse case. I get V squared, V squared
log V, and V cubed. Check that those match
over there when E equals V. And over here, I get V cubed,
V cubed, and V to the fourth. OK. So pretty boring so far. The interesting
thing here is that we can beat the last result.
The last result, which is the slowest one,
could take as long as V to the fourth time. We can shave off
a whole V factor. So a better general
case algorithm is called Johnson's algorithm. That will be the last
algorithm we cover today. And it achieves
this bound, which is the same as running
Dijkstra V times. So it's between V squared
log V and V cubed. And that's cool, because
this is the best algorithm we know for all-pairs,
non-negative edge weight, shortest paths, just
running Dijkstra V times. Not very smart-- but it's the
best thing we know how to do. And what this says is, even when
we have negative edge weights, actually we can achieve the
same bound as running Dijkstra. This is a bit counterintuitive
because in 6.006 you're always told, if you have negative edge
weights, you can't use Dijkstra. Turns out, in the all-pairs
shortest paths case, you kind of can. How can that be? Because this is
a harder problem. If you could solve all-pairs
shortest paths, of course you could solve single-source. And that's actually the luxury. Because it's a
harder problem, we have this VE term
in the running time, which lets us do things
like run Bellman-Ford once. And running
Bellman-Ford once will let us run Dijkstra V times. That's the reason we
can achieve this bound. But we won't be seeing
that for a while. For starters, I want to
show you some connections between all-pairs shortest
paths, and dynamic programming, and matrix multiplication,
which turn out to give-- for
dense graphs, we're just achieving V cubed
in all situations. So our first goal is going
to be to achieve V cubed time for general edge weights. So we're going to first
achieve this bound. That will be a lot easier. And then eventually, we
will achieve this bound. So the Floyd-Warshall
algorithm and some of these will get very close to V cubed. All right.
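Just to have the board in text form, the bounds stated so far are:

    single-source (6.006)                     all-pairs (run it V times)
    unweighted      BFS            V + E           V^2 + V E
    non-negative    Dijkstra       E + V log V     V E + V^2 log V
    general         Bellman-Ford   V E             V^2 E   (V^4 when dense)

and today's two targets for the general case are V^3 (dynamic programming, Floyd-Warshall) and V E + V^2 log V (Johnson), the latter matching V times Dijkstra even with negative weights.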
So we're going to start with our first approach to solving all-pairs
shortest paths-- that is not using an existing single
source algorithm-- is dynamic programming. Someone mentioned that already. It's a natural approach. Shortest paths is kind
of dynamic programming. In fact, most
dynamic programs, you can convert to single-source
shortest paths, typically in a DAG-- not all,
but a lot of them. So we could try
dynamic programming. Now, I'm going to preach
to you a little bit about my way of thinking
about dynamic programs. If you watched the
6.006 OCW version, you've seen the five easy
steps to dynamic programming. And if you haven't, this will
be new, otherwise, a reminder. First thing I like to think
about in a dynamic program is, what are the subproblems? The second thing I like to think
about is, what am I guessing? I'm going to guess some
feature of the solution. Third thing is, I want to
write a recurrence relating subproblems solutions. Then, I'm basically
done, but there's a couple wrap up
things, which I'm going to have to use
another board for. So number four is, I need
to check that I can actually resolve these subproblems
in some order that's valid. Basically, this is saying
that the constraint graph on some problems
should be acyclic. Because if there's a cycle
in the constraint graph, you take infinite time. Even if you memoize, if you do
infinite recursion-- bad news. You'll never actually
finish anything, so you never actually write
anything in the memo table. So I want to make sure
that it's acyclic. I mean, this is
really the same thing we're talking about in
this erased row, which is topological ordering. Personally, I like to-- you
could argue that it's acyclic. I like to just write down,
here's a topological order. That's a nice proof
that it's acyclic. If you write that
down as for loops, then you actually
have a bottom up dp. If you just take the recurrence,
stick it inside the for loop, you're done, which
we'll do in a moment. I guess I need a row for that. And finally, you need to
solve the original problem. And then, there's
analysis and so on. But for specifying
the algorithm, these are the key
things you need to know. The hard part is figuring out
what the subproblems should be, so that your dp becomes fast. Running time is going to
be number of subproblems times time per subproblem. For each subproblem,
usually we're going to want to guess some
feature of the solution to that problem. Once we do that, the recurrence
becomes pretty trivial. Just for each guess, you
say what it should be. So these are the
really hard two steps. And then, OK, we checked
that it's acyclic. And we make sure
that we can actually solve our original problem
using one of the subproblems. Sometimes, our original problems
are some of the subproblems. I think that will happen here. But sometimes you need to do a
little bit of post-computation to get your answer. All right. So what am I going to
do for subproblems? Well obviously, I have a bunch
of different problems involving pairs of vertices. I want to find delta of u,v for
all u and v. That, I erased. But that's the problem. So I want to know what is
the weight of the shortest path from u to v? If I stop there and said
that was my subproblems, bad things are going to
happen, because I will end up-- there's no natural way to
make this thing acyclic. If I want to solve u to
v using, I don't know, u to x, and then
x to v, something like that-- there's
no way to get out of the infinite recursion loop. OK? So I need to add
more subproblems to add more features to my
solution, something that makes it so that when I
try to solve my subproblem, I reduce it to
other subproblems. Things get smaller, and so I
could actually make progress. So there are actually two
natural ways to do this. I'll call them method
one a method two. Method two is actually
Floyd Warshall. But any suggestions on
how we might do this? This is a harder problem. This is in some sense,
a kind of guessing, but it's like I'm
going to guess ahead of time that somehow
there's an important feature of the shortest path. I'm going to
parameterize by that, and somehow it's
going to get smaller. Yeah? STUDENT: [INAUDIBLE] PROFESSOR: Right,
shortest path-- it uses at most a
given number of edges. Let's parameterize
by how many edges. I think I'll use m. So using at most m edges. Very good. So good, I think it
deserves a purple Frisbee. All right, I'm getting
better, slowly. By the end the semester,
I'll be pro at frisbee. I should enter a competition. So this is, of course, an
additional restriction. But at the end of the day,
the problem I want to solve is going to be essentially
duv, let's say, n minus 1. If I want a shortest
path that does not repeat any vertices,
then certainly it has at most n minus 1 edges. So in fact, the claim would
be that duv of n minus 1 equals duv of n. I mean, and so on. If you go larger than n minus
1, it shouldn't help you. If you know that your
shortest paths are simple-- if you know that
shortest paths don't repeat vertices. So this would be if there are
no negative weight cycles. If there are no
negative weight cycles, then we know it never
helps to repeat vertices. So in that situation,
we would be done, if we could
solve this for all m. Now, slight catch--
well, how do we know there's no
negative weight cycles? You know, we could run
Bellman-Ford, I guess. That's a little tricky,
because that only finds reachable
negative weight cycles. In fact from this
picture, we will end up knowing whether there are
negative weight cycles. So there will be no negative
weight cycles, if and only if there's no negative diagonal
entry, say dvv n minus 1. So it turns out,
this algorithm will detect there's a
negative weight cycle by finding that the distance
from v to v is negative. Initially, it's going to be 0. If it turns out
to be negative, we know there's negative
weight cycle. With more work,
you could actually find all the reachable
pairs, and so on. But I'm not going to worry. I'm just going to say, hey,
a negative weight cycle. I'm going to throw my
hands up in the air and give up for today. OK. Cool. So I can solve my
original problem, if I can solve
these subproblems. And now, things are easier,
because we can essentially assume in solving this
problem that we've solved smaller subproblems
for however we define smaller. That's what's given by
this topological order. Obvious notion of
smaller is smaller m. Presumably, we
want to write this with an m in terms of this
with an m minus 1 or smaller. So this gets to
our guessing part. What feature of a
shortest path could we guess that would make
it one edge shorter? There's probably
two good answers. Yeah? The next edge, which I guess
you mean the first edge? Sure. Could guess the
first edge, or you could guess the second edge. Or no, that would be harder. Or I mean, the last edge,
that would also work. Okay, this is a harder one. Uh, nope. Good thing there's students
everywhere to catch it. Cool. So I'm going to
guess the last edge. That's just how I've
written the notes. But first edge would
also work fine. So I'll call the last
edge x comma v. We know we end by going into v. So
let's guess the vertex previous to it in the shortest path. Now of course, we don't
know what that edge is. The guessing just means try
them all as you saw last time. So now, it's really
easy to write the recurrence-- see if I can do
it without looking at my notes. So we've got duv of m. We want to write-- we want
to find the shortest path, so it's probably going to
be a min on the outside. And we're going to consider
the paths of the form-- go from u to x using fewer edges. Right, if this is the last edge,
then we use m minus 1 edges to get to x. And then, we follow
the edge x comma v. So I'll just add on the
weight of the edge, x comma v. If x was the right answer,
this would be the cost to get from u to v
via x, where xv is a single edge at the very end. We don't know what x
should be, so we're just going to do for loop
over x for x in v. So this is using
Python notation. And that will find
the best answer. Done, easy. Once you know what
the subproblems are, once you know what
the guessing is, basically, I'm just adding
in a min and a for loop to do the guessing. So that's my recurrence, except
I should also have a base case. Here it's especially important. So base case is going to
be when m is the smallest. So that, let's say, is 0. What is the weight of getting
somewhere using 0 edges? Well, typically it's
going to be infinity. But there is an interesting
situation, where it's 0, namely when u equals v.
Hey, there is a way to get from u to
itself with 0 edges. And that costs 0. But anywhere else is
going to cost infinity. There's no path. So this is sort of a
definition, I should say. If it exists, otherwise
it's infinity. So then I get those infinities. But this is kind of important,
because maybe I actually use fewer than m edges. I wrote less than
equal to m here. This is also less than equal
to m minus 1 edges here. But in some sense, I'm
including the case here, where x equals v. So I just
stay at v at the very end. So it's because I have
a 0 here, I'm implicitly including a situation
where actually, I just use duv m minus 1. That's in that min here. So it's important to
get the base case right. Cool. Almost done. I need an acyclic ordering. So as I said, things get
smaller when m is smaller. So all that means is do the
for loop for m on the outside. Then do the for loop for
u in v. For those, it doesn't matter what
order you do it, as long as you've done
all of m equals 0, before you do all of m equals
1, before you do all of m equals 2. So that's the nested for loops
that gives you the right order. And so, I guess I take this
line and put it on the top. And I take this line, the
recurrence, put it inside the for loops,
that is my bottom-up dp. OK, I'm actually going to
write it down explicitly here for kicks. Should I bother? Uh, what the hell. So first we do a for loop on m. Then, we're going to do
the for loops on u in V. Now, inside the for loop,
I want to compute this min. I could use this
single line, but I'm going to rewrite it slightly to
connect this back to shortest paths. Because this type of
statement should look familiar from Dijkstra and Bellman-Ford. This is called the
relaxation step. OK. It probably would
look more familiar if I wrote here w of x v.
You could also write wxv. That's an alternative
for both of these-- probably should
write the same way. But in either case, we call
this a relaxation step, because-- it's kind
of a technical reason but-- what we'd
like to satisfy-- we know that shortest paths
should satisfy the triangle inequality. If you look, there's three
vertices involved here, u, v, and x. We're looking at the
shortest path from u to v compared to the
shortest path for u to x, and the shortest
path from x to v. Certainly, the shortest
way to get from u to v should be less than or equal
to the shortest path of u to x plus x to v, because
one way to get from u to v is to go via x. So this if condition
would be a violation of the triangle inequality. It means we definitely do
not have the right distance estimates if duv is
greater than ux plus xv. OK. So if it's greater, we're
going to set it equal. Because here, we know a way
to get from u to v via x. We know that it's possible to do
this, assuming our d values are always upper bounds on reality. Then, this will be an
upper bound on the best way to get from u to v. So this is clearly
a valid thing to do. Relaxations are never bad. If you start high,
these will always improve your shortest paths,
so you get better estimates. And that's exactly what
Dijkstra and Bellman-Ford did, maybe with w instead
of d, but they're all about fixing the
triangle inequality. And in general and
optimization, there's this notion of, if
you have a constraint, an inequality constraint
like the triangle inequality, and it's violated,
then you try to fix it by the successive relaxations. So that's where the
term comes from-- doesn't really matter here,
but all of our shortest path algorithms are going
to do relaxations. All the shortest path algorithms
I know do relaxations. So this is familiar, but it's
also doing the same thing here. I just expanded out
the min as a for loop over x, each time checking
whether each successive entry is better than
what I had already. And if it is, I update. So in the end, this will
compute a min, more or less. I cheated also because I
omitted the superscripts here. If I put m here, and m
minus 1 here, it would be exactly
the same algorithm. I'm omitting all
the superscripts, because it can only help me. Relaxing more can
only get better. And if I was guaranteed
correct over there, I'll still be guaranteed
correct over here. You have to improve
the invariant, but you never--
relaxation is always safe. If you start with upper bounds,
you always remain upper bounds. You're doing at least the
relaxations over there. And so you will in the end
compute the correct shortest path weights. The advantage of that,
mainly, is I save space. And also, it's simpler. So now I only need
quadratic space. If I had a superscript,
I'd need cubic space. Right? So I did a little simplification
going from the five step process to here-- both of the
polynomial time and space, but this is a little
bit, a little bit better. But how slow is this algorithm? How long does it take? Yeah? V cubed, that would be great. V to the fourth, yeah, good. Sadly, we're not doing so great
yet-- still V to the fourth. V to the fourth, I guess
I already knew how to do. That was if I just run
Bellman-Ford V times, I already knew how to
do V to the fourth. So I haven't actually
improved anything. But at least you see, it's all
dynamic programming in there. So n here is the size of V.
That's probably the first time. Cool. I omitted 0, because
there was the base case. That's done separately in
the line that I didn't write. So that was dp one.
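Just to have it in one place, here is a minimal Python sketch of that bottom-up dp, written the way it was just described. The particular input format (vertices numbered 0 to n-1, a dict w of edge weights with missing edges treated as infinity) is an illustrative assumption, not anything fixed by the lecture.

```python
import math

def apsp_dp1(n, w):
    """All-pairs shortest paths via the 'at most m edges' dp.

    n: number of vertices (vertices are 0..n-1)
    w: dict mapping (x, v) to edge weight; missing edges count as infinity
    Assumes no negative-weight cycles. Runs in O(V^4) time.
    """
    weight = lambda x, v: w.get((x, v), math.inf)
    # Base case m = 0: zero cost to stay put, infinity otherwise.
    d = [[0 if u == v else math.inf for v in range(n)] for u in range(n)]
    for m in range(1, n):               # paths using at most m edges
        for u in range(n):
            for v in range(n):
                for x in range(n):      # guess the last edge (x, v)
                    # relaxation step; superscripts omitted, as in lecture
                    if d[u][v] > d[u][x] + weight(x, v):
                        d[u][v] = d[u][x] + weight(x, v)
    return d
```

The four nested loops are exactly where the V to the fourth comes from.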
Time for dp two, unless there are questions? Everyone clear so far? Yeah? STUDENT: When you
iterate over x, why do you do
every [INAUDIBLE] PROFESSOR: As opposed to--? STUDENT: Like, just adjacent
vertices [INAUDIBLE] PROFESSOR: Oh, yeah. OK. Good, fair question-- why do I
iterate over all vertices, not just the incoming? If I'm writing w of xv, I
could afford to just say, just consider the incoming vertices. And that would let
me improve probably from V to the fourth to V cubed
times-- V squared times E, I think, if you do
the arithmetic right. You could do that. It would be better. For dense graphs, it's
not going to matter. For sparse graphs
it will improve V squared to E, basically. But we're going
to do even better, so I'm not going to
try to optimize now. But, good question. When I say this at
the moment, if there is no edge from x to v,
I'm imagining that w of xv equals infinity. So that will never
be the minimum choice to use a non-edge. I should say that. If there's no edge here,
I define the weight to be infinity. That will just make
algorithms cleaner to write. But you could optimize
it the way you said. So, where were you? Yeah. Ah, perfect. OK. Other questions? More Frisbee practice? No? OK. So that was dp one. Let me do dp two. Not yet, sorry-- diversion. Diversion is matrix
multiplication. Before I get to dp
two, I want to talk about matrix multiplication. This is a cool connection. It won't help us directly
for shortest paths, but still pretty good. And it will help-- it will
solve another problem especially fast. So shortest paths is
also closely linked to matrix multiplication,
a problem we've seen a couple of times,
first in the FFT lecture, and then in the
randomization lecture for checking matrix multiplies. So you remember, you're
given two matrices, A and B. And you want to
compute C equals A times B. You've seen Strassen's
algorithm to do this. There's also these--
here A and B are squared. OK. So the n by n, product
will be n by n. So standard approach
for this is n cubed. With Strassen, if
I can remember, you can get n to the 2.807. And if you use Coppersmith-
Winograd, you get 2.376. And then, if you use the
new Vassilevska-Williams
the 2.3728 and so on. And that's the best
algorithm we know now. There's some evidence maybe
you can get 2 plus epsilon for any epsilon. Turns out, those are going
to help us too much here. But what I want to show is
that matrix multiplication is essentially doing
this, if you redefine what plus and dot mean. We redefine addition and
multiplication-- talk about whether that's valid
in the moment-- so remember what is
matrix multiplication? Cij is a dot product
of a row and a column. So that's aik with bkIj. K equals 1 to n. OK. Now that sum looks
a lot like that min. Actually, more like the
way I wrote it over here, with the d's instead of the w's. This-- right-- x is the
thing that's varying here. So this is like, aik plus
bkj, except, I have plus here, whereas I have times over here. And I have a min out here,
but I have a sum over here. So it sounds crazy, but let's
define-- be very confusing if I said define dot
equals plus, so I'm going to define a new
world called circle world. So if I put a circle around
a dot, what I mean is plus. And if I put a circle around
a plus, what I mean is min. OK? So now, if I put a
circle around this dot, I mean circle everything. So I've got to circle the
summation, circle this thing. So then I get shortest paths. Crazy. So all right, I'm going to
define d to the m-th power. I should probably
circle-- whatever. Slightly different. OK, I want to define,
like, three things at once. So let me write them down,
and then talk about them. So many infinities. All right. OK. I guess, I should
write this too. OK. If I define the vert-- suppose
I number the vertices 1 through n, OK? I just assume all vertices are
an integer between 1 and n. So then, I can actually express
things in a matrix, namely the weight matrix. This kind of defines
the graph, especially if I say wij is infinity
if there is no edge. Then, this is the matrix of
all pairwise edge weights. For every i and j, I
have a weight of ij. That gives me a matrix, once
I set V to be 1 through n. Now, I'm also defining this
distance estimate matrix. So remember, we
defined duvm-- I'm going to now call it dijm,
because the vertices are integers. That is, the weight
of the shortest path using, at most, m edges. If I define it that
way, then I can put it into a matrix, which is for
all pairs of vertices ij. What is the distance,
shortest pathways that uses it at most m edges? That gives me a matrix,
d parenthesis m. Then, I claim that if I take
circle product between d of m minus 1 and w, that is
exactly what's happening here, if you stare at it long enough. This is the inner
product between row u of d to the m minus
1, and column v of w. And that's exactly what this
circle product will compute. So this is dp. But when you look at
that statement, that's saying that d to
the parentheses m is really w to the
circle power m, right? This is a definition
in some sense of power, of exponentiation,
using circle product. So when I circle
the exponent, that means I'm doing circle
exponentiation in circle land, OK? OK so far? So this is circle land. So you might say, well, then I
should compute these products using matrix multiplication. Now, just to see how
good we're doing, if I execute this operation
n times, because I have to get to d
to the n minus 1-- so it's basically d to the n. If I do this product n
times, and for each one of I spend n cubed time, then I get
an n to the four algorithm. Same algorithm in fact,
exactly the same algorithm-- I've just expressed it
in this new language. OK, there are two ideas
on the table though. One is, maybe I could use a
better matrix multiplication algorithm. Let's shelve that for a moment. The other possibility
is, well, maybe I can exponentiate faster than
multiplying by myself n times, or multiplying by w n times. How should I do it? Repeated squaring, good. You've seen that
probably in 006. Repeated squaring idea is, we
take-- to compute w-- well, I take w to the 0. I multiply it by w to the 0. Sorry, circle 0--
that's this thing. Oh, that seems weird. Let's start with 1. 1 seems better. I'm not going to get much if
I multiply that by itself. I should get exactly
the same matrix. So I take the circle product
between w to the 1, w to the 1. That gives me w to the 2. And then, I take w to
the 2 times w to the 2. Everything's circled. I get w to the 4. Oh cool, I doubled my exponent
with one multiplication. If I take w to the 4 by w to the
4, I get w to the 8, and so on. My goal is to get to n, so I
have to do this log n times. Log n squaring operations,
each squaring operation is an n cubed thing. So this is repeated squaring. And I get V cubed log V--
finally, an improvement. So we went from V to the 4,
which was in the dense case the same performance
as Bellman-Ford, running it V times. But now in the dense case,
I'm getting V cubed log V, which is actually pretty good. It's not quite V
cubed, but close. All right. I'm pointing at V cubed. This is actually the one
result that is not optimal. This is the one we
want to improve. But we're kind of-- we're
in this space right now. We're getting close to as
good is this algorithm, the Johnson's algorithm. But we still a log V Factor. So this is great,
just by translating into matrix multiplication. Now technically,
you have to check that repeated squaring actually
gives you the same result. Basically, this works because
products are associative. Circle products of
matrices are associative, which works because circle
land is a semi-ring. If you want the
abstract algebra, a ring is something that
you wear on your a finger. No. A ring is an algebra where
you define plus and times, and you have distributivity. Semi-ring, there's no minus,
because min has no inverse. There's no way from
the min to re-compute the arguments, right? No matter what you apply to
it, you can't-- you've lost information. So that's the semi-ring. Normally, you have a minus. But semi-ring is enough
for the repeated squaring to give you the right answer. However, semi-ring is not enough
for all these fancy algorithms. So if you look at Strassen's
algorithm, the one you've seen, it uses minus. There's no way to get around
that, as far as we know. So if you have no minus, n cubed
is the best we know how to do. So sadly, we cannot improve
beyond this with this technique. It sucks, but that's life. However, we can do something. If we just change
the problem, there is another problem which
this is the best way to do. So let me briefly tell
you about that problem. It's called transitive closure. Transitive closure
is, I just want to know is there a
path from i to j. So it's going to be 1, if there
exists a path from i to j. And it's going to
be 0 otherwise. OK. I guess it's kind of like
if you set all the weights to 0 or infinity. Then, either there's
going to be as 0 way path, or there's no path, meaning
there's an infinite way path. So it's not quite the same. Here, I want 1 and 0. I flipped. It used to be, this was
infinity, and this was 0. This is one saying there is a
path from i and j, 0 otherwise. If I write it this
way, and then I think about what
I need to do here, it is still in some sense
plus and min, but not really. Because I just want to
know, is there a path? So if I have a way to get
there and a way to get there, instead of adding
up those values, really I'm taking
some other operator. So I want to know OR. Yeah, exactly-- who said OR? Yeah, all right, tough one. Close, close, close. So here, we have basically
a circle product is OR, and circle sum is AND. OK? I mean plus and min
would work, but it's a little bit nicer over here. Sorry, it's the other
way around, I think. It's definitely Booleans. We want to know there is a
way to get to x, and then from x to where we're going. That's an AND. And then, to get
a path in general, it has to work for some x. So that's the OR. And this is a ring. And once you have a ring,
you have negation. You can apply
Vassilevska-Williams. And you solve this problem
in n to the 2.3728. And if I just make a little
change in the dot dot dot, I can absorb the log. So you could put a log n here. And it's log n if you get
the exponent exactly right. But if you just tweak the
exponent by 0.00000001, that's bigger than log n. So we usually omit
the log there. Cool. Transitive closure-- so it's
a problem you didn't know you want to solve, but it is
actually a common problem. And this is the
best way we know how to solve it for dense graphs. OK, it beats, you know, V cubed. This is the algorithm we're
aiming for for dense graphs. For sparse graphs,
we can do better. But for dense graphs,
this is better.
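As a tiny sketch (with plain Python booleans standing in for a fast matrix multiply), transitive closure is the same squaring idea with OR playing the role of the circled sum and AND the role of the circled product:

```python
def transitive_closure(A):
    """Boolean transitive closure by repeated squaring.

    A[i][j] is True if there is an edge from i to j (True on the diagonal).
    Each squaring doubles the path length covered, so O(log n) squarings.
    A real speedup would plug a fast matrix multiply in here; this sketch
    only shows the OR/AND structure.
    """
    n = len(A)
    T = [row[:] for row in A]
    length = 1
    while length < n - 1:
        T = [[any(T[i][k] and T[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        length *= 2
    return T
```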
Finally, we get to go to dynamic programming number two, also known as the
Floyd-Warshall algorithm. So we had this dp
in V the fourth. If we forget about
transitive closure, we're now down
to V cubed log V. Our next goal is to achieve V
cubed, no log V. Let's do that. So again, I'm going to
express it in my five steps. First step is, what
are subproblems? And this is the key
difference, and the key insight for Floyd-Warshall is to
redefine the dij problems. To avoid conflict,
I'm going to call them cij, or in this case,
cuv, because here, the matrix product view will
not work, I think. Yeah, it won't work. So it's a totally
different universe. I'm still going to assume that
my vertices are numbered 1 through n. And now, the idea
is, first I'm going to think about the graph formed
by the vertices 1 though k, roughly. And I want to know for every
vertex u and every vertex v, what is the shortest
path from u to v, or the weight of the
shortest path from u to v that only uses intermediate
vertices from 1 through k. So actually, u and
v might not be-- they might be larger than k. But I want all the vertices
in the path to be 1 through k. This is a different way
to slice up my space, and it's the right way. This is going to do
a factor of n better. It turns out, and
that's just an insight you get from trying all the
dp's you could think of. And eventually, Floyd and
Warshall found this one, I think in the '70s. So it was easier back
then to get a new result. But I mean, this is very
clever-- so very cool idea. So now the question is,
what should I guess? Before I guessed what
the last edge was. That's not going to
be so useful here. Can anyone think of a
different thing to guess? We're trying to solve
this problem where I get to use
vertices 1 through k, and presumably I want
to use subproblems that involve smaller k,
that say involve vertices 1 through k minus 1. So vertex k is relevant. What should I guess
about vertex k? Yeah? STUDENT: Guess that vertex
k is the [INAUDIBLE] PROFESSOR: You want
to guess vertex k is the i-th intermediate vertex. That would work, but I would
need to parameterize by i here, and I lose another
factor of n if I do that. So I'd like to avoid that. That is a good idea. Yeah? STUDENT: [INAUDIBLE] visit
k, before you visit v. PROFESSOR: You're going
to guess that I visit k, and then I go to where
I'm trying to go. OK. That's not a-- OK. That's a statement. But to guess, I should
have multiple choices. What's my other choice? STUDENT: [INAUDIBLE] PROFESSOR: Yes. So either I use
vertex k, or I don't. That's the guess-- is k in
the path at all from u to v? So that's a weaker
thing than saying, k is at position i in the path. Here I'm just saying,
is k in the path at all? And that's nice,
because as you say, I already know how to get
there without using k. Because that's cuvk minus 1. And then, you just also have
to consider the situation where I go to k, and then I leave. So the recurrence is going to be
cuvk is the min of two things. One is when k is not in the
path, that's cuvk minus 1. And the other option is that
I go to x first-- or sorry, I go to k first. It used to be x. Now, I've renamed it k. I don't know why. k minus 1-- and then I go
from k to v-- k minus 1. That's it. Min of two things-- before, I
was taking the min of n things. Before, there were n
choices for my guess. Now, there are two
choices for my guess. Number of subproblems is
the same, still V cubed. But the guessing part
and the recurrence part is now constant time
instead of linear time. So I'm now V cubed
time-- progress. OK? This is pretty cool. The old dp led us to this
world of matrix multiplication. That's why I covered it. This new dp is just
a different way of thinking about
it-- turns out to be faster, just by log factor,
but a little bit faster. I need some base cases-- cuv
of 0 is going to be-- now it's the weight of the edge uv. It's a different base case. Before, I was using 0 edges. Now, it's not using any
intermediate vertices. So that's how the weights
come into the picture, because actually there are
no weights of edges up here. So that's a little weird. The only place the weights
come in is when k equals 0. This is in some sense
still relaxation, but it's a little bit weirder,
little bit different order. I mean the key thing
here is, because the way we set up these subproblems
with the intermediate vertices, we know k is the only
vertex in question. Before it's like, well, I don't
know where you go at the end. But now we know that either
k is in there, or it's not. And in each case, we can compute
it using smaller subproblems and so we save
that linear factor. STUDENT: Is this only
for [INAUDIBLE] graphs, or does it also [INAUDIBLE] PROFESSOR: This is
for directed graphs. u and v are ordered here. And this is the weight from u
to v-- will work just as well. It's probably a
little bit instructive to write this down as
nested for loops again. Why not? Because then, you'll see
it's just relaxation again. So I'll even write the base case
here, because it's very simple. We're doing k in
order, let's say. These are really the
same kinds of for loops. But I'll write them slightly
differently, because here we care about the order slightly. Here, we do care
about the order. Here, we don't care
about the order. Vertices, and all we're
saying is-- almost exactly the same code as before. This is, again, just
a relaxation step. We're just relaxing
different edges in different orders, basically,
because k is involved in here. We do that for k equals 1. Then for k equals 2, and so on. But in the end, it's
just relaxations, so you can use that to prove
that this actually computes the right shortest paths. I won't do that here. But clearly, cubic time
instead of quartic. Pretty cool. That's Floyd-Warshall. It's very simple.
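Written out in Python, it really is just a triple loop around the relaxation step. The weight-matrix input format here (infinity for missing edges, 0 on the diagonal, which plays the role of the k equals 0 base case) is an illustrative choice.

```python
import math

def floyd_warshall(W):
    """All-pairs shortest paths, assuming no negative-weight cycles.

    W[u][v] is the edge weight, math.inf if there is no edge, 0 when u == v.
    Returns the matrix of shortest-path weights in O(V^3) time.
    """
    n = len(W)
    C = [row[:] for row in W]        # base case k = 0: no intermediate vertices
    for k in range(n):               # allow vertex k as an intermediate vertex
        for u in range(n):
            for v in range(n):
                # relaxation: either skip k, or go u -> k -> v
                if C[u][k] + C[k][v] < C[u][v]:
                    C[u][v] = C[u][k] + C[k][v]
    return C
```

As with the earlier dp, a negative entry showing up on the diagonal at the end would signal a negative-weight cycle.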
And so a lot of people-- if you need to solve all-pairs shortest
paths in dense graphs, this is the best
we know how to do. So this is what you
should implement. It's like, five lines of code. And you achieve this bound. But for our sparse
graphs, we can do better. And the rest of
lecture is going to be about Johnson's algorithm,
where for sparse graphs we're going to get
closer to quadratic time. We're going to match
running Dijkstra, and it's V squared log
V plus E times V, sorry. So when E is small, that's
going to be close to quadratic. When E is big, it's
going to be cubic, again. So we'll never be worse than
this Floyd-Warshall algorithm. But for sparse
graphs it's better. OK. Johnson's algorithm. I was going to make some joke
about Johnson and Johnson, but I will pass. So Johnson's algorithm--
I mean, dp is five steps, but Johnson's algorithm's only
three steps-- clearly simpler. It's actually much
more complicated, but it's all about
what the steps are. So here's the crazy idea
in Johnson's algorithm. We're going to change
the weights on the edges. And to do that, we're going
to assign weights to vertices. We're going to
choose a function h. Think of it as a height
function, I guess, that maps vertices
to real numbers. And then, we're going to
define w sub h of u,v. This is a new way to think about
edge weights that depends on h that's defined in a simple way. It's the old edge weight
plus h of u minus h of v. You could define
it the other way, but it's better to
be consistent here. So this is a way to
tweak edge weights. For every edge-- this is
for directed graphs clearly. For u, u is the beginning,
the head of the-- I don't know if it's
the head or the tail-- the beginning of the edge. v is the end of the edge. I'm going to add
on the height of u, and subtract out
the height of v. OK? Why? Because that's the definition. I want this to be greater
than or equal to 0. That's the such that. I want to assign a function
h, so that these new weights are all greater or equal to 0. This is for all u and
v. Why would I do that? STUDENT: To use
Dijkstra instead of-- PROFESSOR: To use Dijkstra
instead of Bellman-Ford, exactly. So that's step 2. Run Dijkstra on, I
guess, the usual graph. But now, this new
weight function, w sub h, if all the
weights are non-negative, I can run Dijkstra. So this will give me what I
call the shortest path sub h of u comma v for all u and v. It doesn't give me the actual
shortest path weights I want. It gives me the shortest
path weights using this wh. But I claim that's
almost the same. I claim that this
re-weighting preserves which paths are shortest. Because-- so in particular,
I claim that delta of u,v is delta sub h of u,v--
should be the other way-- minus h of u plus h of v. OK. If this was a single edge, you
can see I'm just cancelling off these terms. But in fact, I claim for a whole
path, every path from u to v gets changed by exactly
the same amount. So this is a claim about
the shortest path-- in effect, a claim
for every path from u to v, shortest or not. If I measure it in
regular weights w, versus weights w sub
h, the only difference is this fixed amount,
which depends only on u and v-- does not
depend on the path. And therefore, which paths
are shortest are preserved. And so when we compute
these shortest path weights, we can translate
them back to what they should be in the
original weighting function. And furthermore, if you
have parent pointers, and you actually find the paths,
the paths will be the same. Shortest paths will be the same. OK. So let's prove that claim. It's actually really simple. Let's look at a
path from u to v. So I'm going to label the
vertices along the path. V0 is going to be u. That's the first one, then
V1, then V2, and so on. Let's say path has length k. And Vk is v. OK, that's just
a generic path from u to v. And now, I want to compute
the w sub h of that path. Excuse me. So the weight of a
path is just the sum of the weights the edges. So I could write this as
a sum from i equals 1 to k of w sub h of Vi
minus 1 comma Vi. I think that works,
got to be careful not to get the indices wrong. OK, now, w sub h is defined
to be this thing-- w plus h of u minus h of v. So
this is the sum i equals 1 to k of w Vi minus 1 Vi plus h
of Vi minus 1 minus h of Vi. What does the sum do? Telescope. So success-- this Vi is going
to-- this negative h of Vi is going to cancel with
the plus h of Vi minus 1 in the next term, except for
the very first one and the very last one. So this is going to be
this sum, which is just the weight of the path according
to regular weight function, plus h of V0 minus h of Vk. And that is just the weight to
the path plus h of u minus h of v. Did I get it right? Nope. Yes? STUDENT: [INAUDIBLE]
subtract [INAUDIBLE] PROFESSOR: But it's not-- it's
opposite of what I claimed. So right, because it's
the other side, good. This has h on the
right hand side. This has not h on
the left hand side. But here, I have h on the
left hand side, and not h on the right hand side. So if I flip it around,
if I take these two terms, put them on the
left hand side, then I get this with the right sign. Cool, whew. Self-consistent. OK. So this was talking
about an arbitrary path. And so this is proving
the stronger thing I said, that every
path gets lengthened by this function,
which is purely a function of the endpoints. So in particular, that means
the shortest path in w land will still be the shortest
path in w sub h land-- slightly less cool name than
circle land, but oh well. All right, so this means
shortest paths are preserved. Shortest paths are
still shortest. And therefore, if I look at
the delta function, which is about the shortest path
weights, this claim holds. So that's the
proof of the claim. Cool.
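For reference, here is the board calculation written out in symbols, for a path p = v_0, v_1, ..., v_k with u = v_0 and v = v_k:

    w_h(p) = \sum_{i=1}^{k} w_h(v_{i-1}, v_i)
           = \sum_{i=1}^{k} \bigl( w(v_{i-1}, v_i) + h(v_{i-1}) - h(v_i) \bigr)
           = w(p) + h(v_0) - h(v_k)
           = w(p) + h(u) - h(v).

So every path from u to v shifts by the same amount h(u) minus h(v), which is why the shortest ones stay shortest.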
There's one gaping problem with this algorithm, which is how in the
world do you find this h? If we could find h, then we
know we could run Dijkstra, and we can do this thing. And Dijkstra is going to
cost the VE plus V squared log V. I didn't say it, but
we run V times Dijkstra. All right, we run it V times. That's going to take V
squared log V plus VE to do. This is just going to
take quadratic time, V squared to update
all the weights, update all the delta functions. The missing step is how do
we find this weight function? I claim this problem of finding
h that has this property, is very closely related
to shortest paths. It's weird, but we're going
to use shortest paths to solve shortest paths. So let's do it. Step 1, finding h. What I want to do, so I
want to have w of u,v-- let me just copy that down--
plus h of u plus h of v to be greater than or equal to 0. Whoops, minus. I'm going to put the h's
on to the right hand side, and then flip it all around. So this is like saying h of
v minus h of u is less than or equal to w of
u,v for all u and v. This is a problem we
want to solve, right? w's are given. h's are unknowns. This is called a system
of difference constraints. If you've heard about linear
programming, for example, this is a special case
of linear programming. Don't worry if you haven't
heard, because this is an easy special case. We're going to solve it
much faster than we know how to solve lineal programs. It's a particular kind of thing. This is actually useful problem. You could think of,
these are maybe times that various events happen. And these are constraints
about pairs of them. Says, well, the start
time of this event minus the end time of that
event should be less than or equal to 1 second. You can use this to do
temporal programming, if you could solve
these systems. We're going to solve
these systems, when they have a solution. They don't always have a
solution, which is a bit weird, because we're relying on them
always having a solution. How can that be? Negative weight cycles. This is all going to work, when
we don't have negative weight cycles. And that's exactly going
to be the case when this system of difference
constraints has no solution. So let me show you that
in a couple of steps. First theorem is that if the
graph V,E,w has a negative weight cycle, then that
system has no solution-- no solution to the
difference constraints. This is going to be,
again, an easy proof, kind of similar to
last one actually. So consider a
negative weight cycle. Let's call it V0, to V1,
to V2, to Vk, back to V0. So the claim is the sum of
these weights is negative. And now, I'm just going to
write down these constraints, which are supposed to have a
solution, or maybe they won't. So if it has a
solution, then this must be true, where u
and v are plugged in to be Vi and Vi minus 1, because
those are all edges. So I'm going to write h of
V1 minus h of V0 is less than or equal to w of V0 V1. And then h of V2 minus
h of V1 less than or equal to w of V1 V2. Repeat that k times,
I'm going to get h of Vk minus h of Vk minus 1. And then, the last
one, the wrap around: h of V0 minus h of Vk is less than or equal to w of V--
did I get it right-- Vk V0. What do I do with
these inequalities? Sum them. Time for a Good Will Hunting
moment-- do you remember? I hope we've all seen
Good Will Hunting. I don't have a
janitor here, so I have to do all the cancelling
by hand-- and this. So I end up with
0 at the bottom. Everything cancels, and then,
over here I have less than or equal to the weight
of the whole cycle. I'm just adding up the
weight of the cycle. I didn't give the cycle
a name-- call it C. Now, the cycle has
negative weight. So this is less than zero,
strictly less than zero. So we're saying that 0
is strictly less than 0. That's not true. So that means
there's no way to get all of these constraints
simultaneously true-- proof by contradiction.
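Summed up in symbols, the constraints around the cycle give

    0 = \bigl(h(v_1) - h(v_0)\bigr) + \bigl(h(v_2) - h(v_1)\bigr) + \cdots + \bigl(h(v_0) - h(v_k)\bigr)
      \le w(v_0, v_1) + w(v_1, v_2) + \cdots + w(v_k, v_0) = w(C) < 0,

which is impossible.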
So that establishes a connection in the direction we don't want it. What we want is
the converse, which is if there's no
negative weight cycle, then there is a solution. Luckily, that is also true. But this is a little
easier to see. So now, we do the other half. OK. And this will-- I mean, it's
going to be constructive proof. So we're going to
actually know how to solve this problem with an algorithm. So it's going to be-- there
is a negative weight cycle if and only if there's no solution. So in particular, the
case we care about is if there's no
negative weight cycle, then there is a solution. We kind of care about both,
but this is the more practical direction. So let's prove it. You can already
see-- you've seen that there's a connection
in negative weight cycles. Now, I'm going to show there's
a real connection to shortest paths. Negative weight
cycles are just kind of a symptom of the shortest
paths being involved. So now, we're going
to use shortest paths. Suppose we have some graph. I'm going to draw a simple
little graph with weights. What I'd like to do is
compute shortest path from a single source
in this graph. The question is, which source? Because none of the vertices--
I guess in this case, this would be a pretty good
source, because it can reach. From here, I can
get to every node. But in general-- maybe
there's another vertex here-- draw a more
complicated picture. It could be, there's
no one vertex that can reach all the others. For example, it may be
the graph is disconnected. That's a good example. So there's no single source
that can reach everywhere. I really want to
reach everywhere. So what am I going to do? Add a new source. Call it s. I'm going to add an edge
to every other vertex. Now, I can get
everywhere from s. OK? What are the weights? 0. 0 sounds good. I don't want to change the
weights, in some sense. So I put 0, and add
0 to everything. That's not going to change much. Now, notice I add no
cycles to the graph. So if there were no negative
weight cycles before, still no negative weight
cycles, because the cycles are the same as they were before. But now, from s I
can reach everywhere. If there's no negative
weight cycles, that means there's a
well-defined, finite value for delta of s comma v
for all V. And that is h. What? It's crazy man. All right, so we add s to
V. We're going to add s comma v to E for all
V. That's the old V. And I'm going to set
the weight of s comma v to be 0 for all V. OK,
that's what I just did. And so now, delta of s
comma v is finite for all V. It's not plus
infinity, because I know there is-- it's got
to be at most 0, right? I can get from s to everywhere. So it's less than
positive infinity. It's also not negative
infinity, because I've assumed there's no negative
weight cycles anywhere. So I'm going to let h
of v be delta of s,v. I claim that just
works, magically. That's insane. Every time I see it, it's
like, got to be crazy man-- crazy but correct. That's Johnson. It's like you just pray that
this happens, and it works. Why would it happen? Why would it be that--
what do we want to say-- w of u,v plus h of
u minus h of v-- we want this to be greater
than or equal to 0. I guess I had already
rewritten this way. Neither way is the right
way, so it doesn't matter. So let's see. We have a weight of u,v. We have
the shortest path from s to u. And we have this the
shortest pathway from s to v. We want that to be
greater than or equal to 0. Why? Put this over there, and I get
delta s,v is less than or equal to delta of s,u plus
w of u,v, which is? Triangle inequality,
which is true. It turns out, this thing we've
been staring at for so long is actually just
triangle inequality. So of course we want to
compute shortest paths, because shortest paths
satisfy triangle inequality. The whole name of the
game in shortest paths is to find a place
where you don't satisfy triangle inequality and fix it. So if it makes sense, if
that's possible to do, Bellman-Ford will do it. So how we're going to do step 1? We're going to run
Bellman-Ford once. We're going to add this
source vertex, so that there is a clear source to
run Bellman-Ford from, and then, run Bellman-Ford
from there only. That will give us
a weight function for the vertices,
namely how long does it take to get from
s to those vertices. Those weights will
actually all be non-positive. But then, we're going
to modify all the edge weights according to
this formula, which negates some of them. So some of them are going
to go up some, some of them are going to go down. It's kind of weird. But when we're done,
all of the weights will be non-negative because
we had triangle inequality. And now, we can run
Dijkstra from every vertex. So it's like we
bootstrap a little bit. We run Bellman-Ford
once, because we know it handles negative weights. It will also tell us if there
are any negative weight cycles. That's why we want this theorem. Maybe Bellman-Ford says, I can't
satisfy triangle inequality, because there's a
negative weight cycle. I don't know what to do. Then, we know, well actually,
then there was no solution. OK, that's kind of interesting. But then, we'll have to
deal with the shortest paths-- sorry-- deal with
those negative weight cycles. I won't cover how
to do that here. But you can. And otherwise, there's no
negative weight cycles, then Bellman-Ford finds valid h. Then, we plug that h into here. Then, we have
non-negative weights. So in VE time, we've
reduced to the non-negative all-pair shortest paths. And then, we run
Dijkstra V times. Then, we get almost
our answers, but we have to modify them to get
back the correct weights on our shortest paths. And so we computed shortest
paths in V squared log V plus VE, because this is
how much Dijkstra costs, and because Bellman-Ford
takes less time. We're good. That's the magic. And that's all-pairs
shortest paths.
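To tie the three steps together, here is a compact Python sketch of the whole pipeline: add a super-source and run Bellman-Ford once to get h, reweight, run Dijkstra from every vertex, then translate the answers back. It assumes there is no negative-weight cycle (an extra Bellman-Ford pass that still finds an improvement would detect one), and the graph format and helper names are illustrative only.

```python
import heapq
import math

def johnson(n, edges):
    """All-pairs shortest paths with possibly negative edge weights
    but no negative-weight cycles.

    n: number of vertices (0..n-1); edges: list of (u, v, w) triples.
    Returns dist[u][v], the shortest-path weight from u to v.
    Runs in O(VE + V^2 log V).
    """
    # Step 1: super-source s = n with 0-weight edges to every vertex;
    # Bellman-Ford from s gives h(v) = delta(s, v).
    h = [math.inf] * n + [0]
    aug = edges + [(n, v, 0) for v in range(n)]
    for _ in range(n):                  # (n + 1) vertices, so n relaxation passes
        for u, v, w in aug:
            if h[u] + w < h[v]:
                h[v] = h[u] + w

    # Step 2: reweight, w_h(u, v) = w(u, v) + h(u) - h(v) >= 0 by the
    # triangle inequality, then run Dijkstra from every vertex.
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w + h[u] - h[v]))

    def dijkstra(s):
        d = [math.inf] * n
        d[s] = 0
        pq = [(0, s)]
        while pq:
            du, u = heapq.heappop(pq)
            if du > d[u]:
                continue
            for v, w in adj[u]:
                if du + w < d[v]:
                    d[v] = du + w
                    heapq.heappush(pq, (d[v], v))
        return d

    # Step 3: translate back, delta(u, v) = delta_h(u, v) - h(u) + h(v).
    dist = []
    for u in range(n):
        dh = dijkstra(u)
        dist.append([dh[v] - h[u] + h[v] if dh[v] < math.inf else math.inf
                     for v in range(n)])
    return dist
```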