The following content is
provided under a Creative Commons license. Your support will help
MIT OpenCourseWare continue to offer high quality
educational resources for free. To make a donation or
view additional materials from hundreds of MIT courses,
visit MIT OpenCourseWare at ocw.mit.edu. PROFESSOR: OK let's get started. I once taught with
a professor who was lamenting the fact that as
the term progresses attendance in lecture tends to drop off. And gets pretty dramatic
by the end of the term when you're lecturing,
and nobody's there. And I asked him what
he did about it. And he thought about
it and he said, there's only two things that
can get students to come to lecture, candy and sex. Now we've already tried
candy, so today we're going to talk about sex. In fact we're going
to use graph theory to address a decades
old debate concerning the relative promiscuity
of men versus women. Now graphs are incredibly useful
structures in computer science, and we're going to be studying
them for the next five or six lectures. They come up in all sorts
of applications, scheduling, optimization,
communications, the design and analysis of algorithms. In fact next week, you're
going to see how to Stanford graduate students
became gazillionaires because they use graph
theoretic in a clever way. But let's talk about sex. The issue that we're
going to address today is one of the most talked
about, and most well studied, questions in all
of human sociology. On average, who has more
opposite gender partners, men or women? Now opposite gender is
going to be important. And by this I mean,
one boy, and one girl. All right, I'm not making
a political statement. It's just that the math is a lot
easier that way, as you'll see. Now I'd like to start
by taking a pole here to see what you
think about that. So raise your and if you
think men, on average, have more opposite gender
partners than women do. Only a few. AUDIENCE: In life or [INAUDIBLE] PROFESSOR: Um, you can-- [LAUGHTER] PROFESSOR: One on one. OK, so let's say over the course
of their lives, let's say, or over the course of
2010, that men in America have more opposite-gender
partners than women in America, say in 2010. Raise your hand if you think
men have more going on. All right a bunch of you. Raise your hand if
you think women have more opposite-gender partners? This is unusual. Maybe even more voted for
women, but it's close. Raise your hand if
you think it's equal. All right, about the same. Raise your hand if you think
there's no way to know, that it's hopeless to
really figure it out. All right, nobody goes for that. All right, good. All right well now in
the popular literature, I think the feelings are
different than expressed here. Pretty much universally,
in the literature, it's believed that men have
more opposite-gender partners than women. And in fact, you
could even think about that, if you
think about literature, the leader of the
harem is always a man. And he's got lots of women. In polygamist
cultures, it's always the man that has multiple
wives, not the reverse. Now not surprisingly,
this issue has been studied
"scientifically," I'll put in quotes, extensively, in
one of the largest studies ever done. Researchers from
University of Chicago interviewed 2,500 people, at
random, over several years. They brought them in,
on many occasions, to try to get the answer for
the question once and for all. And they wrote
this 700 page book, called The soul of Social
Organization of Sexuality: Sexual Practices in the US. Actually walking
around with this book has proved to be a
little embarrassing. Last week my 11-year-old
daughter saw it, and she goes dad, why do
you have this sex book. And I grabbed it back and said,
well that's for the course. I'm teaching. And I thought I'd
gotten away with it, and everything was fine. And then later that day she
texted all of our friends about the new news
that what do you know, her dad teaches sex ed at MIT. Anyway this study concludes
that on average men have 74% more opposite-gender
partners than women. There's one other
central claims. And this is in the US. OK now, when you think about it
that sounds maybe reasonable, might be OK. But not according to ABC News. They did a poll of 1,500
people in the country, in 2004, and concluded that the average
disparity is much greater. In particular, in
this study, they said that the average
man has 20 partners-- I'm assuming over
their lifetime-- and the average woman has six. And this gives a disparity 233%. So ABC News did a smaller
survey says that it's 233% here, much more than 74%. Now ABC News claimed
this is one of the most scientific studies ever done. And there was a 2.5%
margin of error. Now we'll actually talk about
what that means mathematically later in the term when
we do probability, and do study polling. Now of course I
should also mention that ABC News is
the one that said Al Gore won the presidential
election in 2000. Now the study is called
American Sex Survey, a Peak Between the Sheets. That doesn't sound
so scientific. And it was on TV, on
Primetime Live in 2004. The promo for this
is really good. It says, a groundbreaking ABC
News Primetime Live survey finds a range of eye popping
sexual activities, fantasies, and attitudes in this
country, confirming some conventional wisdom,
exploding some myths, and venturing where few
scientific surveys have gone before. By the end of today,
we're going to agree with that last statement. OK now who do you think's right? University of Chicago. Who votes for 74% as
being pretty close? A few of you. I've already slammed these guys. Who votes for ABC News
as being more accurate? Yeah, nobody. Who votes for no way to tell? I got some votes
there, all right. So how do you
tackle this problem? In theory we could do
our own 6.042 survey. I don't know how much
we'd really learn, and for sure I'd get fired. So I don't think we're
going to do that. But fortunately, this
is the kind of question that could be
handled, and actually answered, by graph theory,
even though it might be more interesting to
interview thousands of people, and find out what's going on. That's not as efficient
as using graphs. So let me start by
defining what a graph is. Informally graph is just
a bunch of dots and lines connecting the dots, it's
actually very simple. So here's to graph. These are the nodes,
and they're connected with these lines, called edges. And often the nodes,
and sometimes the edges, are labeled. For example, we might call this
x1, x2, x3, x4, x5, x6, and x7. So that's an example of a graph. Now this being a
math class, we got to give a formal
definition of a graph. And we'll usually use
the formal definition. A graph G is a pair of
sets often called V and E. Where V is a set of elements
called vertices or nodes. And it has to be non-empty
here in this class. And we'll go back and forth
between vertices and nodes. Even the text we use both
words interchangeably. And E is a set of
2-item subsets and V, and they're called edges. So for example, over
here in this picture, V is the set of nodes is x1, x2,
x3, up to x7, that's the nodes. And E, the set of
edges, is pairs, unordered pairs of vertices. So for example
x1, x2 is an edge. And it's the same
as the set x2, x1, doesn't matter the order here. Later in a week or so, we'll
talk about directed graphs where the order matters. x1, x3 is also an
edge here, and so one. Think we've got, let's
see, 1, 2, 3, 4, 5, 6, 7 edges in this graph. And the last one
would be x5, x7. Edges are also sometimes written
with this notation, x1 line x2, is another notation. And then later when you
talk about directed edges, we'll put a little arrowhead
on one end of this. Now the definition of a graph
is really pretty simple. Just think of it as dots
and lines, if you want. But there's often differences
in how people define graphs. For example, in this class we
don't allow the empty graph, i.e. the graph with no nodes. So we're going to insist
that every graph has to have at least one node in it. And that's just to
make the theorems we're going prove be true. Otherwise there's
some theorems that are false for the special
case of the empty graph. But we don't require the
graph to have any edges. In fact, it's possible you
have a graph with nodes, but no edges. For example, this graph. Three-node graph. So here G equals VE,
V equals x1, x2, x3. And E is just the empty set. Now for a general graph,
when you do have edges, we say that two nodes,
call them xi and xj, are adjacent if they're
connected by an edge, namely if xi xj is an edge. All right so for example,
x5 is adjacent to x7, but it's not adjacent to
x4, there's no edge there. Closely related is the
definition of the incidence. An edge E, which
is xixj, is said to be incident to its
end points, xi and xj. OK so, for example, if I
labeled that edge as E, E is the edge x1, x2, and this
incident to x1, and incident x2. Then we can talk about
the degree of a node. The number of edges
incident to a node is called the
degree of the node. So for example, what's the
degree of x5 over here? 3, so in this case, the
degree of x5 equals 3. The degree of x7 is 1. These guys all have
degree 0, there's no edges incident to them. Now in this class, we're going
to look at only simple graphs, at lease for a while. A graph is simple if it has
no loops, or multiple edges. Now a loop is an edge that only
connects up one node, that's a loop and we don't allow it. A multiple edge is we've got two
edges that are really the same, they connect the same endpoints. Also called a multi-edge. And those we're not going
to have in simple graphs. We don't allow this. We don't allow that. Any questions so far
about what a graph is? So how are we going
to use a graph to model the problem of
opposite-gender partners? That's the question we're after. So any thoughts about what
the nodes of the graph are going to represent? What is it? AUDIENCE: Males and females? PROFESSOR: People. Yeah, so we're going
to have people. In fact, there's two
kinds of people here. There's men, and women. All right we got nodes
here for the men. And in fact in America,
there's a lot of nodes here. All right, and so this
might be oh I don't know, say that's Tom Cruise
and Nicole Kidman. Now what's the edge
going to represent? AUDIENCE: Partners. PROFESSOR: Partners. They were
opposite-gender partners. And there's actually
more edges probably here. We could have Penelope
here, and Katie here. And well probably lots more, I
probably don't know them all. And Ben's over here with Nicole. And Nicole got Jude and Keith. There's actually
a website you can go to get a lot of
these things here. And Katie went with Josh. It's called whosedatedwho.com,
and you get big graph, you could start
filling in the edges. I don't know how reliable it is. Now it's really critical
that we're only looking edges from here to here. All right, so if there's an
edge between Tom and Ben, I don't want to know about it. Just opposite-gender partners. OK now in the USA,
the number of nodes here is about 300 million. About three million people. And the number of men nodes,
male nodes, call these VM, and this is VW, by the way,
I'm using cardinality notation. When I put bars
around a set, that is the denoting how
many are in the set. In the US there's about
147.6 men out of the 300. And the number of women--
oh we got a w here-- is about 152.4 million. So there's a little
bit more nodes on this side of the graph,
than that side in the US. What about the edges? Any idea of how many
edges there are here? We don't know. I sure as heck don't know
how many edges there are. So that we don't know. The cardinality of the
edge set we don't know, and we're not likely
to figure out. I don't even think
these surveys, really, can estimate that. But what we're
trying to figure out is the ratio of the
average degree of the men, to the average
degree of the women. Because the number of
opposite-gender partners you have is your
degree here, and you're looking for the
average guy degree, compared to the average
female degree here. That's what we're after. All right so let's
find that quantity. Let's let A sub m equal
the average number of opposite-gender
partners for men. And we can let A W be
the same thing for women. All right. Now we're trying to figure out
the answer to this question. What is A m, the
average guy degree, over the average woman degree. And in particular, the
University of Chicago says, they say it's 1.74. That the average guy as 74%
more opposite-gender partners than the average woman. ABC News says it's 3.33, that
is 233% more for the men, than the women. Now we're going to figure
this out what this ratio is. Just use a little
bit of math here, and a little bit
of graph theory. So let's write a
formula for A m. Well we're trying to figure out
the average degree over here. Well, that's pretty simple. We just add up all the
degrees, and divide by the number of nodes. And that'll give us
the average degree. So the average degree is
the sum of the degrees, over all men, x
in the set of men, of the degree of x, divided
by the number of men. Can somebody give me a
simpler expression for this? It doesn't have that
nasty sum in it? AUDIENCE: E. PROFESSOR: E. The
cardinality of E. I'm adding all the degrees here. Well that's just another way
of counting all the edges, because every edge shows
up once, and only once, in a degree count here. And this is where,
we use the fact we have opposite-gender partners. Because if I had
some edges over here they wouldn't get counted
in sum of the degrees here. All right so this is
just the cardinality of the number of edges,
divided by the number of men. Any questions about that? Because this is an
important statement about graphs in general. When I have a graph
like this-- which is called a bipartite
graph, we'll talk about more in a little bit. But where the edges go
from the left to the right if I sum the
degrees on the left, I'm just counting
the number of edges. All right, let's figure out a
formula for the average number of partners for the women. That simple that's just
sum x over the women. The degree of x, divided
by the number of women. Let me rewrite that
so it's clearer. What's a simpler
expression for this? AUDIENCE: [INAUDIBLE] PROFESSOR: Yeah,
this sum, adding the degrees of the women,
is just the number of edges, right. So that is cardinality of edges,
divided by the number of women. All right, well now we can
write, solve for our formula, average over men over
average of the women. That's E over VM,
divided by E over VW. Wow, this is nice. I don't know the number of edges
is, but it just canceled out. And this is just
the number of women, divided by the number of men. And in fact we know that. That's this number,
divided by that number, which is about 1.0325. So we just proved,
that on average, a man has 3%, or 3 and 1/4% more
opposite-gender partners than women. No need to do the interviews,
or spend years doing. That is the answer. And it has nothing to do
with the promiscuity of men, or women, nothing at all. So the Chicago study is way
off, and the ABC New study is completely nuts. It just can't be
right, this is a proof. Now what happened here? Well what's going on, what's
the reason for why this is true? Yeah? AUDIENCE: A male has
a female partner then the female has a male partner. PROFESSOR: Yeah. AUDIENCE: You're not looking
at like how many males are going to one female. The promiscuity isn't even
a part of the question. PROFESSOR: That's right. It takes two to tango. Every time you got a
guy, you got a women. And you have the number
of relationships going. The average for the men is that
number, divided by the men. Average for the women
is that same number, divided by the women. And so if there's
more women, they're going to have less
partners on average. Has to be. So it really was
a stupid question. It's very, very
simple to answer. Now as it turns out there are
endless studies like this, in the literature. In fact, a few years
ago the Boston Globe ran an explosive story about
the study habits of students on Boston-area campuses. And their surveys show that,
on average, minority students tended to study with
non-minority students more than the other way around. And they want on great
length consulting the experts as to why this might be true. Why is it the minority
students study with non-minority students
more than the other way around. Now can anyone tell
me why it is certainly true, and not surprising,
why that's the case? AUDIENCE: Because
they're the minority. PROFESSOR: Because
they're a minority. There's fewer minorities
than non-minorities. End of story, we don't
need this sociology PhD from down the street
to explain it to us. We're going to see a lot of
other bogus studies later. This is not unusual, especially
when we get the probability. Just every day there's a
new one in probability. Any questions about
this before we leave? Unfortunately that's most all
we'll say about sex today. OK. But now, in this example,
we used an edge in the graph to denote some kind of
affinity between two nodes. The two nodes liked
each other in some sense if they were
connected by an edge, or they had a
relationship of some kind. There's lots of examples
in computer science where you use an edge to
denote just the opposite. That the two nodes can't
be near each other, or don't like each other. For example,
consider the problem of scheduling
final exams at MIT. And they do this after they
find out all of your schedules, and they try to schedule
the exams so that you don't have to take two at once,
or there's as little of that as possible. For example, let's
do an example here. Say we look at
these five classes. Take 6041. And this may not be totally
accurate, but roughly. So I've got five
MIT classes, and I'm going to put an edge between
pairs of classes that have overlapping student enrollment. So in this case,
for example, we've assumed in the
drawing of his graph, that you can't have our
exam the same time is 6002, on the assumption there's
students in both classes. But you could have our
exam the same time as 6034. Because there's not
an overlapping student in both classes,
so the exams could be scheduled at the same time. So we've used a
graph to represent which courses can't have
their exam at the same time. Now let's also suppose we have
a set of slots for the exam. And say they're
all on a Wednesday. And the first slot is
Wednesday from 5:00 to 7:00. And the next one
is 7:00 to 9:00. And then, the next
one is 9:00 to 11:00. And then 11:00 to 1:00 in the
morning, and then 1:00 to 3:00, getting pretty late. And your job is to figure
out how not to have to use these later exam slots. You'd like to use
as few as possible so you're not going
too late night, or come before the
holidays, so you're not having exams on Christmas
and New Year's, for example. So the goal is to assign
slots to the nodes. Put every node in
a slot so you don't have nodes hooked by an
edge getting the same slot. Now this is an example
of what's called a graph coloring problem. So let's define that. Given a graph G, and K colors,
assign a color to each node, so that adjacent nodes
get different colors. All right, and then the minimum
number of colors you need is called the chromatic
number of the graph. So the minimum value of K, for
which such a coloring exist, is the chromatic
number OF the graph. And it's denoted by
this symbol chi of G. Because usually you want to
use a small number of colors. Now what does a color
represent when we're dealing with this problem? What's the meaning of a color? AUDIENCE: Time slot. PROFESSOR: A time slot, OK. So let's call this time slot
C1, C2, C3, C4, C5, so there's five possible colors. Now of course, we could color
this graph with five colors, every node could just
get its own color. But then somebody's taking
their exam from 1:00 to 3:00 AM, and that's a bit of a pain. Let's see if we can
do less than five. Let's say I give this color
one, let's give this one color one, that's OK, because
they're not connected. I can't give this one color one,
so I give it color two, say. Now this one I can't give color
one, because this guy got it, he can't get color two,
because that guy got it. So it give it color three. And well, I can't do
one, two, or three here, so I gotta go to color four. All right so 6042 will get
the 11:00 PM to 1:00 AM slot, not so good. Can we do any better? Can we get away
with three colors. Some say yes, some say no. How many people think you can
do three colors on this graph? A bunch. How many think you
can't do any better? All right, the vote
is mostly for three. Let's see. Any ideas? Anybody see how to do three? Yeah? AUDIENCE: Assign C4 to 6034 . PROFESSOR: Assign C4 to 6043. AUDIENCE: Or C1 to 6042. PROFESSOR: C-- I can't
do see C1 to 6042. It crashes, but can I do-- yeah? Put AUDIENCE: C1 in 6003. PROFESSOR: C1 in 6003. AUDIENCE: And get
rid of C1 in 6034. PROFESSOR: Get rid of-- AUDIENCE: Make it C2. PROFESSOR: Make this a C2. Oh, yeah. All right, these got C1,
they're not adjacent. These got C2,
they're not adjacent. This can now get C3. So we can have our exam from
9:00 to 11:00, which is better. All right, can anybody
do it in two colors? Can anybody offer a reason why
two colors may not be possible? Yeah? AUDIENCE: Because let's say you
could do it with two colors. PROFESSOR: Yep. AUDIENCE: 6041 and 6002
have to be different colors. PROFESSOR: Yes. AUDIENCE: 6042 can't be
C1, and it can't be C2. PROFESSOR: Yeah, good. So you can't in two colors,
because these three guys would violate that. You've got a triangle here. Each one of these guys has to
be different than the other two. So two colors can't work. You've got to have at
least three in this case. So three is optimal. We have just shown
for this graph, the chromatic number is three. All right, now in general doing
what we just did is very hard. No one knows a fast
algorithm for determining the chromatic number. In fact, it's a weird
kind of problem, because it's easy enough to
check that a coloring is OK. If somebody put a coloring
on the board, you can check, oh that works really simply. Just check every edge, and make
sure the colors are different. But figuring it out,
as best we know, you've got to try an exponential
number of possibilities. So if I had 100 nodes
here, my running time of the algorithm to check
all the possibilities would be exponential
and a hundred. Yeah? AUDIENCE: Can that number
just like the highest degree of each node, or nodes. PROFESSOR: Uh no. But it's no worse than
something like that, as we'll see a few minutes. That's a great observation. And we're going to come back
to that in a few minutes. But it's not just that. OK now in fact even figuring
out for an arbitrary graph if three colors
can be done, called the three-coloring problem,
that's really hard. No one knows how to solve that
in less than exponential time. In fact, one of these
NP-complete problems is what it's called. How many people here don't
know about NP-completeness? Is everybody-- all
right so all of you haven't seen NP-completeness. OK so there is a class of
thousands of problems-- in fact there's books list these
1,000 problems-- that are all NP-complete, somebody's proved
they belong in the class. And what that means is
that if somebody gave you a solution, like
a coloring here, it's easy to check really
quickly if it's valid. But figuring it
out is really hard. And if you figured
out how to solve one of those
thousands of problems, like suddenly you figured out
how to tell if any graph could work with three colors, you
would solve automatically all other thousands in the book. So it's this book of
problems you will constantly run into in your career
in computer science. And it's bad when
you run into one, because there's no good
algorithm to solve it known. But if you just
solved one of them, the other thousands would
suddenly be solvable quickly. Even better, you win a
million dollar prize. One of these
Millennium Prizes we talked about the first lecture. Even if you show you can't
find a fast algorithm for one of them, that means that known
of them have fast algorithms, and you also get
a million dollars. So this is the central problem
in computer science, and theory computing, is whether
or not you could solve these NP-complete problems. Now actually lots of
people have claim to do it. And in fact, there was a
lot of buzz in the community about a month ago when
actually a reputable researcher at HP Labs said he'd done it. He proved that you can't
solve NP-complete problems. And he got people going for
probably at least a week, until they discovered
a fatal flaw. And the proof was
actually bogus. So no one still knows
if you can solve these NP-complete
problems quickly. Now the problem is, in practice,
you run into these things all the time, like
MIT really does have to schedule the exams. So you've got to do something. You can't just go say, hey
it's NP-complete, so no exams this year, or whatever. That's not going to fly,
so you got to do something. So now this is a
problem-- many of you when you go into careers, you're
going to be faced with this. You got to do something. Any thoughts about an algorithm
for coloring graphs that might use a small number of colors? It doesn't have to
always work, or you're going to win a lot
of money if it does. But a simple
algorithm, you can't take either the 100 steps. You got to be linear,
probably, or quadratic time. That could get you a
small number of colors. Any thoughts about
what you'd do? Yeah? AUDIENCE: The number
of degrees and nodes? PROFESSOR: The
number-- what about it? AUDIENCE: The highest
degree and that node, the 6042 is [INAUDIBLE]. PROFESSOR: Yeah. AUDIENCE: So you could use that. PROFESSOR: Good, all right. So what do I do with
that-- so I found a node with a high
degree, there's three of them have
degree three here. What do I do with them? AUDIENCE: Pick a
different color to. PROFESSOR: Pick a
different color, that means I've colored
some of the others. If I pick a different
color, do I start with them, or do I finish with
a high degree nodes? Because you've got to
assign the colors to them. And high degree is important
to be thinking about. We're going to prove a theorem
in just a minute about related to degree and coloring. AUDIENCE: Start with them. PROFESSOR: Start with
them, and do what with it? Color? AUDIENCE: Yeah, and then assign
the ones that aren't connected [INAUDIBLE] to the same slots. PROFESSOR: OK, so I could--
here's a degree of theory now I can start with
color one for that. And then what do I do next? I pick-- its neighbors have to
get different colors, I guess. You'd start coloring
the neighbors. AUDIENCE: My first
instinct would be to color all the [INAUDIBLE]. PROFESSOR: OK. And what color
would use for them? AUDIENCE: Different ones. PROFESSOR: Different ones
if they're connected, or if they're not connected
you'd still use different ones? AUDIENCE: Only if
they're connected. PROFESSOR: Only they're
connected use different ones. And so if they're not connected,
you'd use the same colors? Yeah? You're going close, and it
actually works pretty well. The underlying principle you're
sort of thinking about here is you've got some notion
of the order in which you're going to process your graph. And you're going to start
with a high degree nodes, in your case. And as you go
along, you're going to start coloring the nodes. And you're going to make
sure you color them legally. And it sounds like you're going
to color them with a low color as you go along. And that is probably the most
basic graph coloring approach. And almost you could almost
say is a generic approach. So let's define that, and then
see prove some facts about it. Most of the graph coloring
algorithms in practice are based on this approach. And we're going to call it the
basic graph coloring algorithm. And for our graph G, with
vertices V, and edges E. So the first step is going to be
to order the nodes from 1 to n. Now in your case,
you were suggesting an ordering where I have
the high degree nodes first. All right. But for now we're not
going to specify that. We're going to make it
any ordering you want. And then we're going to
have a notion of an order on the colors, as well. And I don't know
how many colors, but they're going to be
numbered 1, 2, and so forth. And then we're going to process
the nodes one at a time, to N. We color the
nodes, what is step I, we color the Ith node V sub i
with the lowest legal color. And by the legal I mean you
don't color at the same node as another node that's already
been colored the same that it's adjacent to. All right so let's try this. In fact, this is sort of the
algorithm I used initially to color exam graph over there. All right, so
let's look at that. So let's say we-- let me
erase the colors here, and put an ordering on the nodes. So let's say I ordered
them with 6034 first, so this would be V1. Then 6041 is V2. Then V3, V4, V5. If that's my ordering, what
color would I assign to 6034? AUDIENCE: One. PROFESSOR: One, C1, I'd
color it first to get C1. What color does 6041 get? C1, as well, it's the
lowest possible color that's legal, and is not hooked
to this guy, so C1 is legal. What color do I give here? C2. Then I color this one next
C-- can't do C2, can't do C1, so I pick C3. And then I get to
6042 last, and I can't do one, two, or
three, so I do four. All right so algorithm,
with that ordering, gave four colors. However we know there's a way
to do a different ordering that gives us three colors. In particular,
let's see if we do this what happens if we
use this other ordering. Let me erase these. Say that's V1, V2, V3, V4, V5. Now I get C1, this
will be C2, C1. What's this one get? C2. Ah, much better. C3. So different orderings result
in different numbers of colors here. So the whole art now becomes
finding a clever ordering. And so many people have
already had good ideas, pick the largest
degree nodes first. And in fact, if you simulate
the algorithm on lots of graphs, you do better on average when
you color the larger degree nodes first. And then if you start to
use more exotic orderings, you can do even better. If you take a lot of
graphs that are out there, and run your algorithm,
and see how well you do, you do better with more
sophisticated orderings. In fact, this was my
senior thesis back when I was
undergraduate student. I was trying to figure out
better and better orderings that worked for graphs. And at the time it caused
a bit of a problem. I was a undergraduate
at Princeton. And Princeton, to
this day I think, still has exams after the
holidays, the Christmas holidays, New Year's holidays. And the students wanted to have
the exams before Christmas, because they hated going
home for the holiday, and then you've got to
worry about your exams when you come back. And the faculty said
no, there's no way to get them all compressed
into a small number of days. Now I wasn't aware of
all that of the time. But my thesis was go
figure out good ordering. So I tried lots of
different orderings. And I tried the
largest degree first, and recursive versions of that
actually worked very well. And then tried it on the
Princeton exam graph. And lo and behold, you could
actually squish it down, so you could give all
the exams, I think was, 4 and 1/2 days, plenty of time
to give them before Christmas. Which caused a fair of
scandal at the time, because then the faculty
had to come clean that they just didn't
want to bother having the exams before Christmas. Now this algorithm is
an example of what's known as a greedy algorithm. Now in a greedy algorithm
it's always simple. You just go one
step after the next, taking the best you
can do at each stop. You never go back and try
to make things better. You never do hill climbing, if
you're familiar with that term. You just always keep it simple,
one thing after the next, very fast. Sometimes it works
great in practice. Sometimes it doesn't. But it's always where you start,
some simple approach like this. Now this algorithm
actually, even if you don't try to
monkey with the ordering, even for a worst case
ordering of the nodes, that actually does pretty
good for a lot of graphs. And in fact, it
does really well-- as somebody already
asked about-- if all the nodes
have low degree. So let's state
that as a theorem. And then we're
going to prove that. So if every node in a graph
G has degree, at most, d-- so that's the biggest
degree in the graph, D-- then this basic algorithm uses, at
most, d plus 1 colors for G. No matter what the
ordering is, you'll never do worse than
d plus 1 colors. So what's the value of d for
our exam graph over here? d is 3. Every node has degree,
at most, three. And so it says, that no matter
what ordering you picked here, you'd get at most four colors. Now you might do better. In fact, we found an
ordering that got three. So it's possible to do better. So let's prove this fact
because this makes a difference. Say you have a graph
with hundreds of nodes. But every node has
degree, at most, three. Well that says you only
need four colors even, if the graph has 1,000 nodes,
and that's very useful. So in that kind of
situation it does very well. So let's prove that. Any ideas as to what proof
technique we're going to use? AUDIENCE: Invariant. PROFESSOR: Invariant, close. Not quite an
invariant, but close. AUDIENCE: [INAUDIBLE] PROFESSOR: What? AUDIENCE: Well
ordering principle. PROFESSOR: You know well
ordering principle, yeah, we're going to use the
equivalent version of that. We're going to use induction. If you like well-- it's
equivalent to well ordering. If you like well ordering
you could do it that way. I think it's easier
using induction here. So the proof is by induction. All right so the
first thing we need is an induction hypothesis. Any thoughts about what the
induction hypothesis should be? Yeah? AUDIENCE: If you have a
graph with n nodes then where the degree of any nodes
is less than [INAUDIBLE] then you can do it. PROFESSOR: That's great. You're going to do really
well on the midterm, because you put an
n into this thing, but there's not an
n here to start. What are most
people going to do-- we used to ask this actually. We asked this once on
a test many years ago, and it was an utter disaster,
because did everybody do? May be one student, or
two, put an n into there. But what's the naturally thing
to do to induct on here when you look at this statement? You're going to induct on d,
because the first thing you do is you make this be your
induction hypothesis. There's only one thing
to use, so you're going to have your
predicate be p of d, and it's going to be that. Now It didn't occur to us that's
what everybody was going to do, but it should have. They all did that and
it was a disaster. Because if you do
this, well you've got to take a graph with
maximum degree d, or d plus 1 in the inductive step,
pull out all the nodes with degree d plus 1 to get
a graph with now degree d. And that's a mess. You just pulled out a lot
of nodes, potentially. Color that in d plus 1 colors,
now put all that junk back in. And say only used
one more color. Nightmare. And these were MIT
students under pressure. It was a nightmare. So that does not work. And in fact, we will ask
an induction question on graphs on every test
you take in this course. It will happen. And so usually,
with induction, you take this as your
induction hypothesis. With graphs, you
have to be careful. And worst part about
this is we tell people when this doesn't work, use a
stronger induction hypothesis. So students tried
to make a stronger, but they're still stuck on d,
and it was still a disaster. With graphs, you do
something different. And the first thing you
do with a graph, usually, is put n in here. And if it doesn't work with
n, the number of nodes, you put in e the
number of edges. And induct on that. And so what you said is
exactly the right thing to do. Don't do this, or least don't
spend too much time on it. Pretty quickly try this. If every end node graph--
if every node in an n node graph G has
degree at most degree, then the basic algorithm uses
at most d, plus one colors. And now you induct on n. And almost always on graphs,
that's the first thing to try. Even if it's not in
your theorem statement. Any questions about that? Well let's start
with this, and see if we can make this one work. So what's the next
step in our proof? What do we got to do? Base case. And the base case will
be, not n equals 0, because we can't have a zero
node graph, but n equals 1. And how many edges do we have? Zero. If there's one node,
we don't allow loops, so it's zero edges, which means
that the degree of our graph has to be zero. There's no edges. And of course there's
only one node, so one color is going
to work, and that happens to equal d plus 1. All right, so the
base case is true. For one node graphs,
you can always use d plus 1 colors,
where d is the max degree. All right, next we have
the inductive step. So here we assume P n is
true for the induction. And now we look at an
n plus 1 node graph to show P n plus 1 is true. So we let G be any
N plus 1 node graph. We got to show you can
color it in d plus 1 colors. And let's let d be the
max degree, the largest degree in G. We've got to show we can
color it in d plus 1 colors. Well the basic
algorithm, let's say. First thing we do is
we order the nodes in an arbitrary order. And we're going to show
whatever order you pick is OK. All right so what are the nodes? Anyway at all. Now how am I going to use
the induction hypothesis? I know, I can assume,
the for any N node graph I can color it in the
max degree plus 1 colors. How am I going to use that
to help me color G here, the n plus 1 node graph? Any thoughts? Yeah? AUDIENCE: [INAUDIBLE] PROFESSOR: Yeah, let's
create an n node graph by looking at these
nodes, and taking this one out of the time being. Remove the last V n plus
1 node in the order. That leaves an n node graph. So let's write that down. We remove the n plus 1 from G.
And that creates a new graph, call it G prime with vertices,
V prime and edges, E prime. So we create a new graph
by removing that node. And we remove all the
edges tied to that node. So for example over
here, the last node was 6042, so we take out
6042, and all these edges. And this is a graph
that we're left with. That graph has n nodes. What's the maximum
degree in G prime? When I pull out a node, can
the degree of any node go up? No, I'm just taking stuff out. So I know that G prime has
maximum degree, at most, d. The degree didn't
go up of any node. Might have gone down,
but it didn't go up. So G prime has max degree, at
most, d, and it has n nodes. So we can use the
induction hypothesis P n. It says that the basic
algorithm uses d plus 1, at most, d plus 1 colors
for nodes V1 to V n. Any questions about that? So if this were the n plus first
node, last node in the ordering take it out. The basic algorithm now,
take the same order here, V1, V2, V3, V4, basic, we'll
color that in d plus 1 colors. And all I have left is
to give this guy color, and I'll have color G. Question? No. All right. So by induction I've colored
these guys, V1 to V2, and d plus 1 colors, all
that I have left to do is color V n plus 1. And hopefully we're not
going to use color d plus 2, because then we sort
of-- it wouldn't work. We got to use one of
the first d plus 1. All right, so let's
look at V n plus 1. And let's call its
neighbors in G, U1, U2, Ud. It has, at most d neighbors,
because every node in G has, at most, degree d. A neighbor's a node
you're adjacent to. All right so, V n plus 1 has at
most d neighbors, is adjacent to, at most, d other nodes. Now what does that
mean about the color I can use on V n plus 1? What do I know about what
color I can use for that? Yeah? AUDIENCE: It can't be any of
the colors of U1, U2, and so on. PROFESSOR: It can't be any
one of these colors that were assigned here. That's true. So how many colors
got ruled out? At most d, and how
many am I working with? d Plus 1. So I got one left
that I can use safely. OK. So this means there
exists at least one color in my set of d plus 1 colors. It's not used by any neighbor. And we're going to give
V n plus 1 that color. All right. So now I've colored every node
in G, the n plus 1 node graph, safely using a total
of d plus 1 colors. So that means the
basic algorithm uses, at most, d plus 1 colors,
on G. That means P n plus 1 is true-- whoops-- and
the induction is complete. Any questions? Yeah. AUDIENCE: Could you also
start from the other way, and start 1, go to 2 nodes,
3 nodes at each step keeping all nodes at all other nodes. [INAUDIBLE] PROFESSOR: What do you mean by
keeping all nodes connected? AUDIENCE: [INAUDIBLE] each
node has an edge connecting to each other one. PROFESSOR: OK so, then
I get a specific graph. I start with this, I add a
node and make it adjacent. I add a node and
make it adjacent. AUDIENCE: [INAUDIBLE] PROFESSOR: Yeah. So you've constructed
a particular graph. This is actually called, for
the n nodes, it's called Kn, is the n node complete
graph, also called a clique, like a clique of friends, where
everybody likes everybody, in a clique. And in fact for n here,
for those n nodes, what's the max degree? Max degree is n minus 1. What's the chromatic
number of this graph? What's the minimum
number of colors? [INTERPOSING VOICES] PROFESSOR: And they all
have to be different, which is d plus 1. So you have built
a special graph for which the optimum of
number colors is d plus 1. But that is not a proof that
this is true for all graphs. Because you've looked at
a particular graph here. AUDIENCE: [INAUDIBLE] PROFESSOR: What's that? AUDIENCE: [INAUDIBLE] It means
that you can still use your less than or equal to sign. PROFESSOR: I see,
so you'd add a node, and it's only connected
to a few of them. AUDIENCE: No, it's
connected to all of them, but it still implies
that you need less than or equal to the colors. It turns out it
happens to be equal to. PROFESSOR: Yes, in
this case that's right. So you've made an
argument for this case where it actually
is equal, but that only worked for this graph. AUDIENCE: [INAUDIBLE]
worse case. PROFESSOR: It is the worst
case, so it meets the bound. It shows you cannot
improve this bound. Yeah, is there a
question up there? AUDIENCE: All I was going
to say is that you've proved it's the worst case. PROFESSOR: Right, so
what you've done here is you've shown that I could not
make that theorem any stronger. I could not replace
it with d here. All right. Because you've given
an example where I can't get d colors, where
the maximum degree is d. But that doesn't-- To get
a proof for a theorem, I got to go through all this. That wouldn't give me
a proof of the theorem. They're not equivalent. One's an upper bound, one's
an existence of a lower bound. This shows that for any graph,
you need at most d plus 1. So any graph, at most. That shows there is a graph
that you need at least. And they are not equivalent. All right. One is for all, and upper bound. The other is there
exists a lower bound. So different in two
ways that are important. This kind of proof is very
typical for what you'll see with induction in graphs. And you'll get a lot
of practice with it. Are there any other
questions on this proof? OK. All right, see we've seen
now, by that example, we can't improve the theorem. In some cases,
though, the theorem is way off, for some graphs. Can anybody think
of a graph where the bound we get from the
theorem, of d plus 1 colors, is way off from the actual
chromatic number you need, the number of colors you need? Yeah? AUDIENCE: [INAUDIBLE] PROFESSOR: What is it? AUDIENCE: A graph [INAUDIBLE]
two sets of [INAUDIBLE] PROFESSOR: Good, OK. Yes, so what if
we did this graph. Let me draw it out. So you've got a bunch of nodes
here, bunch of nodes here. And every node here is
connected to every node over the other side. And if this is an n no
graph, and I've got n over 2 on each side, what's
my degree here? What's my max degree
of this graph? AUDIENCE: N over 2. PROFESSOR: N over 2. So d is n over 2. What's the chromatic number? How many colors do
I need for this? Two. All right, so d plus
1 is way off of two. There is a even worse example. Yeah? AUDIENCE: That graph where you
have one node center that's connected to a bunch of nodes
regularly distributed about. PROFESSOR: Yeah, the star graph. All right, so I got one of
the center, I got n minus 1 outside. So here the maximum
degree is n minus 1, just like a complete graph. But how many colors do I need? Two. So it's even worse here. All right now what about
the basic algorithm? How well does the basic
algorithm do on this graph? Or to the vertices some way? Color on one [INAUDIBLE]
lowest color. How many colors is
it going to use? AUDIENCE: Two. PROFESSOR: Two. It doesn't matter the vertices. V1, V2, V3, V4, because
I'll color this one 1. What am I going
to call that one? 1. Then I get to the center,
what am I going to color it? 2. And now all the arms,
what do they get colored? They all get 1. Whatever order you pick,
you get two colors. All right so now
there's a difference between the theorem just gives
you an upper bound, it says, at most, d plus 1 colors. But in fact the algorithm
can do a lot better than that, as on this example. So the algorithm
might be a lot better. Everybody see that
what we're doing here? How the algorithm is
better than the bound we proved by the theorem,
even though the bound was pretty good for some graphs. Now it turns out--
I mean we're not going to win a million
dollars for this algorithm. And in fact, this algorithm
is sometimes very bad. And a really bad example
it's very close to this. In fact actually this one, let's
look at how well does basic do one this one here. Make some ordering. V1, V2, V3. What's the basic algorithm
going to do on this complete-- it's called a complete bipartite
graph, is what's this called. I'll define bipartite
in a minute-- but what's the basic
algorithm do here? Any idea-- does it take n over
2 colors, or does it take 2? Any ideas? 2. So take a vertex, and the first
one, say V1s here, get C1. As long as I keep picking
vertices over on this side, they're going to get C1. As soon as I get to
a vertex over here, what color does it have to get? AUDIENCE: C2. PROFESSOR: C2
because it's touching the very first one we had here. So when I get
vertices over here, they're all going to be C2. When I go back over here,
they're going to be back to C1. So actually basic does good
here too, gives you two colors. Yeah? AUDIENCE: [INAUDIBLE] PROFESSOR: Ah, those
two aren't connected. But this case, if I've
got a vertex over here it is, by definition, connected
to the vertex over here. Because every
possible edge is here. But that's a great idea. What if they weren't
all connected, that's actually a great idea. In fact, the nasty example
for the basic algorithm is very much like that. Let's draw it. Because so far,
the basic algorithm is pretty much done
perfectly on all the graphs we looked at even when
the theorem wasn't tight. So here is a nasty graph. And it is very close to the
graph we just look like, where all the edges are there. In this case, all
the edges are there, except for the one
straight across. So if this is-- the
edge denotes likes, this is a world where you like
everybody but your spouse. All right, so you have
an edge to every one, except the one directly
across from you. No edge there, and so forth. So it has almost every edge,
but it's missing these edges. Now the basic algorithm
might do well here. What would be a good
ordering for this graph to label these V1 through Vn? Yeah? AUDIENCE: Go through
everything on the left side, and then the right side. PROFESSOR: Yeah, that's right. Because then color 1, color
1, color 1, all the way down. One color for the left,
what does this one get? Color 2, because it's
hooked up against. And these all get color 2,
so I've used two colors. Really good. Basic algorithm's looking great. Now here's a harder question. Can you figure
out a bad ordering for this graph, where I use
a lot more than two colors. AUDIENCE: [INAUDIBLE] PROFESSOR: What is it? AUDIENCE: It starts at the
top of the cross, and then the next level then across. PROFESSOR: Very good. V1, V2. Just as natural, really,
if think about it, to order it this way. All right. What color does V1 get? C1. What color does V2 get? AUDIENCE: C1. PROFESSOR: C1 because
it's not hooked up here. What color does V3 get? AUDIENCE: C2. PROFESSOR: C2. What about V4? AUDIENCE: C2. PROFESSOR: C2. It's not hooked up. It can't get one,
because that's up here. And it's not the two, so it
gets two What color does V5 get? AUDIENCE: C3. PROFESSOR: C3. Because it's hooked
up to one to two. V6 ? AUDIENCE: C3. PROFESSOR: C3, it's hooked up
to one and two, but not three. And you can see
what's happening here. All the way down
here he's hooked up to all the n over
2 minus 1 colors. So he also takes C n over 2. So if you pick that
ordering, not so good. You use n over two colors. So it really matters
the ordering. Now I should say graphs
like-- actually any questions about what we did here? About this? All right, now I should
say that graphs like this have a special name, they're
called bipartite graphs. And that's important
to remember. All right, so a graph G
is said to be bipartite if the vertices can be split
into two sets, or partitioned, and we'll call them a
left set, and a right set, so that all the edges connect
a node in the left set, to a node in the right set. So in fact, a lot
of today we've been looking at bipartite graphs,
because the nodes are here. Like the men, and the women, and
the edges only go from the left to the right. And that is called bipartite. And it's called
bipartite because you can do it with two
colors, or in two pieces. So you don't win
a million dollars for deciding whether
or not a graph can be colored in two colors. That's easy. You'll even do it for
homework one of these times. You do win the million dollars
for deciding if a graph can be colored in three colors. That's really hard to do. Now coloring problems come up
in all sorts of applications. You know with this company,
Akamai, that came out of MIT, we've talked about. We run a network
of 75,000 servers. And they're used to distribute
content on the internet, and so forth. And we have to
deploy a new version of our software
on those servers, pretty much every week. We're pushing new software out. And you can't deploy on every
server at the same time, because you've got
to take down a server to deploy new software on it. Got to take it
out of commission. And so we can't just take
down all 75,000 servers, because then all the
Facebook, and Netflix, and all those sites would stop. That would be bad. And we can't do them one at a
time, because there's 75,000. And it takes a few
hours for each one to get the traffic off,
stop it, load new software, and turn it back on. And it would take us years
to do one software install, which we got to do every week. So we've got to figure out a
schedule for how many servers you take down at a given
time, and which ones. And it turns out
pairs of servers have certain critical functions. So there's certain pairs of
servers you can't take down at the same time. So we have a gigantic 75,000
node coloring problem, where there's edges
between servers. Nodes are servers,
and there's an edge between if you can't install
new software at the same time. And so when it
turns out, when you run one of these graph
coloring algorithms on it, you could do it
with eight colors. It just turns out that way. So that means there's
eight waves of install that go on to the network. And now eight times
a few hours each means that we can do it in a
day, and you can manage it. You know on a much
smaller scale, the same problem exists
for register allocation, for variables. Here you've got to assign
every variable to register. But you can't have
variables that are active at the
same time associated with the same register. And you want to minimize the
number of registers you need. So again, you have the
graph coloring problem. The number of colors is the
number of registers you need. And two variables can't get
the same color if their active at the same time, so you
put an edge between them. The most famous example
of graph coloring is the map coloring problem,
with the four coloring theorem. And so here, every
country is a node. Adjacent countries have
an edge between them, because you don't want to
color adjacent countries the same color,
or you can't tell they're different countries. Now the last example
we can talk about is an important problem
in communication theory, communication networks, where
again coloring comes up. Now here you need to assign
frequencies to radio stations, or the cell towers. It comes up in mobile networks,
or just in with radio stations. And if two towers have
an overlapping area, they can't be given
the same frequency, so you get collisions
between the towers. And frequencies
are very expensive. Companies pay the
government a lot of money to get certain spectrum. So suppose you had this problem. Here's tower A, this is A's
range, where it reaches. Here's tower B, so
it overlaps some with A. Here's tower C. Here's
tower E. And here's tower D. All right now the
question would be, how many radio
frequencies do you need? What's the minimum number
of frequencies you need to enable all the towers here? We could make that be a graph. There's a node for each tower. And an edge between
towers, if they overlap. C doesn't overlap
with B, E does. E overlaps here. And then D overlaps here. So how many frequencies do
you need for this graph? AUDIENCE: Four. PROFESSOR: Four would
work, three is better. Can you do two? No you can't do two,
because you got here. But you could do three. You could do one,
two, three, two, one. This problem comes up-- AUDIENCE: [INAUDIBLE] PROFESSOR: Did I screw up? Ooh, no I can't do that. One, two, yeah much better. All right, this problem
comes up all over the place. I'm certain you'll see it
sometime in your career, you'll have some problem, or
you're scheduling something, and it's really a graph
problem in disguise. OK that's it for today.