Mega-R3. Games, Minimax, Alpha-Beta
The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu. PROFESSOR: Today we're going to be talking about games. And I know you guys as well as I hope I do. The main thing that you guys want to talk about with games is how to do that alpha-beta thing. Because it's pretty confusing. And it's easy to get lost in a corner or something. Whereas doing the regular minimax, in my experience, most 6034 students can do that. And they do it right pretty much all the time. However, we're going to focus on all the different components of games. And I put up two provocative silver star ideas up on the board, which will come into play here. The Snow White principle is a new name. And it has never been revealed until today. Because I made up the name recently. So you will be the first people to hear it and decide if it works better than the term "grandfather clause" for the thing that I'm trying to describe. Because most grandfathers don't eat their children. So here we've got a beautiful game tree. It has nodes from A through R. This is our standard game tree from 6034. We've got a maximizer up at the top who's trying to get the highest score possible. The minimizer is her opponent. And the minimizer is trying to get to the lowest score possible. And it's really unclear who wins or loses at each point. They're just trying to get it to the highest or the lowest score. All right, so let's do a refresher. Hopefully the quiz didn't put people into such panic modes that they forgot Monday's lecture. So let's make sure that we can do the regular minimax algorithm on this tree and figure out the minimax value at A. So let's see how that works. All right, as you guys remember, the game search when using regular minimax is essentially a depth first search. 
And at each level, it chooses between all of the children whichever value that the parent wants. So here, F would choose the maximum of K and L, for instance. But that's getting ahead of ourselves. Because it's a depth first search. So we best start at the top. I'll help you guys out for a while. So we're doing A. We need the maximum of B, C, D, depth first search. We go to B. We're looking for the minimum of E and F. So having looked at E, our current minimum of E and F is just 2 for the moment. So this is going to be less than or equal to 2. All right, then we go down to F, which is a maximizer. And its children are K and L. So now I'm going to start making you guys do stuff. So what do you think? What is going to be the minimax value at F? The minimax value at F, what will that be? AUDIENCE: [INAUDIBLE] PROFESSOR: So that level is a maximizer-- max, min, max. F is a maximizer. K and L themselves are minimizers. But they're pretty impotent minimizers. Because they don't get to choose. They just have to do K or L. So the minimax value is 3. And yeah, the path it would like to go is K. So we'll say that the minimax value here is 3. It's in fact exactly equal to 3. So if this is 3 and this is 2, then everyone, we know that the value of B is? AUDIENCE: [INAUDIBLE] PROFESSOR: I hear 3 and 2. Which one is it? AUDIENCE: 2. PROFESSOR: 2, that's right. So the value here is 2. Great, let's go down into this branch. So C is going to be the minimum of G and 6. But we don't see that yet. Because we're doing a depth first search. It's going to be the minimum of G. Now we need the maximum of M and N. We're going to need the minimum. M is the minimum of Q and R. So let's switch sides. The minimum of Q and R is? AUDIENCE: 1. PROFESSOR: Let's see. That's right, it's 1. So M has a value of 1. But I'm going to stay over here. Because M has a value of 1. Knowing that, then we know that G has a value of? AUDIENCE: 7. PROFESSOR: That's right. 7 is higher than 1. 
And since G is a 7, we now know going up to C that C has a value of? AUDIENCE: 6. PROFESSOR: Yes, C has a value of 6. That's the minimum 6 and 7. So now I'm going to go back down. Because we've done one of the other sub-trees. This is a 6. All right, great. Now we're going to go down to D. Hopefully it won't be too bad. These things usually aren't terrible. Because they're made to be pruned a lot in alpha-beta. So let's see, in D, we go down to I. And that's just a 1. We go down to J. And let's see, what's the minimax value of J? AUDIENCE: It's 20. PROFESSOR: That's right. 20 is the maximum of 20 and 2. Great, so what's the minimax value of D? Everyone said it-- 1. All right, so what's the minimax value at A? AUDIENCE: 6. PROFESSOR: 6 is right. 6 is higher than 2. It's higher than 1. Our value is 6. And our path is-- everyone-- A, C, H. That's it. Great, is everyone good with minimax? I know that usually a lot of people are. There's usually a few people who aren't. So if you're one of the people who would like some clarifications on minimax, raise your hand. There's probably a few other people who would like some too. OK. AUDIENCE: When you're doing this minimax, whatever values are not showing, you keep going down the tree and then just look at whether you're trying to find the minimax. And just whatever values you get you go back up one? PROFESSOR: Yes. The question was, when you go to do the minimax-- and let's say you got E was 2, and you know that B is going to be less than or equal to 2, but you don't know F yet. The question is, do you go down the tree, find the value at F, and then go back up? The answer is yes. By default, we use a depth first search. However, in non alpha-beta version, just regular minimax, it turns out it probably doesn't matter what you do. I suggested doing a depth first search to get yourself in the mindset of alpha-beta. Because order is very, very important in alpha-beta. 
But here, I don't know, you could do some weird bottom up search. Whatever you want, it's going to give you the right answer unless it asks what order they're evaluated in. But here's a hint. The order they're evaluated in is depth first search order. So without even doing anything, E, K, L, Q, R, N, H, I, O, P are the order of static evaluation in this tree. AUDIENCE: [INAUDIBLE] PROFESSOR: So the question is, nodes like M and G we don't have to put values next to. Technically, if we were doing this very formally, and we couldn't remember, and I wasn't up there among the people, we would put 1 there. So at M, we would put a 1. But people remembered that. So we didn't do it. But then at G, we would put a 7. So if we were writing it out very formally, we would have a 1 and a 7. And at this D, we would have a 1. And then at the A, we would put a 6. And then that's the answer. Also, we've even put things like less than or greater than part way along the way. However, I believe that our alpha-beta search is going to definitely fulfill everyone's quota of pedantically putting lots of numbers next to nodes on a game tree. And so once you've done alpha-beta, if you can do it correctly, you'll think, oh, minimax, oh, those were the days. It's going to be easy. Because alpha-beta is a little bit more complicated. There's a lot of things that trip people up here. For alpha-beta, however, I will erase some of these numbers for the moment. They're still right. But we do it a little differently. So what do alpha and beta add to this formula? Well, this is all sort of a winning formula, except that it's not. Because it takes too long. But it's a very nice formula. You're the maximizer, say. I would try to think, if I do this, what's he going to do? And then if he does that, what am I going to do? And then what is he going to do if I do that, et cetera, et cetera, all the way to the bottom. With alpha and beta, we add in what I like to call nuclear options. 
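Before adding the nuclear options, the plain depth-first minimax just walked through can be sketched in Python. This is a minimal sketch, not the 6034 lab code; the nested lists mirror the tree on the board, and R's leaf value is an assumed placeholder, since the walkthrough only says that min(Q, R) comes out to 1.

```python
# Minimal minimax sketch (not the 6034 lab code). Leaves are numbers;
# internal nodes are lists of children; levels alternate max and min.
def minimax(node, maximizing):
    if isinstance(node, (int, float)):       # static (leaf) node
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# The board's tree: A = max(B, C, D). R's value (4) is an assumed
# placeholder -- any value of at least 1 gives the same answer.
tree = [
    [2, [3, 0]],             # B = min(E, F),  F = max(K, L)
    [[[1, 4], 7], 6],        # C = min(G, H),  G = max(M, N),  M = min(Q, R)
    [1, [20, 2]],            # D = min(I, J),  J = max(O, P)
]
print(minimax(tree, True))   # minimax value at A: 6
```

Evaluating it bottom-up gives exactly the board's answers: B is 2, C is 6, D is 1, so A takes the 6.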
In this game of maximizer versus minimizer-- you can think of it as like the Cold War, or the Peloponnesian War, except the Peloponnesian War didn't have nukes, so probably the Cold War. And in the Cold War, or any situation where you're up against an adversary who-- actually, this doesn't really work as well for the Cold War. But in any situation where you're up against an adversary whose only goal in life is to destroy you, you always want to find out what the best thing you can possibly do is if they hit that button and send nukes in from Cuba, or if they send fighter pilots, or whatever is going on. So the idea of alpha and beta is that they are numbers that represent the fail-safe, the worst case. Because obviously in the Cold War, sending nukes was not a good plan. But presumably, us sending nukes would be better than just being attacked and killed. So the alpha and beta represent the worst possible outcome you'd be willing to accept for your side. Because right now, you know you're guaranteed to be able to force the conflict to that point or better. So the alpha is the nuclear option, the fail-safe, of the maximizer. Nuclear options-- alpha is maximizer's nuclear option. And beta is the minimizer's nuclear option. So we ask ourselves-- and people who were paying attention at lecture or wrote stuff down know the answer already-- what could we possibly set to start off? Before we explore the tree and find anything, what will we set as our nuclear option, as our sort of fail-safe? We could always fall back on this number. So you could set 0. You could try to set some low number for the maximizer. Because if you set a high number for the maximizer as its fail-safe, it's going to be really snooty and just say, oh, I won't take this path. I already have a fail-safe that's better than all these paths. If you set, like, 100, you have no tree. 
Our default usually, in 6034, is to set negative infinity for alpha, or some very large negative number if you're doing it in your lab. So if we set negative infinity as a default for alpha, that negative infinity is basically maximizer loses. So the maximizer goes in thinking, oh my god, if I don't look at this game tree, I automatically lose. He's willing to take the first path possibly presented. And that's why that negative infinity is a good default for alpha. Anyone have a good idea what a good default for beta is, or just remember? Positive infinity, that's right. Because the minimizer comes in, and she's like, oh crap, the maximizer automatically wins if I don't look at this node here. That makes sure the maximizer and the minimizer both are willing to look at the first path they see every time. Because look at this tree. If 10 was alpha, the maximizer would just reject out of hand everything except for P. And then we wouldn't have a tree. The maximizer would lose, because he would be like, hmm, this test game is very interesting. However, I have another option-- pft, and then you throw over the table. That's 10 for me, because you have to pick up the pieces. I don't own this set. I don't know. So that is why we set negative infinity and positive infinity as the default for alpha and beta. So how do alpha and beta propagate? And what do they do? The main purpose of alpha and beta is that, as we said, alpha-- let's say we have some chart of values. Alpha, which starts at negative infinity, is the worst that the maximizer is willing to accept. Because they know they can get that much or better. It starts out, that's the worst thing you can have. So it's not a problem. Infinity is the highest that the minimizer is willing to accept. That's beta. As you go along, though, the minimizer sees, oh, look at that. I can guarantee that at best the maximizer gets 100. Haha, beta is now 100. The maximizer says, oh yeah? 
Well I can guarantee that the lowest you can get me to go to is 0. So it's going to be 0. And this keeps going on until maybe at 6-- note, not drawn to scale. Maybe at 6, the maximizer said, haha, you can't make me go lower than 6. And the minimizer says, aha, you can't make me go higher than 6. And then 6 is the answer. If you ever get to a point where beta gets lower than alpha-- or equivalently, alpha gets higher than beta-- then you just say, screw this. I'm not even going to look at the remaining stuff. I'm going to just prune now and go somewhere else that's less pointless than this. Because if the alpha gets higher than the beta, what that's saying is the maximizer says, oh man, look at this, minimizer. The lowest you can make me go is, say, 50. And the minimizer says, that's strange. Because the highest that you can make me go is 40. So something's generally amiss there. It usually means that one of the two of them doesn't even want to be exploring that branch at all. So you prune at that point. All right, so given that that's what we're looking for, how do we move the alphas and betas throughout the tree? There's a few different ways to draw them. And some of them I consider to be very busy. Probably in recitation and tutorial you will see a way that's busier and has more numbers. Technically, every node has both an alpha and a beta. However, the one that that node is paying attention to is the alpha, if it's a maximizer, and the beta if it's a minimizer. So I generally, for my purposes, only draw the alpha out for the maximizer and only draw the beta out for the minimizer. Very rarely, but it happens, they'll sometimes ask you, well, what's the beta of this node, which is a maximizer node? So it's good to know how it's derived. But I think that it wastes your time to write it out. That's my opinion. We'll see how it goes. So the way that it works, the way that alpha and beta work, is the Snow White principle. So does everyone know the story of Snow White? 
So there's a beautiful princess. There's an evil queen stepmother. Mirror mirror on the wall, who's the fairest of them all, finds out that it's the stepdaughter. So much like in the real world, in Snow White, the stepdaughter, Snow White, had the beauty of her parents. She inherited those. However, much like in the real world, maybe or perhaps not, the stepmother had an even better plan. She hired a hunter to sort of hunt Snow White, pull out Snow White's heart, and feed it to her so that she could gain Snow White's beauty for herself. How many people knew that version of the story? A few people. That's the original version of the story. Disney didn't put that in. The hunter then brought the heart of a deer, which I think in Disney the hunter did kill a deer arbitrarily, but it was not explained that that's why he was doing it. So in alpha-beta, it's just like that. By which I mean you start by inheriting the alpha and beta of your parents. But if you see something that you like amongst your children, you take it for yourself-- the Snow White principle. So let's see how that goes. Well, I told you guys that the default alpha was-- AUDIENCE: Negative infinity. PROFESSOR: Negative infinity. So here alpha is negative infinity. And I told you that the default beta was positive infinity. We're doing a depth first search here. All right, beta is infinity. All right, so we come here to E. Now, we could put an alpha. But I never put an alpha or a beta for one of the terminal nodes. Because it can't really do anything. It's just 2. So as we go down, we take the alpha and beta from our parents. But as we go up to a parent, if the parent likes what it sees in the child, it takes it instead. So I ask you all the question, would the minimizer prefer this 2 that it sees from its child or its own infinity for a beta? AUDIENCE: 2 PROFESSOR: It likes the 2. That's absolutely right. So 2. All right, great, so now we go down to F. What is F's alpha? Who says negative infinity? 
Who says 2? No one-- oh, you guys are good. It's negative infinity. Technically, it also will have a beta of 2. But we're ignoring the beta. And the alphas that have been progressing downward from the parents-- negative infinity. That's why I called it the grandfather clause before. Because you would often look up to your grandparent to see what your default number is. So we get an alpha of negative infinity. We then go down to the K. It's a static evaluation. And now I'm going to start calling on people individually. So hopefully people paid attention to the mob, who were always correct. All right, so we go down to K. And we see a 3. F is a maximizer node. So what does F do now? AUDIENCE: Switches its alpha to 3. PROFESSOR: Yes, switches its alpha to 3, great. All right, so that's already quite good. It switches alpha to 3. It's very happy. It's got a 3 here. That's a nice value. So what does it do at L, the next node? It's gone to K, went back up to F. Depth first search, the next one would be L, right? AUDIENCE: [INAUDIBLE] PROFESSOR: Well, technically F could take L's value of 0 if it liked it better than 3. But it's a maximizer. So does it want to take that? AUDIENCE: [INAUDIBLE] PROFESSOR: OK, that technically would be correct. But I'm sorry. I burdened you with a trick question. In fact, we don't look at L at all. Does everyone see that? I'll explain. The alpha at F has reached 3. But the beta at B is 2. So B looks down and says, wait a minute. If I go down to F, my enemy's nuclear option, my enemy is the worst it can be for-- the best it can be for me is 3. F is trumpeting it around. I was thinking of eating his heart, or whatever, but I didn't want to. But it's going to be 3. It's going to be 3 or higher down there at F. There's no way I want that. I already have my own default escape plan. And that's 2. That's going to be better than whatever comes out of that horrible F. So screw it. And we never look at L. Does everyone get that? 
That is the main principle of alpha-beta pruning. If you see an alpha that's higher than the beta above it-- as I said, if alpha goes up above the beta-- or if you see a beta, like if there's a beta down here, and it's lower than the alpha above it, prune it. Stop doing that. And the question is, who prunes? Who decides that you don't look at L? The person who is thinking not to look at L is always up higher by at least two levels. So up here, B is saying, hmm, I don't want to look at L. Because F is already so terrible for me that it's just beyond belief. If this is 100, it might be 100. Even if it's lower, I'm still going to get a 3. There's a sanity check that I've written that I sort of came up with just in case you're not sure that you can skip it. Because on a lot of these tests, we ask you, which ones do you evaluate, which ones do you skip, right? Or we just say, which ones do you evaluate, and you don't write the ones that you skip. Here's my sanity test to see if you can skip it. Ask yourself, if that node that I'm about to skip contained a negative infinity or some arbitrarily small number, negative infinity being the minimizer wins, would it change anything? Then ask yourself, if it contained a positive infinity, would it change anything? If the answer is no both times, then you're definitely correct in pruning it. So look at that 0. If it was a negative infinity, minimizer wins, what would happen? The maximizer would say, I'm not touching that with a 10 foot pole, choosing 3. The minimizer would say, screw that, I'll take E. Let's say it was a positive infinity. The maximizer would say, eureka, holy grail, I win. The minimizer would say, yeah, if I'm a moron, and go down to F, and then would go to E and take 2. So no matter what was there, the minimizer would go to E. And you could say, well, what if it was exactly 2? But still the maximizer would choose K. The minimizer would go to E. So there's no reason to go down there. 
We can just prune it off right now. Does everyone agree, everyone see what I'm talking about here? Great, so we're now done with this branch. Because beta is 2. So now we're up at old grandpappy A. And he has an alpha of negative infinity. Everyone, what will he do? He'll take the 2. It's better than negative infinity for him. It's not wonderful. But certainly anything is better than an automatic loss. All right, now our highest node is a 2. So let's keep that in mind for our alpha. OK, so let's go over here. Let's see, so what will be the value at C? What will be the beta value? AUDIENCE: [INAUDIBLE] PROFESSOR: You go back to which one? To G. I'm not at G yet. I'm actually just starting the middle branch. So I'm going to C. And what's going to be its starting beta before I go down? AUDIENCE: Infinity. PROFESSOR: Infinity, that's right-- default value. It's easier than it seemed. All right, so yes, beta is equal to infinity. This should be better erased. I think it's confusing people. Great, OK, so beta is equal to infinity at C. Now we go down depth first search to G. What's going to be our alpha at G? AUDIENCE: Minus infinity. PROFESSOR: Ahh, it would seem so. However, take a look up at the great-grandpappy A. It seems to have changed to 2. So this time it's 2. Why is it 2 instead of negative infinity? Why can we let A be so noxious and not start with saying, oh, I automatically lose? Well, A knows that no matter how awful things get in that middle branch, he can just say, screw the whole middle branch. I'm going to B. That's something that the minimizer can't do. And we have to start at infinity for the minimizer. But the maximizer can. Because he has the choice at the top. Does everyone see that? He can just say, oh, I'm not even going to C. Yeah, shows you. I'm going to A and taking the 2. So therefore alpha is actually 2 at G. All right, great, so we've got an alpha that's 2 at G. We're going to go down to M. It's a minimizer. 
All right, what's going to be our beta value at M? AUDIENCE: [INAUDIBLE] PROFESSOR: Or which is the beta default, minus or positive infinity? What would be the minimizer? AUDIENCE: Positive. PROFESSOR: Positive infinity, that's right. M is going to be a positive infinity for beta. Again, it picks it up from C. Great, now we get to some actual values. So we're at some actual values. We are at Q. So what's going to happen at M when M sees that Q is 1? AUDIENCE: [INAUDIBLE] PROFESSOR: What is beta? It says infinity. I'm sorry, it's hard to read. Beta is infinity at M. AUDIENCE: OK, so it's going to minimize, right? So it's going to be like, OK, [INAUDIBLE]. PROFESSOR: That's right. So they're going to put beta to 1. Because it sees Q. Great, so my next question is, what's going to happen at R? AUDIENCE: [INAUDIBLE] PROFESSOR: Very smart. You've detected my trap. The question is, does it look at R? The answer is, no. It doesn't look at R. Why doesn't it look at R? Does everyone see? Yeah, alpha is now greater than the beta below it. Beta has gotten lower than alpha. This is the same thing I was talking about before, when we figured out that the alpha here is 2. The maximizer says, wait a minute. The maximizer G says, if I go to M, the best I'm getting out of this is 1. Because if this is negative infinity, the minimizer will choose it. If this is positive infinity, he'll choose 1. The best I'm going to get out of here is 1. If that's the case, I might as well have just gone to B and not even gone to C. So I'm not going to go to M. I'll go to N, maybe. Maybe N is better. Does everyone see that? Great, so let's say that the maximizer does go to N. So what's going to happen with this alpha? AUDIENCE: [INAUDIBLE] PROFESSOR: That's right, it's going to be 7. 7 is better than 2. And the maximizer has control to get to that seven, at least if it gets to G. All right, now the minimizer at C-- we'll do everyone this time. 
The minimizer at C, seeing that 7, what does the minimizer do? Anyone? So it sees the 7. What does it do to its beta? It takes the 7-- better than infinity, anyway. And yeah, then it checks H. And everybody, again, what happens at H? It takes the 6. It's lower than 7. All right, now we'll go back to having people do it on their own. Well, all the way back to the top, what does A do when it sees the 6 coming out of C? AUDIENCE: Changes to 6. PROFESSOR: Changes to 6, that's right. Alpha equals 6. Great-- homestretch, people, homestretch. So the minimizer, everyone, has a beta of infinity. And node I, if it weren't a static node, would have an alpha of 6. But it is a static node. So it just has a value of 1. So since it has a value of 1, everyone, the beta becomes 1. And what next, everyone? AUDIENCE: Prune. PROFESSOR: Prune, that's right. Why prune? Well, this time it's A himself who can prune. A says, well darn, if I go to D, I'm going to get 1 or something even worse than 1. I might as well take my 6 while I have it, prune all the rest all the way down. Everyone see that? Everyone cool with that? It's not too bad if you take it one step at a time. We did it. Our question is, which nodes are evaluated, in order? Our answer is, everyone-- E, K, Q, N, H, I. OK, not so obvious, I guess. A few people followed me. But it is E, K, Q, N, H, I. It's just depth first order. And we pruned some of them away. Great, so that is alpha-beta. Any questions about that before I give some questions about progressive deepening? All right, we've got a bunch. So first question. AUDIENCE: [INAUDIBLE] nodes like F, B, C, and D? PROFESSOR: The question is, when asked for the order of evaluation, are we excluding F, B, C, and D? The answer is we're talking about here static evaluation. The static evaluator is a very important and interesting function. And I'll get back to something a few students have asked me about the static evaluator later and try to explain what it is. 
It's basically the thing that pops out those numbers at the bottom of the leaves. So when we ask, what is the order of nodes that were statically evaluated, we mean leaves only. That's a good question. Any other questions? Let's see, there was one up here before. But it's gone. It might have been the same one. Question? AUDIENCE: So a similar question. When you say, static nodes, that just means the leaf nodes? PROFESSOR: Means the leaf nodes, that's right. The question is, does static nodes mean the leaf nodes. The answer is yes. AUDIENCE: And so static evaluation is when you compare the value of a static node to something? PROFESSOR: Static evaluation is when you get that number from the static node. Let me explain. Unless someone else has another question about alpha-beta, let me explain static values. Because I was about to do that. There is a question about alpha-beta. I'll come back to both of yours after I answer this. AUDIENCE: You were mentioning [INAUDIBLE]. And I'm a little bit confused. If you're looking at one node, and you're seeing either grab the value from the grandparent or grab it from the-- PROFESSOR: So it always starts-- the question is, what is the Snow White principle? How does it work? Every node always starts off with taking the value of the same type, alpha or beta, from its grandparent. It always starts that way. Now, you say, why the grandparent? Wouldn't it take it from the parent? It actually does. But I'm not drawing out the alphas at all the minimizer levels. Because they don't do anything. They're only even there to pass them down. So all of the values pass down, down, down, down, down to begin. Every node, in fact, starts off with its grandparent's, and its parent's, values, OK? But then, when the node sees a child that's completely done evaluating-- it's finished, it can't be in the process-- it can take that child's value. Let's say C. 
When C sees that G is completely done with all of its sub-branches and is ready to return a value, or if it's just a static evaluation, then it's automatically completely done. Because it has no children. A static value like K of 3 is automatically completely done. It's got a 3. Similarly, when we came back to G after going to N, and we knew that the value was 7, that was completely done. The value was definitely 7. There was no other possibilities. AUDIENCE: That's after looking at the children, right? PROFESSOR: Yes. So once you're done with all the children of G, then G comes up and says, guess what? Guess what, guys? So technically before that, you would have said that G's alpha is greater than or equal to 1 when we looked at Q. And then we looked at M. We'd say, it's equal exactly to 7. We're done here. And then at that point, when it's fresh and ripe and has all of its highest value or its best value, that's when the parent can eat its heart and gain that value itself. So that's when C says, for instance, oh man, I have an infinity. I really like that 7 better. And it takes the 7. But then it saw H. And it said, oh man, that's a 6. That's even better than 7. So it took the 6. AUDIENCE: So shouldn't the alpha take 7 then? PROFESSOR: So alpha takes 6. Because C is a minimizer. C took the 7 from G, but then right after that C saw H and took the 6. Because 6 is even lower than 7. And then alpha took the 6. Because 6 was higher than 2. AUDIENCE: So it's not going to look below the branch? PROFESSOR: Yeah, the problem is that the maximizer doesn't have control there. The minimizer has got control at C. And the minimizer is going to make sure it's as low as possible. The maximizer at A, his only control, or her only control, is the ability to send either way to B or C or D. And then at that point, at C, the minimizer gets to choose if we go to G or H. And it's never going to choose G. Because G is higher than H. All right, awesome, was there another question? 
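Putting all of that together-- the inherited defaults, the Snow White takeover of a finished child's value, and pruning as soon as alpha meets or passes beta-- here is a minimal Python sketch, not the 6034 lab code. Leaves carry their names so the order of static evaluations can be checked against the board's answer, E, K, Q, N, H, I. The values of R, O, and P are assumptions: R is pruned before it's ever read, and O and P's individual values (20 and 2, in some order) never matter because D's whole right branch is pruned.

```python
import math

# Leaves are (name, value) pairs; internal nodes are lists of children.
# Levels alternate maximizer / minimizer, starting with the maximizer.
def alpha_beta(node, maximizing, alpha, beta, visited):
    if isinstance(node, tuple):              # static node: evaluate it
        visited.append(node[0])
        return node[1]
    best = -math.inf if maximizing else math.inf
    for child in node:
        value = alpha_beta(child, not maximizing, alpha, beta, visited)
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)         # Snow White: take the better value
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:                    # the player above would never allow this
            break                            # prune the remaining children
    return best

# The board's tree: A = max(B, C, D). R=4 and the O/P ordering are assumptions.
tree = [
    [('E', 2), [('K', 3), ('L', 0)]],                    # B
    [[[('Q', 1), ('R', 4)], ('N', 7)], ('H', 6)],        # C
    [('I', 1), [('O', 20), ('P', 2)]],                   # D
]
order = []
print(alpha_beta(tree, True, -math.inf, math.inf, order))  # 6
print(order)                                               # ['E', 'K', 'Q', 'N', 'H', 'I']
```

Note who prunes in this sketch: the `break` fires at B after K (so L is skipped), at M after Q (so R is skipped), and at D after I (so J, O, and P are skipped)-- exactly the three prunes from the walkthrough.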
All right, let's go back to static evaluations. When I first took this class, I had some weird thoughts about static evaluations. I heard some students ask me this. I almost got a question about it onto one of the tests, but it was edited to some other weird question that was m to the b to the d minus 1 or something like that at the last minute. So I'm going to pose you guys the actual question that would have been on one of the older tests, which is the following. I had a student who came to me and said, you know, [INAUDIBLE], when we do this alpha-beta pruning, and all this other stuff, we're trying to assume that we're really saving that much time by getting rid of a few static evaluations. In fact, when we do progressive deepening, we're always just counting, how many static evaluations do we have to do? And he said, I look at these static evaluations. And there's just a 3 there. It takes no time to do the static evaluation. It's on the board. It takes much longer to do the alpha-beta. It's faster by far to not do alpha-beta. So I then tried to explain to that student. I said, OK, we need to be clear about what static evaluations are. You guys get it easy. We put these numbers on the board. A static evaluation-- let's say you're playing a game like chess. Static evaluation takes a long time. When I was in 6170, Java [INAUDIBLE], the class that used to exist, we had a program called Anti-Chess where I used my 6034 skills to write the AI. And the static evaluator took a long time. And we were timed. So getting the static evaluator faster, that was the most important thing. Why does it take a long time? Well, the static evaluator is an evaluation of the board position, the state of the game, at a snapshot of time. And that's not as easy as just saying, oh, here's the answer. Because in chess, first of all, not only did I have to look at how many pieces I had, what areas that I controlled. Also-- well, it was anti-chess. But that notwithstanding. 
Let's pretend it's regular chess. I also had to look, if it was in regular chess-- and I still had to do this in anti-chess-- if my king was in check. And what that meant is I had to look at all of my opponent's moves, possible moves, to see if anyone of them could take my king. Because in regular chess, it's illegal to put your king into check. So you better not even allow that move. And regardless, getting into checkmate is negative infinity for you. So it takes a really long time to do static evaluations, at least good ones, usually. You want to avoid them. Because they're not just some number on the page. They are some function you wrote that does a very careful analysis of the state of the game and says, I'm good to heuristically guess that my value is pi, or some other number, and then rates that compared to other states. Does that make sense to everyone? So the answer to the hypothetical question that might have been on the old test, when the person said, I've got this great idea where you do tons of static evaluation, and you don't have to do this long alpha-beta, is, don't do that. The static evaluations actually take a long time. Does that clear it up for people who asked me before about what is a static evaluation, why are the leaf nodes called static? And you might ask, why are some of these nodes static, just arbitrarily? The answer is, when you're running out of time to expand deeper, and you just need to stop that stage of the game-- maybe it's just getting too hairy, maybe it's spreading out too much, you have some heuristic that says, this is where I stop for now-- it's a heuristic guess of the value. It's kind of like those heuristic values in the search tree. It's a guess of how much work you have left to get to the goal. Here, you say, well, I wish I could go deeper. But I just don't have the time. So here's how I think I'm doing at this level. It's not always right. 
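To make the "static evaluation is expensive" point concrete, here is a hypothetical, chess-flavored evaluator sketch. Everything in it is made up for illustration-- it is not from any real chess engine, nor from the Anti-Chess project mentioned above. It scans the whole board for material, and the check test stands in for the genuinely costly part: in a real evaluator, finding the attacked squares means generating every one of the opponent's possible moves first.

```python
# Hypothetical static evaluator for a chess-like game (illustration only).
# board maps squares to (piece, owner) pairs; owner is 'us' or 'them'.
PIECE_VALUE = {'pawn': 1, 'knight': 3, 'bishop': 3, 'rook': 5, 'queen': 9}

def static_value(board, attacked_squares):
    score = 0
    king_square = None
    for square, (piece, owner) in board.items():   # scan the whole board
        if owner == 'us':
            score += PIECE_VALUE.get(piece, 0)
            if piece == 'king':
                king_square = square
        else:
            score -= PIECE_VALUE.get(piece, 0)
    # Computing attacked_squares is the expensive part the professor is
    # describing: it requires generating all of the opponent's moves.
    if king_square in attacked_squares:
        return float('-inf')                       # our king falls: worst case
    return score

board = {'e1': ('king', 'us'), 'd4': ('queen', 'us'),
         'e8': ('king', 'them'), 'a7': ('pawn', 'them')}
print(static_value(board, set()))      # 8  (queen 9, minus their pawn 1)
print(static_value(board, {'e1'}))     # -inf
```

Even this toy version does work proportional to the board size per leaf, which is why counting static evaluations, and pruning to avoid them, is the right cost model.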
And that's going to lead us into the answer to one of the questions about progressive deepening. So I'll put up the progressive deepening question really quickly. So the question is this. Let me see, this is a maximizer-- yes. Suppose that we do progressive deepening on the tree that is only two levels deep. What is progressive deepening in a nutshell if you don't remember from the lecture? The idea is this. In this tree, it doesn't work. But in trees that actually branch like 2 to the n, it doesn't take that much time to do some of the top levels first and then move on to the bottom levels. Just do them one at a time. So let's say we only did it up through J. We only did the top two levels of the tree. We'd like to reorder the tree so that alpha-beta can prune as much as it possibly can, at least we hope. So let's pretend that we had a psychic awesome genius friend who told us that the static values when we went up to two levels-- remember, when we go to two levels, F, G, and J have to get a static value, right? Because we're not going down. We do a static evaluation. They get the exact correct numbers-- 3, 7, and 20. Genius, brilliant. All right, so if that happens, what is the best way that we could reorder that tree? Oh yeah, so it's A, B, C, D with values of 2, 3, 7, 6, 1, 20. I'll draw that. This is the non-reordered tree. Let's see, so it's 2, 3, 7, 6, 1, 20. So what's the best way to reorder? Well, first of all, does anyone remember what Patrick said when he talked about progressive deepening? Usually no one does, so don't worry about it. Because at that time you guys didn't think, oh, I have to do this for the quiz. You were just thinking, oh man, we've already heard alpha-beta and all this other stuff. And this is just a small fact. But it's a very important fact. And now you know you have to do it for the quiz. So you're probably going to remember it. The way you do it is you try to guess, and you say, which one of these is going to be a winner? 
Whichever one I think is going to be a winner at that level, I put first. Why is that the case? Well, something interesting you may have noticed here-- whenever you have a winner, like the middle node, or whenever you have whatever is the current best for your alpha, you sort of have to explore a lot of that area. Like for instance, the left node was our current best at 2. The middle branch was our current best; at that time it was 6. It was the total best. We had to explore a good number of nodes. But on the right, we just saw, oh, there's 1. We're done. We cut everything off. In other words, for the branch that turns out to be the one that you take, you have to do a pretty good amount of exploration to prove that it's the right one. Whereas if it's the wrong one, you can sometimes, with just one node, say, this is wrong, done. So therefore, if the one that turns out to be the eventual winner is first, then it's really easy to reject all the other branches. Do people see that sort of conceptually a little bit, that if you get the best node right away, you can just reject all the wrong ones pretty quickly? That's our goal. So how can we, quote, "get the right one," the best one, right away? Well, here's how we do it. Let's say we're at B. Which one is the minimizer likely to pick, assuming that our heuristic is good and that these guesses are pretty much close to the truth? It turns out they're perfect, so this is going to work. So which one will the minimizer pick if it has to choose between E and F, do we think? AUDIENCE: E. PROFESSOR: E, perfect. Which one will it pick between G and H? AUDIENCE: H. PROFESSOR: H. Which one will it pick between I and J? AUDIENCE: I. PROFESSOR: OK, so what we're saying is we think it's going to pick E. We think it's going to pick H. We think it's going to pick I. So first of all, we should put E before F, H before G, and I before J. Because we think it's going to pick those first.
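That one-node cutoff is exactly what alpha-beta pruning formalizes. Here is a minimal sketch, assuming a tree is either a leaf number or an (is_max, children) pair-- a hypothetical representation, not the course's code-- with a counter for how many leaves get statically evaluated:

```python
def alphabeta(node, alpha=float('-inf'), beta=float('inf'), count=None):
    """Minimax with alpha-beta pruning; count[0] tallies leaf evaluations."""
    if isinstance(node, (int, float)):      # a leaf: one "static evaluation"
        if count is not None:
            count[0] += 1
        return node
    is_max, children = node
    best = float('-inf') if is_max else float('inf')
    for child in children:
        value = alphabeta(child, alpha, beta, count)
        if is_max:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:                   # the cutoff: prune remaining siblings
            break
    return best

# The two-level tree from the board: a maximizer over three minimizers.
tree = (True, [(False, [2, 3]), (False, [7, 6]), (False, [1, 20])])
count = [0]
print(alphabeta(tree, count=count), count[0])   # -> 6 5 (the 20 is pruned)
```

With the left-to-right order as given, only the last leaf gets cut off; the reordering idea is all about making that cutoff fire as early as possible.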
Those are probably our best ones to invalidate a poor branch. So now between 2, 6, and 1, which is what we think we're going to get, which one do we think the maximizer is going to take? AUDIENCE: 6. PROFESSOR: 6. Then if it couldn't take 6, what would be its next best choice? 2, then 1. That's just our order-- simple as that. It couldn't be any easier for something that involves really complex trees, a huge number of numbers, and reordering those trees. So C-- you guys told me C, B, D. You told me C, B, D, I think? Yeah, those are the ones the maximizer likes. And then the ones the minimizer likes: you told me H before G, because H is smaller than G. You guys told me E before F. And you guys told me I before J. And you would be correct in all regards. We have 6, 7, 2, 3, 1, 20. All the minimizers choose from smallest to highest. The maximizer chooses from highest to lowest among the ones that the minimizers will take. And if we did that, you can see we would probably save some time. Let's see how much time. Let's say we looked at H first. Well, if we looked at H first, we would still have actually had to look at Q and N. However, we would not have had to look at K. Do people see why? If we already knew this branch was 6, as soon as we saw 2 for the beta here-- 2 is less than 6-- we could have pruned. We still would have had to look at I over here. Because you have to look at at least one thing in the new sub-branch. And it actually only would have saved us one node-- oops. So it winds up that in total, how many nodes would we have evaluated if we did that little scheme of reordering? Well, we normally had to do six-- E, K, Q, N, H, I. How many do we evaluate if we do this progressive deepening scheme? How many times do we run the static evaluator, which, as you know, takes a long time? Anyone have a guess? I told you the only one we don't evaluate is K. Raise your hand. I won't make anyone give this one.
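The ordering rule just worked out-- minimizers smallest-first, maximizer largest-first, judged by the shallow-pass values-- can be sketched like this, using the same hypothetical leaf-or-(is_max, children) tree representation, with the shallow values standing in for the psychic friend's perfect guesses:

```python
def shallow_value(node):
    """Minimax value of a node, as the earlier shallow pass computed it."""
    if isinstance(node, (int, float)):
        return node
    is_max, children = node
    values = [shallow_value(c) for c in children]
    return max(values) if is_max else min(values)

def reorder(node):
    """Put the expected winner first at every level of the tree."""
    if isinstance(node, (int, float)):
        return node
    is_max, children = node
    # Maximizer: highest expected value first; minimizer: lowest first.
    return (is_max, sorted((reorder(c) for c in children),
                           key=shallow_value, reverse=is_max))

tree = (True, [(False, [2, 3]), (False, [7, 6]), (False, [1, 20])])
print(reorder(tree))
# -> (True, [(False, [6, 7]), (False, [2, 3]), (False, [1, 20])])
```

The printed order reads off as 6, 7, 2, 3, 1, 20-- the same ordering derived on the board.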
So I said the only one we save on is K. So we still do E, Q, N, H, and I over here. There's two possible answers that I will accept. So you have a higher chance of guessing it. Anyone? Does everyone agree that we did six before? If we didn't do any progressive deepening, we just did E, K, Q, N, H, I. And now we're not doing K. OK, people are saying five. All right, good. That's not the right answer. But it at least shows that you can take away the one. We did at least five over here. There's two possible answers, though. Because look over there. In order to do the progressive deepening, we had to do those static evaluations, right? So we either did all those static evaluations plus these five-- E, Q, N, H, I-- static evaluations, because we didn't do the K. Or we might have saved ourselves some. Because maybe we were smart and decided to cache the static values when we were going down the tree. It's an implementation detail that we didn't specify when we asked that question on this test. What I mean by cache is, when we did it here and saw that E was a 2, and then here-- oh, we have to do the static value at E. If we were smart, we might have made a little hash table or something and put down 2, so we didn't have to do a static evaluation at E. And if that happened, well, we save E, H, and I, and we do three fewer. Does everyone see that? However, that's still more than six. So it didn't save us time. So you might say, oh, progressive deepening is a waste of time. But it's not. Because this is a very, very small, not very branchy tree that was made so that you guys could easily do alpha-beta and take the quiz, and it wouldn't be bad. If this tree were actually branching even double at each level, it would have, what, 16 nodes down here at the bottom. Then you would want to be doing that progressive deepening. So now I ask you a conceptual riddle question. It's not really that much of a riddle. But we'll see if anyone wants to answer. Again, I won't call on you for this.
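The caching idea can be sketched as a little memo table keyed on the game state, on the assumption that a state hashes the same no matter how you reached it. `slow_static_eval` below is a dummy stand-in for a real, expensive evaluator:

```python
calls = 0

def slow_static_eval(state):
    """Stand-in for an expensive static evaluator; counts how often it runs."""
    global calls
    calls += 1
    return sum(state)          # dummy heuristic value

cache = {}                     # the "little hash table" of static values

def cached_eval(state):
    if state not in cache:     # first visit: pay for the evaluation
        cache[state] = slow_static_eval(state)
    return cache[state]        # later visits: free lookup

cached_eval((2, 1))            # computed during the shallow pass
cached_eval((2, 1))            # hit again in the deeper pass: cached
print(calls)                   # -> 1
```

This only pays off because the heuristic is consistent: the same state always gets the same value, so a lookup is as good as a recomputation.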
According to this test, a student named Steve says, OK, I know I have to pay to do the progressive deepening here. But let's ignore that. Because it's small in a large tree, right? It's not going to take that much. Let's ignore the costs of the progressive deepening and only look at how much we do here. He says, when it comes to performing the alpha-beta on the final level, I'm guaranteed to always prune at least as well or better if I rearrange the nodes based on the best result from progressive deepening. Do you agree? AUDIENCE: [INAUDIBLE] PROFESSOR: Can I repeat it? OK, the question is, ignoring the cost that we pay progressively deepening here-- just forget about it-- at the final step, at the final iteration, the question is, am I guaranteed to do at least as well or better in my alpha-beta pruning when I reorder based on the best order from progressive deepening? Here certainly we did. But the question is, is Steve guaranteed? Answer? AUDIENCE: [INAUDIBLE] PROFESSOR: What did you say? AUDIENCE: [INAUDIBLE] PROFESSOR: That's the answer and the why, which we asked to explain. The answer we got is, doesn't that depend on the heuristic? Perfectly correct. The answer is, no, we're not guaranteed, and it depends on the heuristic. If we were guaranteed, that would mean our heuristic was godlike, like this heuristic. If your heuristic already tells you the correct answer no matter what, don't do game search. Just go to the empty chess board, put all the pieces in the front rows, and run the static evaluator on that. And it'll say, oh, it looks like with this game not started that white is stupid, so black will win in 15 turns. And then you're done. And you don't do a search. We know that our heuristic is flawed in some way. It could be very flawed. If it's flawed so badly that it tells us a very bad prediction of what's actually going to happen, even though we think the minimizer is going to go to H, maybe it's wrong by a lot and it goes to G.
It could take us up an even worse path and make us take longer. Question? AUDIENCE: If it's a heuristic, how could you cache the values so you didn't have to recalculate them later? PROFESSOR: The question is, how can you cache the values if it's a heuristic, so you don't have to recalculate them later? The answer is, it wouldn't help if there weren't these weird multi-level things where we stop at E for some reason, even though the tree goes down five levels. The way you can cache it is, it is a heuristic, but it's consistent. And I don't mean consistent in the search sense. I mean it's a consistent heuristic in that-- the game state E is, let's say, the state where I moved out my knight as the maximizer, and the minimizer said, you're doing the knight opening, really, and then did a counterattack. No matter how we get to E, or where we go after E, that's always going to be state E. It's always going to have the same heuristic value. It's not like some guy who goes around and just randomly pulls a number out of a hat. We're going to have some value that gives us points based on state E. And it's going to be the same any time we get to state E. Does that make sense? It is a heuristic. But it's always going to give the same value at E no matter how you got to E. But it could be really bad. In fact, you might consider a heuristic that's the opposite of correct and always tells us the worst move and claims it's the best. That's the heuristic that the minimizer programmed into our computer, perhaps. In that case, when we do progressive deepening and we reorder, we'll probably get the worst pruning possible. We might not. But we may. So in that case, you're not guaranteed. I hope that's given a few clues. In tutorial, you guys are going to see some more interesting problems that go into a few other details. I at least plan on doing [INAUDIBLE] interesting game problem from last year, which asked a bunch of varied things that are a little bit different from these.
So it should be a lot of fun, hopefully, or at least useful, to do the next quiz. So have a great weekend. Don't stress out too much about the quiz.
Info
Channel: MIT OpenCourseWare
Views: 83,774
Keywords: alpha beta, minimax, game tree, static evaluation, heuristic, pruning, branch, progressive deepening, snow white principle
Id: hM2EAvMkhtk
Length: 50min 55sec (3055 seconds)
Published: Fri Jan 10 2014