How do Chess Engines work? Looking at Stockfish and AlphaZero | Oliver Zeigermann

Video Statistics and Information

Captions
[Music] …and Twitter and articles, so you may want to follow me on Twitter. The slides follow this talk, you can find them under this link, but I will share the link at the end of the talk as well. Good, so let's kick this off.

So, do you like chess? Who's good at chess? A few people. I totally suck, so don't worry, I'm really bad at it. A lot of people got really excited when they saw games that AlphaZero played against Stockfish, and you will learn more about these two chess engines. One position that was very exciting for a lot of people is this one: AlphaZero plays black, this is Stockfish, and it looks a little bit better for white if you just look at it; black is a bit crumpled there, pressed into the corner. This is how the game evolved, and a lot of people got really excited, because white didn't evolve at all, or almost didn't move very much, while black was very good at finding this very strong position and eventually defeating white. So AlphaZero actually won this game against Stockfish, and this is very strange, because the whole time Stockfish thought it was in the lead; even in this position here it thought it was very strong, even though, obviously, even for someone like me who doesn't know chess very well, black was leading.

Good. So how does this work? Why does Stockfish play so differently from AlphaZero? Stockfish is a classic chess engine. Who has played against Stockfish, who knows Stockfish by name? Okay, it doesn't matter much, because you could replace it with almost any traditional chess engine, including Deep Blue, which defeated Kasparov; they all had basically the same strategies. The question always is: which is the next move that I'm going to play? You have to find out which move is the best and then choose that one. So you could ask, and I did this as well, why not just look at the position and say, okay, is this a good position or not, and if it leads to the best position, just take that move? And this is what Stockfish does; it does more, but we will come to that. There's an evaluation function: looking at a certain position, it tells you whether this is a good position or not. Those are handcrafted features, which means a programmer wrote them, and it only works on resolved positions, meaning there are no captures or checks going on; if the position is not resolved, it searches a little bit ahead to get to a position where everything is resolved. How does it do that? Depending on which part of the game you're in, midgame or endgame, you get certain points for material, you get a certain evaluation if something is trapped, so can you move a piece or not, for pawn structure, like are the pawns actually in a good position, are they attacked, how do they look, is your king safe, do you have outposts, like pieces that are not protected by other pieces, and then there's a lot of other stuff that makes total sense, but it is all handcrafted: someone thought about it, said this sounds like a good idea, and it evolved over time. That's what Stockfish does, and the question is: is this good enough? Is it good enough to just look at a position and say, okay, this position looks good, let's take this one? Probably not; a small sketch of what such a handcrafted evaluation could look like follows below.
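To make the idea of a handcrafted evaluation concrete, here is a minimal, illustrative sketch. This is not Stockfish's actual code: the piece values, the mobility bonus, and the `board` interface (`pieces()`, `mobility()`) are all assumptions made up for illustration.

```python
# Toy handcrafted evaluation: material plus a simple mobility term.
# All values and the board interface are hypothetical, for illustration only.

PIECE_VALUES = {"P": 100, "N": 320, "B": 330, "R": 500, "Q": 900, "K": 0}

def evaluate(board):
    """Score a resolved position in centipawns from White's point of view."""
    score = 0
    for piece in board.pieces():                      # assumed board API
        value = PIECE_VALUES[piece.kind]
        score += value if piece.color == "white" else -value
    # Hand-written positional terms would be added here one by one:
    # pawn structure, king safety, outposts, trapped pieces, ...
    score += 2 * (board.mobility("white") - board.mobility("black"))
    return score
```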
So consider this position. I'm bad at chess, but it's white to move, these are the white pieces, this is the structure; obviously it's set up to be like this, well, not by me, by someone else. So what could one move be? One move could be just to move, sorry, to move this, in German we call it Turm, it's a rook in English; we can move it down here, just one square. If you looked at this new position and did the scoring of it, it's still good, because you didn't lose anything; you didn't get better, but you also didn't get worse. But then in the next move, if you did this, the queen moves from here to there and you're checkmated, even though you saved the rook. So even though that position looked good, the next position looked extremely bad. That means just looking at one position is not enough; you need to look ahead.

So what else could you do? Who's good at chess, what would you do? Anyone? Yes, yes, exactly, so you play this. How do you know, is that a classic position, or are you just good at chess? You're just good, okay, I get it, that's really good. The local evaluation of this move is really bad, because you know what happens next: the queen has to take the rook, which is very bad; but at the same time it's stalemate, because you can't move your king anymore, and a draw is at least pretty good. So the point here is you need to look ahead, and the further you can look ahead, the better.

And then people ask me: if that's the case, can't you just pre-compute all the perfect moves and store them in a table? Do you think that's doable? Who thinks we can do that? Okay, none of you, and you can't, obviously. Going through the complete search tree of a game, expanding all possible positions, is on this order: for tic-tac-toe it's feasible, I would say; Connect Four, forget it, that's a pretty large number; chess, really not; backgammon, yeah; Go, really big. And to give you a comparison: that's the number of atoms in my body, well, maybe a few more, but anyway, really bad joke, sorry for that; atoms in the earth, in the Milky Way, and even if we took all the atoms in the whole universe, not so good, right? So this probably won't work. Maybe we don't have to store all the moves, but we would still have to go through all of them to find the perfect moves, so not feasible, sorry.

What can we do instead? Ideas? We can't go through the whole search tree, so what would you do? Yes: just a few steps ahead, and that would mean limiting the depth of the search tree, right. What else? Yes, I think you wanted to say something, right? Yes, so you would prune the search tree; you could also say you limit its breadth, so you don't expand the full breadth of the search tree but just a few moves. What's typically done classically in a two-player game is that you view the game as one player trying to maximize, that means a very positive outcome, and the other player trying to minimize, and you restrict the search depth. This is how it could look, and this graphic is going to be a little bit overwhelming, so I will moderate it a little. It's called minimax, the classic algorithm we are looking at, and we're looking ahead four half-moves, so we are restricting the search tree in depth, not in breadth. One of the players tries to maximize, and this is the computer player in this case, the other tries to minimize, and the computer player will be the circles, but don't look at it too closely because it's a contrived example. First of all, in this position it's the computer player's move, that's the maximizer, and what we do is expand four steps ahead, because we just decided that's how far we are looking ahead.
And instead of scoring each of these positions directly, we start down here at the bottom. We use the evaluation function that I showed you before, and it says this is 10, this is positive for me, and this is even more positive: plus infinity means we will win in this case. But since in the move before that it was the minimizer, maybe the human player, the human player will always choose the smaller value, because they are trying to minimize. So we know, if everyone plays perfectly and the evaluation function is really good, that the minimizer will choose this one and not this one, which makes sense, right, I mean you would choose this. That means the value of this node is not the value you get from the evaluation function, but the value that you propagate up from down here. Same here: that's again the maximizer, the computer, and it can choose between ten and five and would choose ten, because it tries to maximize. Then up here again, the human player can choose between 10 and minus ten and, trying to minimize, it will use minus ten. Then up here it's the computer player and it can choose between minus 10 and minus seven, both bad, but it will choose the minus seven. And that's the minimax algorithm. Each of you, I guess all of you, could program this in less than a day, and without any tweaks it would probably defeat most of us on this machine, which is very surprising. Because compute power is so immensely available, even something like this will be extremely strong; surprisingly enough, just using the local evaluation function would be really bad, but this is good. The further you can look ahead, the more compute power you need, but it's already pretty good.

So someone said we should not explore all the possible moves, and this makes a lot of sense, and what all these engines really do is alpha-beta pruning. It's really involved to look through in detail, so here is a very simple example. Let's say you are here and you're trying to choose: am I going to do this, or one of these other moves? If you expand this one here to the end, you will see that whatever you do, you will win in this branch anyway. This is a very extreme example, but it could be that you see you always get a very, very good result in this branch, and if you find that out, you will not expand this other branch here, because you will say, why would I, I'm already doing well with this other branch. So pruning means you cut off these other possibilities; effectively this means you can often search twice as deep as without the pruning, so that's already an optimization trick.

How does Stockfish do this in detail? Apart from that, Stockfish puts a lot of other heuristics into this. First of all, it prunes other parts of the tree, because it knows that's a silly move, no one is ever going to play it. It orders the possible moves, deciding which to explore first, and this is very critical, because if you expand the bad moves first, you will not be able to prune. If it finds something that looks really promising, it can iteratively deepen the search for this special move and try to find out whether it really is good, and for very bad moves it will not go very deep into the search tree. There's an opening book being used, perfectly known opening strategies, and once you only have six pieces left, which is not a lot, like three versus three, there are endgame tablebases, because then you know exactly how to play the endgame; a compact sketch of the minimax search with pruning follows below.
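A compact sketch of the depth-limited minimax with alpha-beta pruning just described; the `position` interface (`legal_moves()`, `play()`, `is_terminal()`) is assumed for illustration, and `evaluate()` is the toy evaluation from the earlier sketch.

```python
import math

def minimax(position, depth, alpha, beta, maximizing):
    """Depth-limited minimax with alpha-beta pruning (illustrative sketch)."""
    if depth == 0 or position.is_terminal():
        return evaluate(position)            # static evaluation at the leaves
    if maximizing:
        best = -math.inf
        for move in position.legal_moves():
            best = max(best, minimax(position.play(move), depth - 1, alpha, beta, False))
            alpha = max(alpha, best)
            if alpha >= beta:                # the opponent already has a better
                break                        # option elsewhere: prune this branch
        return best
    best = math.inf
    for move in position.legal_moves():
        best = min(best, minimax(position.play(move), depth - 1, alpha, beta, True))
        beta = min(beta, best)
        if alpha >= beta:
            break
    return best

# Picking the computer's move, looking four half-moves ahead in total:
# best = max(pos.legal_moves(),
#            key=lambda m: minimax(pos.play(m), 3, -math.inf, math.inf, False))
```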
And that's it, more or less. There's a lot of very efficient implementation work, a lot of parallelization, but this is how Stockfish plays. And Stockfish used to be number one, I already told you, like a spoiler: it used to be the number one chess engine in the world, and all the recent versions play like this, more or less, and they're still good enough to play at superhuman level on commodity hardware. It's hard to implement something like this well, but it would run on this machine, and it would probably beat Kasparov.

Now, that's not the main thing; this is much more interesting for us machine learning people, because that was not machine learning, you might maybe call it AI, but what we have now, this really is machine learning: this is AlphaZero. Who has heard of AlphaZero before? A few people, good; so the rest have no idea what this AlphaZero thing does, okay. AlphaZero evolved from, I think, AlphaGo to AlphaGo Zero to AlphaZero, and this was the first algorithm, architecture, I don't know, that defeated world-class Go players. Go was supposed to be a game that machines could never play, because it has a very high branching factor, that means from every position you can play a lot of different moves, there are no simple rules you can write down, it's very hard, you need to be creative, so people thought machines were never going to do this; but they did, they defeated the world's best players.

My drawings are notoriously bad, so I'm trying to do them as bad, sorry, not as bad, as good as I can, but it's pretty tough for me. The idea always is: I have a current position and I want to determine the next move. Once that is done in gameplay, I get a new position, or the game is over, and that gives me a score, because at the end of the game I know I either lost, which would be minus one, there's a draw, which would be zero, in my world or in AlphaGo's world, and plus one would mean I won. So the question is: how do I get from here to there? And you see there's a lot of room here, and a lot of room over there, so this visualization will grow a little bit.

There are a couple of ideas, and the first one, I think, is pretty obvious: we are replacing the hand-written evaluation function with a neural network. Instead of writing down the rules, we let it learn them; how is tricky, but we will see. The hope is that this new evaluation function will see a position like the one in the initial example, where we saw that this position really looks good, and will see that at first glance, just because it looks good. Can we make a machine do this as well? The answer, obviously, is yes, but we will see; convolutional neural networks are really good at this, looking at a position and understanding whether it is good or not, and as I said, our hope is that we get a more holistic view of the whole game using a neural network.

So how does it look in our image here? This didn't change a lot, but we have a model that is a neural network, it's multi-layer, but the important part is it has two heads, that means two outputs. The input is going to be the current position, not actually the pixels, but rather which piece is where. By the way, if there are questions, please ask any time. And what does the model do? It gives us two predictions, a policy prediction and a value prediction; anyone who already knows reinforcement learning will know something about these, for the rest I will explain them, and we will see how we use them. We have a single network, which is also surprising,
because often there are many networks, one for training and one for playing, and maybe we need two networks because we are trying to predict two different things; but we only have one. And it's ResNet-style, that means we have shortcut connections in the network, and because of that we can make it really deep and really powerful. This is not an idea of the AlphaGo or AlphaZero people; it's an old idea, old meaning a few years old, from Microsoft. As I said, the input is the board position, and there are two outputs. One is the so-called policy head, which gives the move probabilities, that means: of the possible moves that I have, which has the highest likelihood of me winning the game? Okay, that was a little bit complex, but it's actually very simple, we'll see an example and you'll understand. The second output, the value head, looks at the current position and says who is going to win: is it minus one, that means I'm going to lose, is it closer to zero, that means it's not so clear, probably a draw, and plus one means I'm going to win. We get these two outputs from the neural network.

Again, an example: we stick this position in, and the value that we get from it in this example is, I think it should be minus 0.7, which means it's pretty likely not a good position for us. And then, let's say there are three possible next moves, and the first move gets a probability of 0.3, the second one 0.1, and the third one 0.4, that means this one has a 30% chance of being the best next move, that one 10%, and, that doesn't add up, right, it should; let's say this one is 0.6, so it adds up to one. So we're probably going to take that one. That's how it looks: this is who is going to win, probably not us, looking at this position only, and this is which move is likely to be the best; a small sketch of such a two-headed network follows below.
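Here is a minimal sketch of such a two-headed, ResNet-style policy/value network, assuming PyTorch. The sizes are illustrative only: the number of input planes, residual blocks, and the move-encoding size are assumptions, and details like batch normalization are left out, so this is a toy, not the network from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        # ResNet-style shortcut: add the input back onto the block's output
        return F.relu(x + self.conv2(F.relu(self.conv1(x))))

class PolicyValueNet(nn.Module):
    def __init__(self, in_planes=12, channels=64, n_moves=4672, n_blocks=4):
        super().__init__()
        self.stem = nn.Conv2d(in_planes, channels, 3, padding=1)
        self.blocks = nn.Sequential(*[ResidualBlock(channels) for _ in range(n_blocks)])
        self.policy_head = nn.Linear(channels * 8 * 8, n_moves)  # move probabilities
        self.value_head = nn.Linear(channels * 8 * 8, 1)         # who is going to win

    def forward(self, board_planes):
        x = F.relu(self.stem(board_planes))
        x = self.blocks(x).flatten(1)
        policy_logits = self.policy_head(x)     # softmax is applied later / in the loss
        value = torch.tanh(self.value_head(x))  # squashed into [-1, +1]
        return policy_logits, value

# One position encoded as "which piece is where" planes, batch size 1:
# logits, value = PolicyValueNet()(torch.zeros(1, 12, 8, 8))
```

The 4672 outputs correspond to one common fixed-size chess move encoding (8x8x73); for the sketch any fixed encoding of "all possible moves" would do.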
We could play like this, but there's no search involved so far, and just using zero look-ahead you're probably not going to be good. So we need to add something: search. And this search is very different from the first search; it's called Monte Carlo tree search, and it doesn't limit the search tree in depth but in breadth, so we are always going down to the end of the game, but we are not playing out all the games, just very few. Let's see how this works. I think I corrected the number here in this drawing and forgot the other one, sorry for that. We push this evaluation through something called Monte Carlo tree search, as I said, and this can be viewed as a policy improvement, I think that's also the official name: you get these probabilities as a prior, you push them through this, and you hopefully get better probabilities out. Let's say those are the new probabilities here, and in gameplay you always choose the best one, which makes sense, you always want to play the strongest move when you're really playing. So in this case we would choose this position, and these tiny squares are just tiny versions of this big square, if that makes sense, so we have pieces somewhere here. We choose this, and we either end the game with this new position or we start all over again. The tricky part is how this actually works; otherwise it's pretty simple.

Okay, I said this already: we're using the original output from the model, the policy head, to kick off a simulation, so this is prior knowledge, what we think the probabilities are going to be. Now we explore the game tree, and you will see in a second how we do this based on these priors. At a certain point in this process we play out a simulation, and we play the simulation again using the model's probabilities, so we keep sticking positions into the model, you will understand this on the next slide, and this makes it very powerful. We choose a certain number of iterations, like how many playouts we are going to do until we make a decision, and in the original AlphaZero it's 800, just to give you a number, so not very many. Also, as a side note, the number of positions expanded in this approach is two orders of magnitude lower than in Stockfish, so it's not brute force; it puts a lot of energy into deciding which are the best positions to explore. And at the end, you saw this in the visualization, you get new probabilities based on that simulation.

First of all, we need to understand what Monte Carlo experiments are. Who knows what a Monte Carlo experiment is? She knows, yeah, good. Instead of calculating the real, perfect number, we make random experiments, and it's simple. This is a very good example, the classic one, I stole it, that's why it looks good: we are trying to find out what pi is, and you see it's a moving image, this is the number of samples, and the higher the number of samples goes, the closer we get to pi. What do we do? We randomly generate points between (0, 0) and (1, 1), and then we find out whether each point is within this circle: if it is, we count it as inside, and if not, we count it as outside the circle, and using this ratio we calculate pi. Obviously we need a way to measure whether we are inside or outside that circle that is cheaper than computing pi itself, so we just calculate the distance from (0, 0) to the point, and if it's smaller than 1 it's within the circle, if it's larger it's outside. It's just a way of doing things when you can't measure something perfectly; a tiny sketch of this pi estimation follows below.
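A minimal sketch of that classic Monte Carlo estimation of pi, in plain Python, nothing assumed beyond the standard library:

```python
import random

def estimate_pi(n_samples=100_000):
    """Estimate pi from the fraction of random points in the unit square
    that land inside the quarter circle of radius 1 around (0, 0)."""
    inside = 0
    for _ in range(n_samples):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:          # distance from (0, 0) is at most 1
            inside += 1
    return 4.0 * inside / n_samples       # inside / total approaches pi / 4

print(estimate_pi())   # gets closer to 3.14159... as n_samples grows
```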
We're also doing this for the gameplay, but there it gets a little bit more complex, again something that could overwhelm you, so I will guide you through it. A Monte Carlo tree search has four phases. The first phase, and we're just looking at this part now, is called selection: imagine that our search tree is already built up like this, and using a certain strategy we consider which of these moves looks most promising, or which of these moves we are going to expand next. That's called selection; in this case we're choosing this one. In the next phase we expand it, that means we choose one next move, this one. The numbers inside these circles mean how many simulations we have run through that node and how many of them we won; for this one we ran three simulations and won three of them, so it looks really promising, and that's probably the reason why we chose it to expand next. Again, in the same fashion as the minimax example before, the dark ones are one of the players and the white ones are the other player; that's why this one says, okay, I played six simulations, this plus this, and I only won one of them, because this other one won five, yes, basic mathematics is always a bit hard for me. But we still choose this one, it's our game, we choose this one, and now we simulate to the end. That's important: we don't restrict the depth, we always simulate to the end of the game, because often that's the only reliable information, are we going to win from this position or not. And because we can't use all of the search tree, it's just too big, we sample over it, that means we play out a number of games and use the output of the model to decide which moves we are probably going to play next. In the end we collect the result, and let's say this is going to be a win; no, it's not, it's a win for the other player, right, okay, so we write, right, exactly, that means one more play here, yeah, you're right, it is a win for us, so one more win, and here one more loss, so this looks pretty good. Those are the four phases, and we will repeat them over and over. I heard that Monte Carlo tree search has a lot of other applications too; it was not invented by the AlphaGo or AlphaZero people, I think it's ten years old, maybe fifteen, so not very old, but an established way of doing things, and now you get the element of random sampling into it, so we are no longer exact or anything.

Oh yeah, I said this before: we need to choose which of the moves to expand next, the selection, and there are two goals that you have to balance a little bit. You can look at it like: how do you go on holiday? Do you always go to the same destination because you know it's good, like Majorca or something, you know that's good, Majorca, you go there; or do you try to explore new places you've never seen, and it might be that, I don't know, Cabrera is great and you should go there and try that out. That's the same decision we're making here: are we exploiting, that means are we always going to choose moves with high average win ratios, or are we going to explore moves which we haven't explored much yet? There's a constant that you can tweak a little bit, and the paper on AlphaZero is not very precise about how you choose it, but you can experiment with it if you implement this; a small sketch of such a selection rule follows below.
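A small sketch of the kind of selection rule meant here, roughly in the spirit of the PUCT formula used in the AlphaZero line of work: each child's score combines its average result so far (exploitation) with a bonus for rarely visited moves that the network's prior likes (exploration). The node fields (`visits`, `total_value`, `prior`, `children`) and the value of `c_puct` are assumptions made for illustration.

```python
import math

def puct_score(parent, child, c_puct=1.5):
    """Exploitation (average value so far) plus an exploration bonus."""
    exploit = child.total_value / child.visits if child.visits > 0 else 0.0
    explore = c_puct * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    return exploit + explore

def select_child(parent):
    # Selection phase: walk down the tree by taking the child with the best
    # trade-off between "has won a lot so far" and "barely tried yet, but
    # the network's prior thinks it is promising".
    return max(parent.children, key=lambda child: puct_score(parent, child))
```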
Okay, so this is Monte Carlo tree search, and now, going back to the slides, three or four slides back to this image: gameplay is actually done now, that's how gameplay works, we have all the parts in place. We use the model, we push it through the Monte Carlo tree search, we make the next best move, and then we're probably going to win, or maybe not. What's the caveat here, I mean, is this going to work? Why is that right side there, and the bigger part of the talk still on the right side? What is missing? Sure: training the neural network. At first it's initialized randomly, so it makes random, very bad predictions, it's not at all reliable. If you didn't have a trained network, this wouldn't play well, it would be horrible, so you need to train it. And going forward a few slides, this brings us to the third idea: we're going to use reinforcement learning to change that, and it's a very special style of reinforcement learning, I had looked at reinforcement learning before and this is very different, but it still counts as reinforcement learning, and it's done by making this chess engine play against itself. I think it was in the seventies, WarGames, anyone? Yes, the computer played against itself and found out that nuclear war is not a good thing; we're actually doing that here. How would this look now? I changed my drawing a little bit: now we are in self-play mode, that means training mode, over here the training will take place, so we're pushing some stuff over from here to there, and the model sits in between, because we are using it for prediction, obviously, but we are also using it for training. This part also changed in a small, subtle way: we're not always choosing the best move here, but we might sample over these possibilities and also explore moves that don't look very promising, at least at the beginning of the game. So it changed a little bit.

Reinforcement learning in a nutshell: everyone always shows this graphic, which I also stole. You have something that's called an agent, sometimes it's called an actor, I think, it's not always clear, and the important thing is that it's active, it does something, it can be something like a chess engine; it does something, namely executing an action, and this action affects an environment, in our case the gameplay. So you make a move, you move a pawn, you get an effect in the environment, and the environment, which would be the simulation of the chess game, gives you an observation, that means a new position, or what move the opponent then made, and it also gives you a reward. The reward in this case would be: did we win or not; it's chosen somewhat arbitrarily, because this is our reward, and we only get it at the very end of the game, which has implications for how we are going to train the model here. Okay, so we are trying to maximize the cumulative reward, and this holds true for all reinforcement learning; we are a special case, but this is how reinforcement learning works, more or less, always.

Good. This is again overwhelming, but if you look at it for a minute it becomes really interesting, because it's a taxonomy of reinforcement learning algorithms. This is model-free policy optimization, and all the interesting stuff takes place here, maybe a little bit there; model-free means they have no idea how the world actually works, so if you let this play a Super Mario game and there's no simulation of the game itself, it doesn't know how Super Mario really works, it just tries things out and learns from that. On the other side are what are called model-based reinforcement learning methods, and this single thing here, this AlphaZero thing in this hierarchy, has a given model: AlphaZero doesn't have to learn the rules of chess, instead the programmers of AlphaZero give them to the algorithm. That's why AlphaZero is pretty strong in chess, pretty strong in Go, and pretty strong in shogi; arguably it might be possible to learn the rules of the game itself, you would just get punished if you cheat or something, but here the rules have been given to AlphaZero in advance, which means it cannot make an invalid move and get a bad reward for that, not possible, that's a design decision they put into it, not sure whether anyone questions it, but maybe people do. This is very, very unusual. Some systems learn the model, like the World Models work, what's his name, Schmidhuber, who was involved in inventing this, some even learn a model of the world, but here we have been given that model.

Okay, interesting. As I said, we get the reward at the very end of the game, so there's no other way, at least in this case, to judge whether you made good moves than to check whether you won the game or not, and because of that we need to record the trajectory, that means all the moves that we made from the beginning to the end, because we need to score all of them. How does this look? It gets more complex, but as you see it evolving I'll guide you through it; a tiny sketch of this agent/environment loop and of recording such a trajectory follows below.
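A minimal sketch of that agent/environment loop, recording the trajectory and the single reward that only arrives at the end. The Gym-like interface (`env.reset()`, `env.step()`, `agent.act()`) is an assumption made for illustration, not AlphaZero's actual code.

```python
def play_episode(env, agent):
    """Run one game, recording every (observation, action) pair; the reward
    of interest only arrives with the final step (win / draw / loss)."""
    observation = env.reset()
    trajectory = []
    done, final_reward = False, 0.0
    while not done:
        action = agent.act(observation)                  # the agent does something
        trajectory.append((observation, action))
        observation, reward, done = env.step(action)     # the environment reacts
        final_reward = reward                            # only the last one matters here
    return trajectory, final_reward                      # e.g. +1, 0 or -1
```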
So we record a trajectory: this was the first position, then the next, and so on, and eventually, after a certain number of moves, someone won the game, or it was a draw, so it's finished at this position. We record all the positions from the beginning of the game to the end of the game. What we also record is the outcome, how it ended, maybe minus one, maybe one. And we also record all the intermediate policy predictions, like what we said the probabilities for this move were, which you don't see up there, it's very hard to show now, and what the probability for that one was. We might not even choose the best move all the time, because in training mode we sample over them, that means with 50% chance we take this move, 10% chance this one, and 40% chance this one, so we might take a bad move here and still win, but that doesn't matter much, because we are training now.

This is what we stick into the model: the input, this arrow here, goes into the model and has the same structure as this one. The model makes a prediction; it's context-free, it just looks at the position and makes a prediction, and this might be the prediction, but now we know better, because we know whether we won or not, so we can say, okay, the prediction that you're making doesn't match the actual outcome of the game. It also makes a prediction for the policy, but since we improved the policy with the Monte Carlo tree search and it should be better now, we also stick in the improved policy and tell it: since we improved the policy, we are pretty sure that's the better policy compared to yours. That's the first step in training, and we're using the same model; also, surprisingly, if you do this in self-play mode, we just have this one single model, it used to be very different in AlphaGo Zero. Just one model, you train it constantly, and you use the improved version immediately after it has improved.

I haven't shown how we improve it, or how we train it, at least not in detail. It looks a little bit similar to supervised learning: the training itself doesn't differ, but the way we collect the data does. As you saw, we play games against ourselves, and the outcome of the training also influences how we collect the next data; that's one of the ideas of reinforcement learning. As I said, this model starts at random, it's randomly initialized, very bad; we collect the data, the outcomes of the games, and once a game is finished we collect it as a, sorry, trajectory from beginning to end; and, as I also said, we use the model itself to collect the new data. Now the losses are interesting. I said there's a difference between the prediction and the real outcome, and we use mean squared error for the value head against the final outcome, which, as I said before, is plus one for winning, zero for a draw, minus one for losing; and we use cross-entropy for the policy head, because the probabilities were updated and are probably better than the original prediction of the model.

Good, I think this is the final view of the whole thing. We calculate the losses, that means between the predicted value and this one we calculate the mean squared error, we calculate the cross-entropy between the predicted probabilities and the improved ones, we calculate the gradients like you would do for any neural network, and an optimizer updates the model, that means it finds out which of the parameters of the model bring down that loss and then tweaks them in a way that it knows will optimize; often people use Adam, just a normal optimizer like you would use for any other network. And then you're done, more or less: you've gone full circle and updated the model; a small sketch of these two losses follows below.
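A minimal sketch of that combined loss, again assuming PyTorch: mean squared error between the value head and the final outcome, plus cross-entropy between the policy head and the MCTS-improved move probabilities (soft targets, hence the manual log-softmax). Regularization and other details from the paper are left out.

```python
import torch
import torch.nn.functional as F

def alphazero_loss(policy_logits, predicted_value, target_policy, outcome):
    """policy_logits: (batch, n_moves), target_policy: (batch, n_moves),
    predicted_value and outcome: (batch, 1) with outcome in {-1, 0, +1}."""
    value_loss = F.mse_loss(predicted_value, outcome)
    policy_loss = -(target_policy * F.log_softmax(policy_logits, dim=1)).sum(dim=1).mean()
    return value_loss + policy_loss

# model = PolicyValueNet()                     # the sketch from earlier
# optimizer = torch.optim.Adam(model.parameters())
# logits, value = model(positions)             # a mini-batch from the replay buffer
# loss = alphazero_loss(logits, value, mcts_policies, outcomes)
# optimizer.zero_grad(); loss.backward(); optimizer.step()
```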
And at the same time, actually, self-play continues. This is very much simplified, because in the real world all of this runs in parallel, many threads do self-play, many threads do the training. The trajectories are actually stored in a replay buffer, which can get really large, up to half a million moves, and with each training step we sample from it: we take a random set of samples, a mini-batch, I think we just take 500 samples or something, and stick them in at the same time. I simplified a little in order not to make it too complex.

So let's sum that up. Continuous self-play: we are always using the current model; as I said, in AlphaGo it was different, because there we always played with the best known model, then trained a new one and evaluated it; that complexity has been thrown away. Move probabilities are computed by the model and then improved by Monte Carlo tree search. We choose the moves by sampling over probabilities, and in the endgame we do it greedily, that means we always take the highest probability; in real gameplay we do that anyway. In the early stages of a game we sample over the probabilities, that means if there's a prediction of 0.5, we only take that move in 50% of the cases, so at the beginning of the game we also explore moves that don't seem so good, because we might be wrong; at the beginning our model is still pretty bad, so we also try moves that don't look great. And, as I said, there's a replay buffer, all the moves go into the replay buffer, it's fixed size, it's FIFO, that means if you stick in more, the old ones get dropped, and half a million is, I think, a good number to have in mind for the replay buffer size. Yes: only at the end of the game does it get put into the replay buffer, and it has the complete outcome, that means who won, and, you may understand, depending on which player is to move it gets toggled to either plus one or minus one, if it's not a draw, because for one player the outcome is good and for the other it is bad. You have the move history, the trajectory, a word I can't pronounce, maybe I'll drink some water and try again, and you have the predicted probabilities, as I showed in the image, and you also saw, hopefully, that this data matches what the model outputs anyway; it has been engineered like this. And this constantly, yes, yes, correct, that's totally correct, and not even that is one hundred percent correct, because I just put it into the replay buffer, and when it is actually used for training, I don't know, because training happens on randomly sampled mini-batches, so it might not be used at all, but that is rather unlikely. Yes, but only at the very end, true. This is also very surprising, because, maybe going back, I mean it's surprising that it works at all, because it means even the first position here, this one, gets an evaluation of one, if you like; even though that might be the start position, we will still label it with an evaluation of one, and all the intermediate positions also get the evaluation "we are going to win from this one". Only because, well, the start position is always the same, we should get a lot of evaluations of one and a lot of minus ones and a lot of draws, so this should even out, but I'm still surprised it works. I think your question went in that direction, right, why does it work? Okay, good, I'm surprised, and I'm surprised by a lot of these things, that it actually works, I freely admit that; a small sketch of such a replay buffer and of the move sampling follows below.
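A minimal sketch of the replay buffer and the temperature-style move sampling described above, in plain Python/NumPy. The half-million capacity is the ballpark figure from the talk; the mini-batch size of 512 and the exact trajectory format are assumptions for illustration.

```python
import random
from collections import deque

import numpy as np

class ReplayBuffer:
    """Fixed-size FIFO buffer of (position, mcts_policy, outcome) samples."""
    def __init__(self, capacity=500_000):
        self.buffer = deque(maxlen=capacity)   # old entries drop out automatically

    def add_game(self, trajectory, outcome):
        # trajectory: list of (position, mcts_policy, player_sign) with
        # player_sign = +1 / -1 depending on who was to move; the final
        # outcome is toggled to each position's point of view.
        for position, mcts_policy, player_sign in trajectory:
            self.buffer.append((position, mcts_policy, outcome * player_sign))

    def sample(self, batch_size=512):
        return random.sample(self.buffer, batch_size)

def choose_move(moves, probabilities, temperature=1.0):
    """Early in a game: sample over the MCTS probabilities (temperature 1).
    Late in a game / in real play: temperature -> 0, i.e. play greedily."""
    p = np.asarray(probabilities, dtype=float)
    if temperature == 0:
        return moves[int(np.argmax(p))]
    p = p ** (1.0 / temperature)
    p /= p.sum()
    return moves[np.random.choice(len(moves), p=p)]
```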
Good, so how does this actually perform? Since we're really good on time now, I might show you a full game, but we will see. This claim here is marketing material from DeepMind, the people who came up with this AlphaZero thing, and it moves really fast and I have no idea how to stop it, so I need to talk over it, but I think you get the point: AlphaZero is always this thing down here, and the curve that gets plotted always surpasses all the other opponents after some time. That was against Stockfish; for shogi it's, I don't know, Elmo, but it always surpasses them; this is Go, these are the old versions of the same algorithm, and it always surpasses them too. But this is fishy, because it's marketing. This is what they tell us, but I looked into it, and they didn't play this in the open, you don't have the source, you can't reproduce it, so they just claim they did it. They shared the moves of the games, but not how they came about. And then people criticized the superior hardware that AlphaZero was using, really advanced TPUs, a lot of parallelization, and that it was an outdated version of Stockfish, they played against Stockfish 8 and we have Stockfish 14 now, which also doesn't sound fair. As I said, the engine that AlphaZero used is not available anywhere, neither is the source, it's not something that you can use to test it out, and for me this doesn't sound good. Anyone a scientist around here, or has done scientific work? You did? What would you think of that? That sounds fishy, doesn't it? Who thinks that's not suspicious at all? Okay, even if you thought so, you probably wouldn't raise your hand at this point. I don't usually look into papers, I don't do this, because I don't like formulas, but I did here, because I wanted to know how you would implement this, and it has so many gaps that it's not possible to re-implement it exactly; a lot of subsequent papers address that and say, okay, we don't know how they meant it, we did our own implementation and ran our own experiments, because it's not in the AlphaZero paper. So not so good, I would say.

But someone implemented it in the open, Leela Chess Zero, and it also works like this. I can donate some of my compute power, on a GPU-based system maybe, probably not this one, and collectively people trained the Leela Chess Zero engine, which is implemented based on the ideas that I just showed you, and they try to make it better and better. Last year around this time it absolutely sucked, it was really bad, but it got better, because a lot of people donated processing power and retrained that model, and now, in April, it actually became the, well, there's no official title, but it became the champion of chess engines, and it beat Stockfish, scoring 162 of 300 points. Good point, good point: which version? It must be, because this has been run by independent people, I think it must be a very recent version, but I'd want to know myself, let's see. Okay, Stockfish, let's see if I find it, oh no, what is this, what does it want, anyone seeing the version? Maybe I search for "version", what are you saying, not good, let's see. I'm very sorry, I would like to know, but I don't know it; I'm pretty sure it must be the most recent version at this point, but I don't see it, maybe someone finds out, tweet it, I would also like to know. Did it say, oh no, no, this is AlphaZero
making headlines by beating previous versions of Stockfish, so I think it must be the most recent one, but I'm sorry, I can't answer this, even though I think it's very important; maybe you can help me with that later. So I'm not saying Stockfish is less powerful than Leela Chess Zero, I'm just saying this different approach can compete, and I think last week, or the other day, I read that in another competition Stockfish won again. So I'm not saying this is the best technology you can have and there's no way of improving on it; I do say it can be of similar strength.

So I looked at some of the recent chess matches played with Leela Chess Zero, because now you can look at all the parameters, it's all played in the open, everyone can use the chess engine to play. I'm not good at chess, but I understand the power of the positions that Stockfish had a very, very hard time recognizing as really powerful. This knight here, in fact, controls these two rooks, they can't move anymore, they're taken out of the game, and it is very, very nicely protected here as well. And what excited people who are really good at chess, not me, because I suck, is the way this game evolves, because Leela Chess Zero uses the king as an active element of play: instead of protecting it and hiding it somewhere, it did a so-called king walk, that means it moved the king up the board, integrating it into the whole gameplay, and it actually defeated Stockfish doing that. People say this was very unusual for a chess engine, especially for very strong engines, they hadn't seen this before, and they were really excited; I think chess players also learned something, like us programmers, and there was a lot of excitement from them. So this is a different kind of gameplay, and people even say they learned a lot by watching Leela Chess Zero play against Stockfish, about how to improve their own play, even very, very professional players.

I have a full match and I will just talk over it and replay it, because we do seem to have time; this never happens to me, I typically run out of time. It's not interesting yet, but it will become interesting. Leela Chess Zero plays white, and this is all boring, Stockfish plays black, things happen, I need to talk over it because it's boring now, nothing too special, it still looks similar, but trust me on this: Leela Chess Zero, playing white, is trying to dominate this part of the board, and it doesn't matter much what happens on the other part of the board; you see it gets a very strong position on the left side. But in the back of its mind Stockfish is preparing a pawn capture, I know this because I watched the game, it tries to take one of these pawns, and you see it tries to set up a trap, and it's very sophisticated, and now it's very close to taking this pawn. I have to pause it for a second, sorry, oops, did it take the pawn, I can't stop it now, sorry, okay, did it take the pawn already? I was concentrating on the controls; did it take the pawn on the right side? Okay, good, I really didn't know. Because I think if you're good at chess you see what Stockfish has in mind, it tries to outsmart Leela Chess Zero by preparing this trap and taking one of these pawns, but the important part is that Leela doesn't care at all; it knows Stockfish will take it, but it just doesn't care, it just continues building this very strong, aggressive position on this side of the board, because it knows: okay, whatever you do on that side, I see what you're doing, I'll let you do that,
I don't mind. And then it happens, as it should: pawn takes, yes, so Stockfish is one pawn ahead at this point, and it will take another pawn and be two pawns ahead, very, very confident it's going to win. Even though, who here is a really good chess player, or better than bad? You're a good chess player: would you say this position is good for black, just looking at the board, visually? No, right. So why can't we do this? I mean, I guess you don't beat Stockfish, right, you don't. It has limited effect and can't move much, yeah. So you see that, but Stockfish doesn't; Leela sees it, and the intuition, not sure that's the right word, behind that is that it looks at the board visually, using a convolutional neural network, it sees that this is a strong position, and it always values position over material. So Leela Chess Zero has a very strong position and doesn't care; I think it will lose a third pawn and Stockfish will be three pawns ahead at a certain point, but it knows what a very, very strong position it has, and there will be some brilliant moves later, which are also very surprising, this move for example, pawn to b7. You can watch how this ends, it's pretty fast, just 30 seconds from now, and I'm not quite sure it really was three pawns ahead, I think only two, but you will see: white is going to win at a certain point and black will resign. Until a certain step, this handcrafted evaluation function wasn't good enough to tell "that's a bad situation for us". There are even versions of these chess engines, I think they're called anti-Stockfish, because they look at, and that's cheating a little bit, they look at the evaluation function of Stockfish and figure out how to outsmart it, and they play very strongly against Stockfish but very badly against anything else. The rest is just an endgame, black will resign, trust me. So, position over material: that should maybe have been the title of this talk, because Leela Zero happily exchanges material for a very good position. And I think for this one we do know the version of Stockfish: nineteen? No, that's not possible. It was ten? Okay, thank you so much. I thought it was 14, no, maybe I mixed that up, okay, good, I always get confused with numbers.

So, wrapping it up already: why is this interesting for a machine learning conference? Maybe I should have said this before the talk, but, well, you're still here. Because it uses techniques that are well known: reinforcement learning; convolutional neural networks, not new; Monte Carlo tree search, also known; and by combining them in a very smart way, not relying on just one of them, we, well, not me, other people, could actually create something outstanding, maybe not in the results but in the way it plays, and in how we can learn from it; I found this very interesting. As I said before, for decades chess playing has been dominated by alpha-beta search, and this is really old technology; it just got better because of small improvements in the algorithms, but mainly because of compute power. The new paradigm, as I said, is that we learn the evaluation function and we use Monte Carlo tree search instead of alpha-beta search. Someone asked me yesterday: why not use alpha-beta search here? There's no obvious answer, but there is a non-obvious answer in the paper, and the answer is that this learned evaluation function has a very large variation in the quality of its predictions: sometimes
it's very much off, it doesn't have a good estimation, but the idea is that because we are sampling over so many outcomes, this evens out, so it doesn't impact the overall result too much. Alpha-beta search, on the other hand, always propagates the best result up, and that clashes with the highly varying quality of the evaluation function. So they tried alpha-beta search, but it doesn't perform well; you actually have to use Monte Carlo tree search, which is surprising, but good. And we used reinforcement learning, but in a very, very unusual way, so if you went to the introduction talk on reinforcement learning you would have seen none of this, or almost none of it. Oh, out of time, as it seems, perfect. I would claim that this way of playing is overall not only powerful but also very inspiring. And since we're out of time, maybe there's time for, no, we don't even have time for one question; talk to me after the talk if you have questions or anything. The slides are down there, I will tweet them using @DJCordhose if you're interested, and I also have references to the papers if you really want to read them, I doubt that, but you can do it. So, with that, thank you very much. [Applause] [Music]
Info
Channel: Machine Learning Conference
Views: 16,948
Rating: 4.8616352 out of 5
Keywords: ml conference 2019, machine learning conference, machine learning, artificial intelligence, ai, developer talk, videos for developers, oliver zeigermann, embarc, chess engine, chess computer, chess algorithm, alphazero, deep blue, resnet architecture, stockfish vs alpha zero
Id: P0jd8AHwjXw
Length: 60min 23sec (3623 seconds)
Published: Fri Aug 09 2019