Training an unbeatable AI in Trackmania

Captions
Each of these cars is controlled by an artificial intelligence (AI) in the racing game Trackmania. And this AI is designed to improve over time through trial and error. The longer it trains, the better it gets. With enough training, it should be able to find the best lines, drift perfectly, and it might even become unbeatable. At least that's the theory. Actually, I've already tried to build such an AI several times, and it could drive, but I've played this game for years and it wasn't good enough to beat me. Still, I think it has some potential. So about six months ago, I decided to give this project one last chance, and to offer the AI an opportunity for revenge. This video is the conclusion of a three-year journey to make an AI that could beat me in Trackmania. And this time, it got much better than I had expected.

But first, let's start with this very simple track to better visualize what this AI does. The AI uses something called an artificial neural network. Basically, it's a mathematical tool that roughly models how a brain works. Every tenth of a second, this neural network receives a few numbers that describe what's happening in the game. In response, it outputs new numbers specifying the action to perform. Hopefully, the network will select actions that make the AI finish this track as quickly as possible. But this will only happen if the network is configured correctly; that's the tricky part. For this, I'm using a method called reinforcement learning. Here's how it works. The AI starts from scratch, with zero prior knowledge about anything. So at first, its decisions are quite random. But for every action it takes, it's rewarded depending on how good that action was. The faster the AI progresses along the track, the higher the reward. So with each new attempt, the AI explores the game and gathers data from it.
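The video doesn't show its reward code; it only says that the reward grows with how fast the AI progresses along the track. One plausible way to turn that into a number, sketched here under my own assumptions and not taken from the video, is to measure how far the car advanced along the road centerline during each 0.1 s step. The polyline centerline, `progress_along`, `step_reward`, and the L-shaped test track are all hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def progress_along(centerline: List[Point], pos: Point) -> float:
    """Arc length along the centerline polyline of the point closest to `pos`."""
    best_dist = math.inf
    best_progress = 0.0
    travelled = 0.0  # total length of the segments before the current one
    for (x0, y0), (x1, y1) in zip(centerline, centerline[1:]):
        dx, dy = x1 - x0, y1 - y0
        seg_len = math.hypot(dx, dy)
        if seg_len == 0.0:
            continue  # skip duplicated points
        # Fraction of the segment where the perpendicular from `pos` lands, clamped to [0, 1].
        t = ((pos[0] - x0) * dx + (pos[1] - y0) * dy) / seg_len ** 2
        t = max(0.0, min(1.0, t))
        dist = math.hypot(pos[0] - (x0 + t * dx), pos[1] - (y0 + t * dy))
        if dist < best_dist:
            best_dist = dist
            best_progress = travelled + t * seg_len
        travelled += seg_len
    return best_progress


def step_reward(centerline: List[Point], prev_pos: Point, pos: Point) -> float:
    """Reward for one 0.1 s decision step: centerline distance gained since the last step."""
    return progress_along(centerline, pos) - progress_along(centerline, prev_pos)


# Hypothetical L-shaped track: a 100 m straight, then a right-angle turn.
track = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0)]
print(step_reward(track, (10.0, 1.0), (30.0, -2.0)))  # drove forward: positive reward
print(step_reward(track, (30.0, -2.0), (10.0, 1.0)))  # drove backwards: negative reward
```

Projecting onto the centerline means that sideways movement earns nothing, so the only way to collect reward quickly is to move down the track quickly.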
Now the idea of reinforcement learning is to use this data to progressively tweak the neural network in a way that reinforces actions leading to more reward. Actually, all the cars I'm showing are controlled by slightly different versions of the neural network, and they represent successive states of the AI as it learns. In each new attempt, the AI tries actions based on its current knowledge. Sometimes it goes well, sometimes not. But whatever happens, the AI can use this fresh knowledge to update its decision process, and it's through this trial-and-error loop that the AI gradually learns the game, all by itself, until it has mastered it. At least, that's the theory! In practice, it's been a real nightmare to get this thing working properly...

Let's go back a few months in this project. These were the AI's first decent attempts on this track, compared with my personal best. It wasn't my first time with reinforcement learning, and the AI seemed to pretty much understand what it had to do. But it would often get stuck in a sub-optimal strategy, unable to come up with anything better. In particular, the AI loved to hit the walls. And in a way, it makes sense. When the AI decides to smash into a wall at full speed, it actually collects more reward initially, and only later does it turn out to be a bad decision. Such conflict between short- and long-term reward is one of the many things that make reinforcement learning difficult to get right. And when you observe the AI doing weird things, it's quite tricky to figure out where the problem lies. Maybe there is a bug in my code? Or maybe it's the method itself: is the reward signal clear enough? I could let it train a little longer to see, but I don't know... It could be that the AI doesn't see enough to locate the walls. And did I configure the learning algorithm properly? Should I try a different algorithm? This seems interesting...
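The conflict between short- and long-term reward described above can be made concrete with the discounted return, the standard quantity most reinforcement learning methods try to maximize: G = r0 + γ·r1 + γ²·r2 + .... The video doesn't say which algorithm or discount factor γ was used, so this is only an illustration with made-up per-step rewards: a "wall line" that earns more at first and then crawls, versus a "clean line" that keeps a steady pace.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """G = r_0 + gamma * r_1 + gamma**2 * r_2 + ..., computed backwards in one pass."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical per-step rewards over 10 steps (1 s of driving at 10 decisions per second).
wall_line = [3.0, 3.0] + [0.5] * 8  # full speed into the wall, then crawling away from it
clean_line = [2.0] * 10             # slightly slower at first, but keeps its speed

for gamma in (0.5, 0.9):
    wall = discounted_return(wall_line, gamma)
    clean = discounted_return(clean_line, gamma)
    winner = "wall" if wall > clean else "clean"
    print(f"gamma={gamma}: wall={wall:.2f} clean={clean:.2f} -> prefers {winner}")
```

With γ = 0.5 the agent effectively looks only a few steps ahead and the wall line scores higher, even though its undiscounted total (10) is half that of the clean line (20); with γ = 0.9 the clean line wins. This is one mechanism that can produce wall-hitting, not a diagnosis of what actually went wrong in the video.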
Let's check out the comments on my last video; I might find some help there. Oh, maybe it's because the AI can't brake? No. That, I'm sure, is not the issue. But for the rest, well, it's hard to know. So, like my AI, I entered a trial-and-error loop of guessing what to fix, re-running the training, and waiting to see if it got better. Usually it didn't. This was a painful process, especially because the training needs to run for a long time before producing any results. It's hard to learn on your own from scratch. An AI like this usually needs to gather a lot of data to even begin to understand what's happening. So each time I want to test an idea, I need to let these training sessions run for hours on my laptop before getting any useful feedback. That's why I went back to this project with such a simple track and without enabling the brake. Every simplification of the decision-making space tends to make the problem easier and quicker to solve, and I found it easier to make progress this way. And eventually, as the days passed, all these efforts started to pay off. After many small adjustments in my code, the AI finally stopped hitting the walls, and it was getting closer and closer to my time. [Music] Until finally, it happened. And this was only the beginning. [Music]

Three years after starting this project, I had finally trained an AI that I would probably never be able to beat. And I have to admit: after playing this game for so many years, it was a strange feeling to be outmatched like that by a computer program!

But this track is quite simplistic. It was time to challenge the AI one step further. Last year, I spent a few hours trying to get a good time on this map. I then trained an AI on the same map, without much success. Now it's time to try again and see if the AI can get its revenge. [Music]

Overall, the training method remains the same as on the first track.
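To put a rough number on the earlier point that every simplification of the decision-making space makes the problem easier, here is a sketch that enumerates a small discrete action set with and without the brake. The video doesn't list the AI's actual outputs, so the three steering values, the gas/brake on-off switches, and `build_action_space` are hypothetical.

```python
from itertools import product

# Hypothetical discrete controls; the video does not list the AI's actual outputs.
STEERING = (-1.0, 0.0, 1.0)  # full left, straight, full right
GAS = (False, True)
BRAKE = (False, True)


def build_action_space(brake_enabled: bool):
    """Every combination of controls the agent may pick at each 0.1 s step."""
    brakes = BRAKE if brake_enabled else (False,)
    return [
        {"steer": s, "gas": g, "brake": b}
        for s, g, b in product(STEERING, GAS, brakes)
    ]


no_brake = build_action_space(brake_enabled=False)
with_brake = build_action_space(brake_enabled=True)
print(len(no_brake), len(with_brake))

# The number of distinct action sequences over one second (10 decisions) grows as |A| ** 10.
print(len(with_brake) ** 10 // len(no_brake) ** 10)
```

Under these assumptions, enabling the brake only doubles the choices per step, but it multiplies the number of possible one-second action sequences by 2^10 = 1024, which is a lot more for trial and error to explore.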
The main difference lies in the observation input the AI gets when it drives. Just like on the first track, the AI gets a few car metrics, such as its current speed. It also gets its position relative to the road centerline. But on this track, the road layout is no longer repetitive, so the AI must anticipate the upcoming turns. That's why I added new inputs that encode the map path for the next three corners.

To make the video easier to follow, I'm only showing the attempts from the starting line. But in reality, the AI regularly spawns anywhere on the map during its training. This prevents it from focusing too much on the first turns.

Here's an overview of the first 5 hours of training. Some attempts are already ahead of my personal best in the first few turns. This looks much more promising than last year. But the AI is not sufficiently robust, and it never maintains its lead for long. At least for now. One thing that surprised me is that the AI cuts most turns very tight on the inside. It appears that the game's physics are slightly more complex around these road edges. For this reason, I had to include a few more inputs to make sure the AI understood everything that was going on. Adding these two inputs gives the full orientation of the car, and these inputs indicate which wheels are in contact with the road and whether they are sliding. [Music]

After nearly 9 hours of training, the AI managed to complete the map at a pretty good pace. It completely crushed the AI from last year, finishing only 13 seconds behind my personal best. And it continued to complete the map on subsequent attempts; it was becoming very consistent. But could it also drive faster? [Music]

That's it. After playing this game for 35 hours, the AI was already faster than me on this map. But I wasn't ready to give up yet. Now it was my turn to train. This time, I will try with the brake. Braking doesn't add a huge advantage on this map.
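The inputs listed above (speed, position relative to the centerline, the next three corners, full orientation, and per-wheel contact and sliding) all end up as one flat list of numbers that the network reads every tenth of a second. The exact encoding isn't given in the video, so this sketch makes several labeled assumptions: each corner is a hypothetical `(distance, direction)` pair, the "two inputs" completing the orientation are taken to be pitch and roll, and `CarState` and `build_observation` are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Tuple

Wheels = Tuple[bool, bool, bool, bool]  # front-left, front-right, rear-left, rear-right


@dataclass
class CarState:
    speed: float            # current speed
    lateral_offset: float   # signed distance from the road centerline
    yaw: float              # assumed: heading relative to the road direction (radians)
    pitch: float            # assumed: the two extra angles giving the full orientation
    roll: float
    wheel_contact: Wheels   # which wheels touch the road
    wheel_sliding: Wheels   # which wheels are sliding


def build_observation(state: CarState, next_corners: List[Tuple[float, int]]) -> List[float]:
    """Flatten the state into the numbers fed to the network every 0.1 s.

    next_corners holds hypothetical (distance_to_corner, direction) pairs for the
    next three corners, with direction -1 for a left turn and +1 for a right turn.
    """
    if len(next_corners) != 3:
        raise ValueError("expected exactly three upcoming corners")
    obs = [state.speed, state.lateral_offset, state.yaw, state.pitch, state.roll]
    for distance, direction in next_corners:
        obs += [distance, float(direction)]
    obs += [float(c) for c in state.wheel_contact]  # 1.0 = wheel on the road
    obs += [float(s) for s in state.wheel_sliding]  # 1.0 = wheel sliding
    return obs


state = CarState(
    speed=42.0, lateral_offset=-1.5, yaw=0.05, pitch=0.0, roll=0.01,
    wheel_contact=(True, True, True, False), wheel_sliding=(False, False, True, True),
)
obs = build_observation(state, [(12.0, 1), (40.0, -1), (75.0, 1)])
print(len(obs))
```

Keeping the layout fixed (always exactly three corners, always four wheels) matters because a plain feed-forward network expects the same number of inputs in the same order at every step.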
Still, the brake makes it possible to drift, which can save a few tenths of a second here and there. This time, I was able to finish the map in 4:36. Of course, this gives me an unfair advantage over the AI, since I don't allow it to brake yet. But I was still curious to see how close it would get to this new time.

Well, not only did the AI beat this new time, but it wasn't even close: it finished more than 5 seconds ahead. Now, it seemed it was time to admit the superiority of the AI on this track.

But only on this track, right? Well, I tested the AI on another map where it had never trained before. And it was quite good! I also tried to make it drive the training track in reverse, and it adapted pretty well to this new context. But overall, on these unseen tracks, the AI is less precise, it makes more mistakes, and sometimes it just gets completely confused, especially when it's approaching a long straight line, for some reason. All this relates to the question of generalization. There would be much more to say on this subject; I've tested a whole bunch of other things, like modifying the road surface, adding obstacles, even changing the car's physics. But let's forget all that for now. I'd prefer to return to the training map, because even there, there is something about this AI that doesn't fully convince me.

Of course, I agree, the AI completely crushed me on this map. But does it really drive faster than me? I mean, my personal best run contains quite a few mistakes. Trust me, it's super hard to get through these 230 corners at a good pace without failing. The AI, on the other hand, is extremely consistent. And I think that's what makes it so strong in this kind of endurance scenario. However, if I focus on just one part of the map until I drive a run with no errors that I'm fully satisfied with, will the AI still be faster than me? This is the third and final map where I will face the AI.
And if it beats me here, then I will be fully convinced. [Music]

This time, the AI didn't beat me once. It's becoming clear that without brakes, the AI is badly disadvantaged. So far, I've always disabled the brake on the AI. Braking makes the game much more complicated to understand and master, and I was afraid it would make AI training too complex. Maybe I was wrong. Anyway, I think it's now time to make the AI drift. Here's the plan: I'm going to retrain the AI on the long map with the brake available. Then, I will compare this new AI with my best run on the shorter map. This will be our final duel, and it will determine, once and for all, which of us drives faster. [Music]

Thanks to the brake, the AI is now slightly faster than before. But for some reason, it doesn't drift. That's a surprising choice. Just out of curiosity, I also tried to train it with the brake on the first track, and the result is the same: it's faster, but it doesn't drift. Though I'm not sure drifting is useful on the first track, I'm pretty confident it saves time on the other maps. And yet, the AI chose not to drift. Actually, I found a few rare cases where the AI does some kind of drift. It appears to deliberately clip the road corners to unbalance the car and initiate a drift. Definitely not the most straightforward approach!

It's not so surprising to see the AI struggling. The road is very narrow, and the many curves prevent high speeds. And in Trackmania, there is usually only one way out in such cases: a trick known as the neo-drift. By applying this precise sequence of actions correctly, it's possible to trigger a drift even at relatively low speeds. It's the kind of trick that's pretty hard to discover on your own. I think the AI must have stumbled across it at least once during its many hours of trial and error, but it probably didn't have enough driving experience to explore further in that direction.
Such a trick won't bring any advantage if it's not properly mastered. So, can an AI learn to neo-drift from scratch? Probably, but maybe we could also help it a little. From now on, the AI will get a big reward bonus whenever it's drifting. To detect if the car is drifting, I'm looking at this input. When it's high enough, it basically means that the car is pointing in a different direction from where it's actually going, which usually means that it's drifting. OK, let's restart the training. [Music]

Well, I guess the AI outsmarted me! Apparently, it found a way to constantly trigger the reward bonus just by spamming these weird action patterns at low speed. OK, new rule: now the reward bonus only applies when the car's speed is high enough. Let's restart the training again. [Music]

Now this looks good. The AI has clearly mastered the neo-drift. So well, in fact, that it even chains multiple drifts on straight lines to collect more reward. Obviously, in terms of speed alone, its current strategy isn't effective. So let's continue the training without the bonus. Now that it has discovered how to drift, the AI shouldn't forget it. [Music]

Over the next hours, the AI learned to drift more wisely, only where it saves time. Its driving looks cleaner than ever, and it destroyed its previous record on the endurance map. It's now 16 seconds ahead of me.

I think it's time to answer our question: aside from its endurance skills, is the AI still faster on a shorter map? To answer that, I made the AI drive this final level many times, and I selected its best run. This one. This run is the culmination of a three-year journey to create an AI that can beat me in Trackmania. But I also did my best to drive a challenging run myself, putting into practice my years of experience in this game. Now it's time to find out who is faster.
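The drift bonus described above has three parts: a drift test (the car points in a different direction from where it's going), the speed condition added after the AI farmed the bonus at low speed, and a switch to turn the bonus off for the final phase of training. The video shows a specific game input rather than a formula, so this sketch assumes that input behaves like the angle between the heading and velocity vectors; the thresholds, the bonus size, and the function names are all hypothetical.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]

# Hypothetical values; the video does not give the thresholds or bonus size used.
DRIFT_ANGLE = math.radians(15.0)  # minimum angle between heading and direction of motion
MIN_DRIFT_SPEED = 20.0            # the "new rule" against farming the bonus at low speed
DRIFT_BONUS = 1.0


def slip_angle(heading: Vec, velocity: Vec) -> float:
    """Unsigned angle (radians) between where the car points and where it is going."""
    cross = heading[0] * velocity[1] - heading[1] * velocity[0]
    dot = heading[0] * velocity[0] + heading[1] * velocity[1]
    return abs(math.atan2(cross, dot))  # atan2(0, 0) is 0.0, so a stopped car is not drifting


def shaped_reward(progress_reward: float, heading: Vec, velocity: Vec,
                  bonus_enabled: bool = True) -> float:
    """Progress reward, plus a bonus while drifting fast enough (if the bonus is enabled)."""
    speed = math.hypot(velocity[0], velocity[1])
    drifting = slip_angle(heading, velocity) > DRIFT_ANGLE
    if bonus_enabled and drifting and speed >= MIN_DRIFT_SPEED:
        return progress_reward + DRIFT_BONUS
    return progress_reward


print(shaped_reward(2.0, (1.0, 0.0), (30.0, 30.0)))                       # fast drift: bonus
print(shaped_reward(2.0, (1.0, 0.0), (3.0, 3.0)))                         # slow "drift": no bonus
print(shaped_reward(2.0, (1.0, 0.0), (30.0, 30.0), bonus_enabled=False))  # bonus switched off
```

Using `atan2` of the cross and dot products gives the angle directly without normalizing either vector, and it stays well defined when the car is stationary.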
[Music] After showing that it was more precise and more consistent, the AI again outpaced me in this final test. And it seems time to accept that I can't beat it anymore.

But is this AI truly unbeatable? On this last level, I'm sure it's not. Some of its lines are still quite far from optimal, and I'm certain that many players better than me could easily drive a faster time. But on the first two levels, I'd be quite surprised if anyone could beat the AI. If you want to try it for yourself, the game is free and these tracks can be downloaded online. Of course, all these maps are quite simple, and we've only just scratched the surface of this complex and beautiful game. Now that the AI is working pretty well, it deserves to be challenged a little more. But that's for another time. This video already took way too long to make.

If you'd like to see more, and if you want to support this YouTube channel, I just opened a Patreon page that you can subscribe to. This would help me continue spending time on this project in the future and make more videos. I promise I won't wait another year and a half before the next one. But first, I still need a few days' break! [Music]
Info
Channel: Yosh
Views: 13,232,517
Id: Dw3BZ6O_8LY
Length: 20min 41sec (1241 seconds)
Published: Sat Sep 30 2023