Each of these cars is controlled by an Artificial
Intelligence (AI) in the racing game Trackmania. And this AI is designed to improve over time
through trial and error. The longer it trains, the better it gets. With enough training,
it should be able to find the best racing lines, to drift perfectly; it might even become unbeatable. At least, that's the theory. Actually, I've already tried to build such an AI several times, and it could drive, but I've played this game for years and that wasn't enough to beat me. Still, I think it has some potential. So about six months ago, I decided to give this
project one last chance, and to offer the AI an opportunity for revenge. This video is the
conclusion of a three-year journey to make an AI that could beat me in Trackmania. And this
time it got much better than I had expected. But first let's start with this very simple
track, to better visualize what this AI does. The AI uses something called an artificial neural network. Basically, it's a kind of mathematical tool that roughly models how a brain works. Every tenth of a second, this neural network receives a few numbers describing what's happening in the game. In response, it outputs new numbers specifying the action to perform. Hopefully, the network will select actions that let the AI finish this track as quickly as possible. But that will only happen if the network is configured correctly, and that's the tricky part.
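To make this more concrete, here's a minimal sketch of what such a network could look like in Python with PyTorch; the input size, layer sizes, and action encoding are illustrative assumptions, not the exact setup used here.

```python
# Illustrative sketch only: the exact inputs, sizes and action encoding are assumptions.
import torch
import torch.nn as nn

class PolicyNetwork(nn.Module):
    def __init__(self, n_inputs: int = 16, n_actions: int = 6):
        super().__init__()
        # A small fully connected network: game state in, one score per action out.
        self.layers = nn.Sequential(
            nn.Linear(n_inputs, 128),
            nn.ReLU(),
            nn.Linear(128, 128),
            nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, observation: torch.Tensor) -> torch.Tensor:
        return self.layers(observation)

# Every tenth of a second: turn the game state into numbers,
# feed them to the network, and pick the action with the highest score.
policy = PolicyNetwork()
observation = torch.randn(16)              # placeholder for real game metrics
action_scores = policy(observation)
action = int(torch.argmax(action_scores))  # e.g. 0 = straight, 1 = steer left, ...
```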
To configure it, I'm using a method called Reinforcement Learning. Here's how it works. The AI starts from scratch, with zero prior knowledge about anything, so at first its decisions are quite random. But for every action it takes, it receives a reward depending on how good that action was: the faster the AI progresses along the track, the higher the reward. So with each new attempt, the AI explores the game and gathers data from it. Now, the idea of reinforcement learning is to use this data to progressively tweak the neural network, in a way that reinforces the actions that lead to more reward.
Actually, all the cars I'm showing are controlled by slightly different versions of the neural network, and they represent successive stages of the AI as it learns. In each new attempt, the AI tries actions based on some of its current knowledge. Sometimes it goes well, sometimes not. But whatever happens, the AI can use this fresh knowledge to update its decision process. And it's through this trial-and-error loop that the AI gradually learns the game, all by itself, until it has mastered it. At least, that's the theory! In practice, it's been a real nightmare to get this thing working properly...
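As a rough sketch, that trial-and-error loop could be written like this, continuing the network sketch above. It uses a simple policy-gradient (REINFORCE-style) update, which is just one of many possible RL algorithms and not necessarily the one used here; `TrackmaniaEnv` is a hypothetical wrapper around the game, and the reward is assumed to be the progress made along the track.

```python
import torch

env = TrackmaniaEnv()  # hypothetical wrapper exposing reset() / step()
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)  # `policy` from the sketch above
gamma = 0.99           # discount factor: how much future reward matters

for attempt in range(10_000):
    observation = env.reset()
    log_probs, rewards = [], []
    done = False
    while not done:    # one attempt on the track
        obs = torch.as_tensor(observation, dtype=torch.float32)
        dist = torch.distributions.Categorical(logits=policy(obs))
        action = dist.sample()                  # partly random: exploration
        log_probs.append(dist.log_prob(action))
        observation, reward, done = env.step(int(action))
        rewards.append(reward)

    # Discounted return for each step of the attempt.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)

    # Nudge the network so that actions followed by high returns become more likely.
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```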
Let's go back a few months in this project. These were the AI's first decent attempts on this track, compared with my personal best. It wasn't my first time with reinforcement learning, and the
AI seemed to pretty much understand what it had to do. But it would often get stuck in a sub-optimal
strategy, unable to come up with anything better. In particular, the AI loved to hit the walls.
And in a way, it makes sense. When the AI decides to smash into a wall at full speed, it actually collects more reward at first, and it's only later that this turns out to be a bad decision. This conflict between short-term and long-term reward is one of the many things that make reinforcement learning difficult to get right.
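As a toy illustration with made-up numbers: smashing into the wall earns more reward over the first moments, but a slower, cleaner entry wins once the reward is summed over the whole attempt.

```python
# Toy numbers, purely illustrative: reward collected every tenth of a second
# on the approach to a turn.
smash_into_wall = [3, 3, 3, 0, 0, 0, 0, 0]  # full speed at first, then stuck against the wall
slow_down_first = [2, 2, 2, 2, 2, 2, 2, 2]  # slower entry, but the car keeps progressing

print(sum(smash_into_wall))  # 9
print(sum(slow_down_first))  # 16 -> better over the whole attempt
```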
And when you observe the AI doing weird things, it's quite tricky to figure out where the problem lies. Maybe there is a bug in my code? Or maybe it's the method itself: is the reward signal clear enough? I could let it train a little longer to see, but I don't know... It could be that the AI doesn't see enough to locate the walls. And did I configure the learning algorithm properly? Should I try a different algorithm? This one seems interesting... Let's check out the comments on my last video, I might find some help there. Oh, maybe it's because the AI can't brake? No.
That, I'm sure, is not the issue. But for the rest, well, it's hard to know. So, like my AI, I entered a trial-and-error loop of guessing what to fix, re-running the training, and waiting to see if it got better. Usually it didn't. This was a painful process, especially because the training part
needs to run for a long time before producing any results. It's hard to learn on your own from
scratch. An AI like this usually needs to gather a lot of data to even begin to understand what's
happening. So each time I want to test an idea, I need to let these training sessions run for hours on my laptop before getting any useful feedback. That's why I went back to this project
with such a simple track and without enabling the brake. Every simplification of the decision-making space tends to make the problem easier and quicker to solve, and I found it easier to progress
this way. And eventually, as the days passed, all these efforts started to pay off. After many
small adjustments in my code, the AI finally stopped hitting the walls and it was getting
closer and closer to my time. [Music] Until finally, it happened. And this
was only the beginning. [Music] Three years after starting this project, I had finally trained an AI that I would probably never be able to beat. And I have to admit: after playing this game for so many years, it was a strange feeling to be outmatched like that by a computer program! But this track is quite simplistic. It was
time to challenge the AI one step further. Last year, I spent a few hours trying
to get a good time on this map. I then trained an AI on the same
map, without much success. Now it's time to try again and see if
the AI can get its revenge. [Music] Overall, the training method remains the same as on the first track. The main difference lies in the observation input the AI gets when it drives. Just like on the first track, the AI gets a few car metrics, such as its current speed. It also gets its position relative to the road centerline. But on this track, the road layout is no longer repetitive, so the AI must anticipate the upcoming turns. That's why I added new inputs that encode the path of the map for the next three corners.
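Roughly speaking, the observation could be assembled like this; the exact features and their encoding are illustrative guesses, and `car` and `track` are hypothetical helper objects, not actual game APIs.

```python
import numpy as np

def build_observation(car, track) -> np.ndarray:
    """Assemble the numbers the network sees (hypothetical helpers)."""
    features = [
        car.speed,                     # a few car metrics...
        car.distance_from_centerline,  # ...and the position relative to the road center
    ]
    # Encode the upcoming layout: for each of the next three corners,
    # how far away it is and how sharply it turns (signed angle).
    for corner in track.next_corners(car.position, count=3):
        features.append(corner.distance)
        features.append(corner.angle)
    return np.array(features, dtype=np.float32)
```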
To make the video easier to follow, I'm only showing the attempts from the starting line. But in reality, the AI regularly spawns anywhere on the map during its training, which prevents it from focusing too much on the first few turns. Here's an overview of the first 5 hours of
training. Some attempts are already ahead of my personal best in the first few turns.
This looks much more promising than last year. But the AI is not sufficiently robust,
and it never maintains its lead for long. At least for now. One thing that surprised me
is that the AI takes most turns very tight, hugging the inside. It appears that the game's physics are slightly more complex around these road edges. For this reason, I had to include a few more inputs to make sure the AI understood everything that was going on. Adding these two inputs gives the full orientation of the car, and these inputs indicate which wheels are in contact with the road, and whether they are sliding.
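Continuing the earlier observation sketch, these extra inputs would simply be appended to the feature vector; the field names are hypothetical, and splitting the orientation into pitch and roll is a guess.

```python
def extended_observation(car, track) -> np.ndarray:
    """The earlier observation plus the new inputs (hypothetical field names)."""
    features = list(build_observation(car, track))
    features += [car.pitch, car.roll]   # two more values for the car's full orientation
    for wheel in car.wheels:            # four wheels
        features.append(1.0 if wheel.has_ground_contact else 0.0)  # touching the road?
        features.append(1.0 if wheel.is_sliding else 0.0)          # sliding?
    return np.array(features, dtype=np.float32)
```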
[Music] After nearly 9 hours of training, the AI managed to complete the map at a pretty good pace. It completely crushed the AI from last year, finishing only 13 seconds behind my personal best. And it continued to complete the map on subsequent attempts; it was becoming very consistent. But could it also drive faster? [Music] That's it. After playing this game for 35 hours,
the AI was already faster than me on this map. But I wasn't ready to give up yet. Now it was
my turn to train. This time, I will try with the brake. Braking doesn't add a huge advantage on
this map. Still, it makes it possible to drift, which can save a few tenths of a second here
and there. This time, I was able to finish the map in 4min36. Of course, this gives me an unfair advantage over the AI, since I don't allow it to brake yet. But I was still curious
to see how close it would get to this new time. Well, not only did the AI beat this new time, but
it wasn't even close. More than 5 seconds ahead. Now, it seemed it was time to admit the
superiority of the AI on this track. But only on this track, right? Well, I tested the AI on another map where it had never trained before, and it was quite good! I also tried to make it drive the training track in reverse, and it adapted pretty well to this new context. But overall, on these unseen tracks, the AI is less precise, it makes more mistakes, and sometimes it just gets completely confused, especially when approaching a long straight line, for some reason. All this relates to the question of generalization.
There would be much more to say on this subject: I've tested a whole bunch of other things, like modifying the road surface, adding obstacles, and even changing the car's physics. But let's forget all that for now; I'd prefer to return to the training map. Because even there, there is something
about this AI that doesn't fully convince me. Of course, I agree, the AI completely crushed me on this map. But does it really drive faster than me? I mean, my personal best run contains quite a few mistakes. Trust me, it's super hard to get through these 230 corners at a good pace without failing. The AI, on the other hand, is extremely consistent, and I think that's what makes it so strong in this kind of endurance scenario. However, if I focus on just one part of the map, until I drive a run with no errors that I'm fully satisfied with, will the AI still be faster than me? This is the third and final map where I will face the AI, and if it beats me here, then I will be fully convinced. [Music] This time, the AI didn't beat me once.
It's becoming clear that without brakes, the AI is badly disadvantaged. So far, I've
always disabled the brake on the AI. Braking makes the game much more complicated to understand
and master. I was afraid it would make AI training too complex. Maybe I was wrong. Anyway, I
think it's now time to make the AI drift. Here's the plan: I'm going to retrain the AI
on the long map with the brake available. Then, I will compare this new AI with my best run on
the shorter map. This will be our final duel, and it will determine, once and for
all, which of us is driving faster. [Music] Thanks to the brake, the AI is
now slightly faster than before. But for some reason, it doesn't drift. That's
a surprising choice. Just out of curiosity, I also tried to train it with the brake on
the first track, and the result is the same: it's faster, but it doesn't drift. Though I'm
not sure drifting is useful on the first track, I'm pretty confident it saves time on the other
maps. And yet, the AI chose not to drift. Actually, I found a few rare cases where the AI does some kind of drift. It appears to deliberately clip the road corners to unbalance the car and initiate a drift. Definitely not the most straightforward approach! It's not so surprising
to see the AI struggling. The road is very narrow, and the many curves prevent high speeds. And
in Trackmania, there is usually only one way out in such cases: by using a trick known as the
neo-drift. By applying this precise sequence of actions correctly, it's possible to trigger a
drift, even at relatively low speeds. It's the kind of trick that's pretty hard to discover
on your own. I think the AI must have stumbled across it at least once during its many hours
of trial and error, but it probably didn't have enough driving experience to explore further
in that direction. Such a trick won't bring any advantage if it's not properly mastered. So, can
an AI learn to neo-drift from scratch? Probably, but maybe we could also help it a little. From now on, the AI will get a big reward bonus whenever it's drifting. To detect whether the car is drifting, I'm looking at this input. When it's high enough, it basically means that the car is pointing in a different direction from the one it's actually moving in, which usually means that it's drifting. OK, let's restart the training. [Music] Well, I guess the AI outsmarted me! Apparently, it found a way to constantly trigger the reward bonus just by spamming these weird action patterns at low speed. OK, new rule: the reward bonus now only applies when the car's speed is high enough. Let's restart the training again.
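In code, that kind of reward shaping could look something like this sketch, assuming the drift signal is the angle between where the car points and where it's actually moving; the thresholds and field names are made up and would need tuning.

```python
import math

DRIFT_BONUS = 2.0        # extra reward while drifting (arbitrary value)
MIN_DRIFT_SPEED = 100.0  # km/h: the "new rule", no bonus at low speed

def angle_difference(a: float, b: float) -> float:
    """Smallest signed difference between two angles, in radians."""
    return math.atan2(math.sin(a - b), math.cos(a - b))

def shaped_reward(progress_reward: float, car) -> float:
    """Base reward (progress along the track) plus a drift bonus."""
    # The drift signal: the car points in a different direction
    # from the one it's actually moving in.
    slip_angle = abs(angle_difference(car.heading, car.velocity_direction))
    is_drifting = slip_angle > math.radians(15)       # threshold is a guess
    if is_drifting and car.speed > MIN_DRIFT_SPEED:   # speed gate added after the exploit
        return progress_reward + DRIFT_BONUS
    return progress_reward
```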
[Music] Now this looks good. The AI has clearly mastered the neo-drift. So well, in fact, that it even chains multiple drifts on straight lines to get more rewards. Obviously, in terms of speed alone, its current strategy isn't effective. So let's continue the training without the bonus. Now that it has discovered how to drift, the AI shouldn't forget it. [Music] Over the next few hours, the AI
learned to drift more wisely, only where it saves time. Its
driving looks cleaner than ever, and it destroyed its previous record on the
endurance map. It's now 16 seconds ahead of me. I think it's time to answer our question: aside
from its endurance skills, is the AI still faster on a shorter map? To answer that, I made the
AI drive this final level many times and I selected its best run. This one. This run is the
culmination of a three-year journey to create an AI that can beat me in Trackmania. But I also did
my best to drive a challenging run myself, putting into practice my years of experience in this
game. Now it's time to find out who is faster. [Music] After showing that it was more precise and more consistent, the AI again outpaced me in this final test. And it seems it's time
to accept that I can't beat it anymore. But is this AI truly unbeatable? On this last level, I'm sure it's not. Some of its lines are still quite far from optimal, and I'm certain that many players better than me could easily set a faster time. But on the first two levels, I'd be quite surprised if anyone could beat the AI. If you want to try it for yourself, the game is
free and these tracks can be downloaded online. Of course, all these maps are quite simple,
and we've only just scratched the surface of this complex and beautiful game. Now
that the AI is working pretty well, it deserves to be challenged a little more. But
that's for another time. This video already took way too long to make. If you'd like to see more,
and if you want to support this YouTube channel, I just opened a Patreon page to which you
can subscribe. This would help me to continue spending time on this project in the future,
and to make more videos. I promise, I will not make it another year and a half before the next
one. But I still need a few days' break first! [Music]