AI Outraces Trackmania Experts

Video Statistics and Information

Captions

Each of these cars is controlled by an Artificial Intelligence in the racing game Trackmania. Right now, this AI is attempting something particularly tricky: driving on pipes. I've designed this AI to learn from scratch, without any previous knowledge of the game. So at first, the AI can't keep its balance for long. But this is part of the plan, because this computer program is designed to learn from its mistakes and improve itself over time. So with enough training, can it come up with better strategies than humans? On these unstable pipes, this question might be particularly interesting. So to answer that, the AI is gonna attempt to beat the human World Record on three challenging tracks that I've selected. Starting with the easiest one: a simple straight pipe. The rules are simple. The AI can use four different actions. At first, without any experience, it's just using them randomly. If we want the AI to make progress, it needs a target. So for each action the AI takes, I'm gonna give it a reward. The faster it progresses along the pipe, the higher the reward, as long as it doesn't fall off. Now, the AI's only goal will be to predict the actions that add up to the most rewards. And it's gonna learn this through a process called Reinforcement Learning. Basically, the AI is gonna play over and over again. In each attempt, it can try new things, and gather new experience. Over time, this experience is used to reinforce the AI to select actions leading to more reward. Through this trial and error process, the AI should gradually learn to go faster and keep its balance on the pipe. After 12 hours of driving, the AI is already quite fast. It may look easy but I can assure you, it's not. I've tried to challenge it myself, and even though I've been playing this game for years, I just couldn't keep up with its pace. The AI's driving already looks quite inhuman. To select its actions, the AI uses real-time observations of the game.
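As a rough illustration of the reward rule described above (not the author's actual code), here is a minimal Python sketch. It assumes the game exposes the car's distance along the pipe after each action and a flag for whether the car has fallen off; the names `step_reward`, `episode_return`, `progress_log` and `fell_at` are hypothetical, and giving zero reward after a fall is an assumption, since the captions only say the reward applies "as long as it doesn't fall off".

```python
from typing import List, Optional


def step_reward(prev_progress: float, progress: float, fell_off: bool) -> float:
    """Reward for one action: distance gained along the pipe, or 0 once off it."""
    if fell_off:
        return 0.0  # assumption: no reward after a fall (the captions give no value)
    return progress - prev_progress


def episode_return(progress_log: List[float], fell_at: Optional[int] = None) -> float:
    """Sum the per-action rewards of one attempt.

    progress_log[0] is the starting distance and progress_log[i] is the
    distance after action i. fell_at is the index of the first action
    after which the car is off the pipe, or None for a clean attempt.
    """
    total = 0.0
    for i in range(1, len(progress_log)):
        fell = fell_at is not None and i >= fell_at
        total += step_reward(progress_log[i - 1], progress_log[i], fell)
    return total


slow_run = episode_return([0.0, 1.0, 2.0, 3.0])
fast_run = episode_return([0.0, 2.0, 4.0, 6.0])
crashed_run = episode_return([0.0, 2.0, 4.0, 6.0], fell_at=2)
```

With the same number of actions, `fast_run` collects more reward than `slow_run`, while `crashed_run` only keeps what it earned before the fall. That is the trade-off between pace and balance that the training has to resolve.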
These are a few numbers that sum up everything it needs to know, such as its speed, position and orientation on the pipe. About every tenth of a second, the AI interprets these numbers with something called a neural network. Basically, that's the AI's brain. Its job is to predict the optimal action in a given situation. So the AI's strategy depends on how this neural network is configured. This is where reinforcement learning operates behind the scenes, by gradually tuning the network configuration. For now, this training process is not over. The AI can probably reach a faster pace, and it's not super consistent anyway. Let's see how far it can go. This isn't the first AI I've trained in Trackmania. I've already got some promising results before. But today, there's something I can say for the first time. The level of this AI is definitely inhuman. This is the human record on this track. Compared to that, the AI's pace is just absurd. But driving fast is only useful if you can reach the finish at the end. And if you haven't noticed yet, this track is quite long. That's partly why the world record keeps a very low speed compared to the AI. Going any faster would be quite risky, as a single mistake is enough to end a run. In fact, that's something that started to worry me about the AI. Can it really maintain this pace to the finish line without falling off the pipe? Because I've observed this AI attempt the track many times, and it's not super consistent. All these cars are controlled by the exact same version of the AI, yet the outcome is very different from one attempt to another. It's kinda strange, the AI can repeat the same cyclic pattern dozens of times without any problem, but suddenly its car starts to deviate slightly, and that's the end of the run. I have no idea what it's doing wrong. I mean, I just can't drive at that level myself. We could guess that the AI favors its pace over its consistency. But for some mysterious reason, it can't optimize both.
But that's okay, we can save that issue for later, because the AI still got several promising attempts. Like this one. A pretty good time, right? This looks very promising for the upcoming tracks. But to be honest, this time is still quite far from optimal. And I know that because... This isn't the only AI I've trained on this map. Here you can see the start of a second training session using the exact same training process. I've run several of these out of curiosity, and they gave me quite disconcerting results. Just like before, these new AIs eventually managed to complete the track. But they didn't end up with the exact same strategy. It looks similar, with the same kind of cyclic pattern. But when you look at their actions closely, it's not exactly the same. In particular, they use a different method for slowing down. And the two new strategies are actually better: both achieved a faster time. So if the AI can find different strategies, there might be a faster one that it hasn't yet discovered. But I can hardly understand what's happening here. How could I know if it's possible to do better? Well, I think there's one thing I could try. As I said, it appears that all these AIs intentionally slow down the car. Is that really necessary? If I retrain the AI one final time from scratch, but I force it to always accelerate and never brake, can it drive faster? If the only thing the AI can control is its steering angle, can it still finish this track? Here's the best the AI could come up with. At first, the AI can't slow down, so it inevitably builds up speed. Soon enough, it reaches a faster pace than all previous AIs. But then, something interesting happens. Its speed starts to stabilize. After a closer look, I think it's because the car regularly loses contact with the pipe. With these little jumps, the AI still found a way to control its speed, so it should be able to cross the entire pipe. Like all the previous AIs, it's not super consistent.
But again, these mistakes were not too frequent. Not enough to prevent the AI from finishing the map one final time. So finally, it turns out that this track belongs to the full speed category. Actually, it's surprising that the AI didn't find it on its own. We might never know how close we are to the limit. Maybe the AI can still reduce its air time and go even faster. But we'll stop here for this track. This was just a warm up, and it's time to move on to the serious stuff, on a more complex map. A map with a challenging world record, which by itself was enough to motivate me to make this video. A record held by a player the AI recently faced, but couldn't beat. A player named Wirtual. If you don't know him, Wirtual is a highly experienced Trackmania player, and also a well-known streamer. I can remember it was while watching one of his livestreams, two years ago, that I first thought of making an AI drive on a pipe. That night, Wirtual was attempting to beat the world record on this giant pipe maze. And after a few hours, he finally succeeded, with an impressive time under 20 minutes. As for me, I managed to get my AI working a few months later. But it wasn't fast enough to beat experienced players. Since then, I've made quite a few improvements to the AI. A few months ago, for the first time, it was able to completely dominate me on regular road tracks. However, this wasn't enough against Wirtual, who after another extensive playing session, managed to beat the AI. But the AI looks stronger on pipes. Today, it might have a chance of a small revenge. Overall, the training method remains the same as for the first track. The main difference is that the AI needs additional information about the track layout. For instance, this new input provides the distance to the next corner, and this one the direction of that corner. So far, the AI hasn't been able to go that far in the map. Once again, it seems to prioritize pace over consistency.
Its driving style is fairly aggressive. It's able to overtake the record in the first few corners, but it never gets very far. This time, consistency might be more important. Okay, it looks like the AI has no intention of playing any safer. It's just driving faster than before. But it's not that bad, it can go further and further. Actually the AI isn't really exploring the map as it goes. For the clarity of the video, I'm only showing attempts from the start block, but in reality, the AI starts from a random location on each new training attempt. This way, it can practice all possible scenarios, without focusing excessively on the first few turns. And if you look closely, there's one area where I made the AI spawn more frequently. Right before the finish. This particular section is quite different from the rest. To reach the finish line, you need to build up speed and jump from the last corner. It's a tricky jump, even for experienced players. But for the AI, the difficulty wasn't the main issue. It was more a problem of understanding what to do. To guide the AI, I had to adapt the reward signal a bit. When the AI enters the finish area, it knows it with this input. From there, it's rewarded based on how close it gets to the finish, regardless of whether it's following the path or not. And if it ever crosses the finish line, it receives a massive bonus reward. With that, the AI quickly understood that jumping would bring more rewards. But it wasn't jumping from the right spot. It took the AI many hours to rectify its approach. From there, it started to look interesting. And after many attempts, the AI got its first success. It then continued to improve its approach. Eventually, it became quite consistent. Now there is a good chance that if the AI ever reaches this area in a real run, it could conclude it with a successful jump. During this time, I've kept an eye on its training on the rest of the map. Its driving looks insane now.
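The random-start idea described above, with extra weight on the area before the finish, could be sketched like this in Python. This is only an assumed illustration: the function `pick_spawn`, the location labels and the 30% share are hypothetical, and the captions do not say how the author actually samples start positions for this session.

```python
import random
from typing import List, Sequence


def pick_spawn(track_spots: Sequence[str], finish_spots: Sequence[str],
               finish_share: float, rng: random.Random) -> str:
    """Choose where the next training attempt starts.

    With probability finish_share the car spawns somewhere in the
    finish area; otherwise it spawns at a uniformly random spot on
    the rest of the track.
    """
    if rng.random() < finish_share:
        return rng.choice(finish_spots)
    return rng.choice(track_spots)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
track_spots = ["pipe_%d" % i for i in range(50)]     # hypothetical labels
finish_spots = ["finish_%d" % i for i in range(5)]   # hypothetical labels

spawns: List[str] = [pick_spawn(track_spots, finish_spots, 0.3, rng)
                     for _ in range(10000)]
finish_fraction = sum(s.startswith("finish") for s in spawns) / len(spawns)
```

Over many attempts, `finish_fraction` ends up close to the chosen share, so the final jump is practiced far more often than it would be if every run had to start from the first corner.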
All this time, the AI has continually improved its pace. Now we can stop the training and keep the final version of this AI. If this one maintains that pace up to the finish, it could set an insane time. We just have to repeat what we did on the first track: make it attempt the track many times and see how far it can push the limit. Among all these attempts, the AI didn't reach the finish once. Most of the time, the AI can survive for one or several minutes. But for some reason, it always ends up falling off the pipe. That sounds familiar. I could train the AI longer, but I don't think it's gonna make any difference. Its consistency has remained almost unchanged for many hours now. If we want the AI to beat the human record, that's a problem. But if the AI understands so well how to go fast on a pipe, why doesn't it also understand how to avoid falling off it? Why do these mistakes even happen? I mean, it's strange to observe this behavior for a robot. It's not like a human who would make an inattention error after some time. Why is it that even a robot can't consistently repeat the same strategy without failing? These are the questions that have obsessed me over the last few months. I've conducted dozens of training sessions, with various training settings. It never fixed the problem. I've tried to modify the reward signal, to further punish the AI when it falls off. It didn't fix the problem. Then I tried to increase its action frequency. Maybe its reaction time is too slow to recover from small accidents. It didn't fix the problem. Honestly I think I'm a bit lost here. But I still have one thing to investigate. For that, I need to tell you about a small detail I haven't mentioned yet. I said that all these cars are controlled by the exact same AI. And you might have been wondering... Why is there so much disorder among these different attempts? Why are these runs even different?
Since the AI has a fixed decision making process, all its runs should be identical, as Trackmania's physics are deterministic. The same action in the same state of the game will always have the same consequences. But to counter this, I'm using a small trick. In the first tenth of a second, the AI normally decides to go straight ahead. Instead, I'm forcing it to turn very slightly, using a different steering value for each run. In the next tenth of a second, the AI takes back control. This initial perturbation is so tiny that it's not even visible on screen. However, after a few seconds, you can see that the actions and trajectories in each run become desynchronized. To the point of generating completely different runs. With this simple trick, we can get the same AI to drive many different runs. What's surprising about these runs is that there doesn't seem to be any obvious pattern to the AI's falls. It's almost as if these mistakes occur at random. But what I find most disconcerting is how even the slightest change in one single action can yield widely diverging outcomes. And it's not specific to the start of the track. This kind of perturbation can be generated anywhere. Like here. Just by applying any tiny random change to one action. Every time, a seemingly insignificant change in one decision will have massive consequences later on. This gave me an idea. Let's take a look at one of the AI's attempts, right around the spot where it ends up falling off. What if we generate the same type of perturbation on this run a few seconds before the fall? Right here. Now it's getting really disturbing. Almost every time, the mistake disappears. So if the AI had changed its steering by almost any tiny amount here, it would not have failed in the next corner. I've repeated the same experiment a little closer to the fall point. Here. This time, the outcome is a little more uncertain.
What's even more disconcerting is that there doesn't seem to be any apparent logic between the AI's action here, and whether or not it falls 2 seconds later. The outcome looks completely random. So is it the AI that plays randomly here? Let's repeat this experiment one final time. But this time, after the perturbation, each car will be forced to maintain the same actions as the reference run. Well, even with consistent actions, the outcome still varies a lot. It fluctuates between different patterns. In a way that once again just seems unpredictable. It can't just be the AI that's inconsistent. Instead, it might be the game's physics that behave randomly. But Trackmania is deterministic, so... it's not random. Right? Well, if you ask any experienced player, they will probably tell you that there are quite a few situations where this game feels random. Situations where you try to repeat the same lines and actions, but the game just seems to react differently. Situations where the game's behavior seems unpredictable. There's one track from the official campaign that's particularly known for that: E03. This track is a nightmare. On almost every jump, you can feel that sense of randomness. Especially at the landing, where the car's behavior is highly sensitive to any small change in its state. Even for professional players, it's super hard to predict these behaviors. Such irregularities are commonly referred to as bugs, and they constitute a severe obstacle to consistency. As far as I know, no one has ever managed to complete a run without landing bugs on E03, 15 years after its release. So is it the same thing that limits the AI's consistency on pipes, despite its insane driving skills? I'm not sure we can call that bugs, but I feel there's a fair amount of randomness whenever a car drives over a pipe in Trackmania. At high speeds, these complex behaviors might become particularly punishing. Unrecoverable for anyone, even an AI.
But this is just a guess, there might be other reasons. This AI is far from perfect. Maybe these pipes are not random, just too complex for the AI. We might get a better answer with the final level, when we get there. But for now, let's refocus on our initial target. Wirtual's record. So how can we beat that? Actually, I've no idea how to fix the AI's mistakes. But what I do know is that even if the AI makes mistakes, it did have a few promising attempts. Sometimes completing more than half of the map. So maybe the AI just needs a bit more luck? If we make the AI drive the map many more times, it should end up getting a lucky attempt. One that reaches the finish line, and finally beats the human record. In the world's most competitive racing game, driving a perfect speedrun often implies taking risky lines. Trajectories where disastrous mistakes can be inevitable. Apparently, this is no different for our AI, which often loses its balance in corners. In fact, based on its numerous attempts, the AI has about a 97.3% chance of passing a corner. In a way, that's a pretty good number. But this infamous track contains many corners. And the probability of completing all of them without failing becomes pretty low. But it's possible, the AI just needs one good run. One single run where everything goes well. A run which would finally tell us how far the AI can push the limit on this map. And finally, after more than a thousand additional attempts, the AI got this run. 8 minutes faster than the human record. That's pretty strong! And the AI still has some room for improvement. Already in the first corners, its run was well behind some of its other attempts. With this new success, I was already thinking about putting the AI on the final track. But before going any further, I think Wirtual deserves a second chance. Honestly, we could argue that the AI had obvious advantages in this battle. Compared to a human, it doesn't risk losing its focus on this long track.
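The corner arithmetic mentioned above can be made concrete with a short Python sketch. Only the 97.3% per-corner figure comes from the captions; the number of corners on the map is not stated, so the value 100 below is purely hypothetical, and treating corners as independent of each other is a simplifying assumption.

```python
def p_clean_run(p_corner: float, n_corners: int) -> float:
    """Chance of passing every corner, assuming corners are independent."""
    return p_corner ** n_corners


def expected_attempts(p_corner: float, n_corners: int) -> float:
    """Average number of attempts before one clean run (geometric distribution)."""
    return 1.0 / p_clean_run(p_corner, n_corners)


P_CORNER = 0.973   # per-corner success rate quoted in the captions
N_CORNERS = 100    # hypothetical corner count, for illustration only

clean = p_clean_run(P_CORNER, N_CORNERS)
tries = expected_attempts(P_CORNER, N_CORNERS)
print(f"clean run chance: {clean:.3%}, expected attempts: {tries:.0f}")
```

Under these assumptions a clean run happens roughly 6% of the time. The chance shrinks geometrically as the corner count grows, which is why a per-corner rate that sounds high can still call for a large number of attempts on a long map.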
The AI can make quick decisions, which is ideal on pipes. And it can't lose its patience either. Even if it fails, it can just try again and again. So I've been thinking about ways of penalizing the AI, to try to make the competition with humans a bit more even. And I came up with this question: What if we force the AI to drive backwards? So I've driven the first few seconds of a run myself, to make the car land on the pipe backwards. From there, the AI is gonna take control. I'm gonna start a fresh training session with this new starting point, and we'll see if the AI can still beat humans. Even backwards, the AI keeps very good control on the pipe. I was quite amazed when I first saw this. Again, the AI's efficiency and precision look inhuman. But still, the AI is not very intelligent, because it could have done better. It could have cheated. Although it's difficult, it's actually possible to turn around anywhere on the pipe. But I'm not too surprised that the AI didn't discover any of this. For that to happen, the AI would need to perform a precise sequence of actions by chance, without immediate positive feedback. The payoff would only come a long time after. This is quite unlikely to happen, without clear and guided indications. Anyway, it's a good thing the AI didn't cheat. Driving backwards was supposed to be a handicap for the whole race. Now it's time to find out. In these conditions, can the AI still beat Wirtual? Ok, so the AI is definitely faster than the record pace. Even backwards. But it looks like we have a major problem: is it still possible to reach the finish line? During its training, the AI never cleared the jump. But over time, it made good progress. It even got a few very promising attempts. It's at this stage of the training that the AI stopped improving on the rest of the track. Yet I still have the feeling that it can do better here. So let's try to continue training, but with even more focus on the jump.
From now on, the AI will spend 90% of its training in the finish area. Ok, now we have confirmation that it's possible. But let's keep training a little more, the AI is still hesitant. And actually I think it's quite fun to watch. Seeing the AI gradually master the jump, the different strategies it puts in place... Here for example, the AI has turned its car over by accident. Now, it just has to go forwards to the left. But the AI doesn't understand this new situation. During its training, it was punished again and again every time it tried to drive forwards. So now, it's just trying to go backwards as usual. Until it's able to finish. After a while, the AI's strategy has stabilized. At this point, it's still unable to complete the jump consistently, only about 20% of the time. That's quite low compared with the previous AI. But on the rest of the map, it's surprisingly consistent. It makes almost no mistakes compared with the one driving forwards. So this time, it didn't take the AI that many tries to get everything right in a single run. Honestly, I wasn't expecting such a good time. It appears that driving backwards wasn't such a big disadvantage for the AI. So again, I've been wondering how to handicap the AI further. What if we force the AI to drive upside down? No, I'm kidding, let's move on to something more interesting. I've tried though! But let's forget that, and finally focus on the final level. This is by far the hardest one. And probably the most random. This track was released 12 years ago. And since then, only one person has ever finished it: a player named Unnamed. Most of the track consists of the same repetitive jump from one pipe to another. So initially, to simplify training, we're gonna focus on that part, making the AI start from here. Again, I had to slightly adapt the AI's inputs for this track. But overall, the training method remains the same. After 100 hours of play, the AI's behavior doesn't look super promising.
Let's take a closer look at the world record. As you can see, Unnamed is constantly maneuvering at moderate speed, to ensure he lands correctly after each jump. That's quite different from what the AI chose to do. It's maintaining speed to jump in one go, without using the intermediate platforms. Obviously, the AI strategy seems faster. But it can't pass these jumps consistently. If the AI never tried the world record strategy, I think it's for the same reason that it never flipped its car when it was driving backwards. For the same reason I had to force it not to brake on the first level. When you see all this, this artificial intelligence doesn't seem so intelligent. It seems to lack creativity. There might be a link with the AI's falls. We've observed them on every level, and I said it's because of the game's randomness. But maybe the AI just lacks enough creativity to deal with that? Still, on the last level, even if it's not consistent, I believe the AI found the fastest strategy. So to better understand its mistakes, I've tried to play the same way for about an hour. And I couldn't do much better than the AI. Maybe some other players could, I don't know. But I'm not sure creativity is an issue here. Actually the one thing I remember about this experience is just how random these jumps feel. On this track, the pipes look more random than ever. For the final time, is it really because Trackmania is random, or are we missing something? Here is a last experiment I made. Let's say we wait at the start of this simple track, without pressing anything. During this time, the car remains stationary. But not entirely. With external tools, we can observe imperceptible variations in the car state. So if we start to accelerate at different times, the car will start with small differences in initial conditions. Extremely small differences. Yet it's enough to induce a completely different behavior on the pipe. A behavior that looks as random as the roll of a die.
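The effect described above, where a deterministic system turns a tiny difference in starting state into a completely different run, can be reproduced with a toy model. The Python sketch below uses the logistic map as a stand-in; it is not Trackmania's physics and has nothing to do with the author's tooling, and the names `simulate` and `gap` are hypothetical.

```python
from typing import List


def simulate(x0: float, steps: int) -> List[float]:
    """Iterate a deterministic toy system (the logistic map) from state x0."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(4.0 * x * (1.0 - x))  # same state in, same state out
    return xs


def gap(run_a: List[float], run_b: List[float]) -> List[float]:
    """Absolute difference between two runs at every step."""
    return [abs(a - b) for a, b in zip(run_a, run_b)]


reference = simulate(0.2, 100)
replay = simulate(0.2, 100)           # identical start: identical run
nudged = simulate(0.2 + 1e-9, 100)    # start shifted by one billionth

print("replay identical:", reference == replay)
print("gap after 5 steps:", gap(reference, nudged)[5])
print("largest gap over 100 steps:", max(gap(reference, nudged)))
```

Replaying the same start reproduces the run exactly, which is what determinism means. The nudged run stays indistinguishable for the first few steps and then drifts until the two trajectories have nothing in common, much like the desynchronized cars in the experiment.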
For example, if you wait exactly 7.65s before accelerating, you always reach the finish. Here are the same runs again, visualized as if they were starting at the same time. Can we predict anything in this mess? Can we really predict how a car is gonna land on a pipe? It just looks like complete chaos. Since I've been confronted with these irregular behaviors in the game, I've been wondering if there's a connection with chaos. It's a field I don't know really well, so I'd be interested to have your opinion on this. Basically, chaos theory deals with deterministic systems where small differences in initial conditions can lead to vastly different outcomes. That sounds pretty close to what we've observed so far. And this theory states that the deterministic nature of these systems does not make them predictable. So if there is chaos in this, even if both the AI and the game are deterministic, the consequences of certain actions could be impossible to predict. The whole point of this AI is to make this kind of prediction. So maybe that's the reason some of its mistakes seem inevitable. I won't go any further on that. Again, it's totally outside what I know. But if there are any specialists of this field watching this, please don't hesitate to contact me. I'd like to understand this better, maybe talk about it in a future video. What I know for sure is that if we want this AI to finish the track, it's gonna need to be very lucky. I've tried to train the AI longer, but I haven't observed any progress. As if it had given up trying to understand the map. Actually, I've tested way too many things on this track already. I think I'm getting tired of these pipes. I don't want to dream about these pipes anymore. And this video is getting quite long too, it already took 5 months to get there. So I'm gonna keep the AI as it is, and hope it's good enough. I just gave it a bit of extra training to practice the start.
It caused additional problems, but anyway it's time to test it on the whole map. Once again, it's gonna play it many times. And we just have to hope it gets one lucky attempt. That's it, the AI did it! And with that it's done, it managed to break the human world record on each of the three levels. A big thanks to all my Patreon members, who helped finance this video. Making this kind of project takes a lot of time, so any support on Patreon is a great reward for me, and it will help me to spend more time on upcoming videos. I'd like to end with a shoutout to the players mentioned in this video. The tracks I chose were quite specific and repetitive, clearly in favor of the AI. It didn't fully showcase the incredible skill, patience, adaptability and intelligence of such Trackmania players.

Info

Channel: Yosh

Views: 2,184,901

Id: kojH8a7BW04

Length: 37min 18sec (2238 seconds)

Published: Wed Mar 13 2024