AI Outraces Trackmania Experts

Captions
Each of these cars is controlled by an artificial intelligence in the racing game Trackmania. Right now, this AI is attempting something particularly tricky: driving on pipes. I've designed this AI to learn from scratch, without any previous knowledge of the game. So at first, the AI can't keep its balance for long. But this is part of the plan, because this computer program is designed to learn from its mistakes and improve itself over time. So with enough training, can it come up with better strategies than humans? On these unstable pipes, this question might be particularly interesting. So to answer that, the AI is gonna attempt to beat the human world record on three challenging tracks that I've selected.
Starting with the easiest one: a simple straight pipe. The rules are simple. The AI can use four different actions. At first, without any experience, it's just using them randomly. If we want the AI to make progress, it needs a target. So for each action the AI takes, I'm gonna give it a reward. The faster it progresses along the pipe, the higher the reward, as long as it doesn't fall off. Now, the AI's only goal will be to predict the actions that add up to the most rewards. And it's gonna learn this through a process called reinforcement learning. Basically, the AI is gonna play over and over again. In each attempt, it can try new things and gather new experience. Over time, this experience is used to reinforce the AI to select actions leading to more reward. Through this trial-and-error process, the AI should gradually learn to go faster and keep its balance on the pipe.
After 12 hours of driving, the AI is already quite fast. It may look easy, but I can assure you, it's not. I've tried to challenge it myself, and even though I've been playing this game for years, I just couldn't keep up with its pace. The AI's driving already looks quite inhuman.
To select its actions, the AI uses real-time observations of the game.
These are a few numbers that sum up everything it needs to know, such as its speed, position and orientation on the pipe. About every tenth of a second, the AI interprets these numbers with something called a neural network. Basically, that's the AI's brain. Its job is to predict the optimal action in a given situation. So the AI's strategy depends on how this neural network is configured. This is where reinforcement learning operates behind the scenes, by gradually tuning the network configuration.
For now, this training process is not over. The AI can probably reach a faster pace, and it's not super consistent anyway. Let's see how far it can go.
This isn't the first AI I've trained in Trackmania. I've already got some promising results before. But today, there's something I can say for the first time: the level of this AI is definitely inhuman. This is the human record on this track. Compared to that, the AI's pace is just absurd.
But driving fast is only useful if you can reach the finish at the end. And if you haven't noticed yet, this track is quite long. That's partly why the world record keeps a very low speed compared to the AI. Going any faster would be quite risky, as a single mistake is enough to end a run. In fact, that's something that started to worry me about the AI. Can it really maintain this pace to the finish line without falling off the pipe? Because I've observed this AI attempt the track many times, and it's not super consistent. All these cars are controlled by the exact same version of the AI, yet the outcome is very different from one attempt to another. It's kinda strange: the AI can repeat the same cyclic pattern dozens of times without any problem, but suddenly its car starts to deviate slightly, and that's the end of the run. I have no idea what it's doing wrong. I mean, I just can't drive at that level myself.
We could guess that the AI favors its pace over its consistency.
But for some mysterious reason, it can't optimize both. But that's okay, we can save that issue for later, because the AI still got several promising attempts. Like this one. A pretty good time, right? This looks very promising for the upcoming tracks.
But to be honest, this time is still quite far from optimal. And I know that because... this isn't the only AI I've trained on this map. Here you can see the start of a second training session using the exact same training process. I've run several of these out of curiosity, and it gave me quite disconcerting results. Just like before, these new AIs eventually managed to complete the track. But they didn't end up with the exact same strategy. It looks similar, with the same kind of cyclic pattern. But when you look at their actions closely, it's not exactly the same. In particular, they use a different method for slowing down. And the two new strategies are actually better: both achieved a faster time. So if the AI can find different strategies, there might be a faster one that it hasn't yet discovered. But I can hardly understand what's happening here. How could I know if it's possible to do better?
Well, I think there's one thing I could try. As I said, it appears that all these AIs intentionally slow down the car. Is that really necessary? If I retrain the AI one final time from scratch, but I force it to always accelerate and never brake, can it drive faster? If the only thing the AI can control is its steering angle, can it still finish this track?
Here's the best the AI could come up with. At first, the AI can't slow down, so it inevitably builds up speed. Soon enough, it reaches a faster pace than all previous AIs. But then, something interesting happens. Its speed starts to stabilize. After a closer look, I think it's because the car regularly loses contact with the pipe.
With these little jumps, the AI still found a way to control its speed, so it should be able to cross the entire pipe. Like all the previous AIs, it's not super consistent. But again, these mistakes were not too frequent. Not enough to prevent the AI from finishing the map one final time. So finally, it turns out that this track belongs to the full-speed category. Actually, it's surprising that the AI didn't find it on its own. We might never know how close we are to the limit. Maybe the AI can still reduce its air time and go even faster.
But we'll stop here for this track. This was just a warm-up, and it's time to move on to the serious stuff, on a more complex map. A map with a challenging world record, which by itself was enough to motivate me to make this video. A record held by a player the AI recently faced, but couldn't beat. A player named Wirtual.
If you don't know him, Wirtual is a highly experienced Trackmania player, and also a well-known streamer. I can remember it was while watching one of his livestreams, two years ago, that I first thought of making an AI drive on a pipe. That night, Wirtual was attempting to beat the world record on this giant pipe maze. And after a few hours, he finally succeeded, with an impressive time under 20 minutes. As for me, I managed to get my AI working a few months later. But it wasn't fast enough to beat experienced players.
Since then, I've made quite a few improvements to the AI. A few months ago, for the first time, it was able to completely dominate me on regular road tracks. However, this wasn't enough against Wirtual, who, after another extensive playing session, managed to beat the AI. But the AI looks stronger on pipes. Today, it might have a chance at a small revenge.
Overall, the training method remains the same as for the first track. The main difference is that the AI needs additional information about the track layout.
For instance, this new input provides the distance to the next corner, and this one the direction of that corner.
So far, the AI hasn't been able to go that far in the map. Once again, it seems to prioritize pace over consistency. Its driving style is fairly aggressive. It's able to overtake the record in the first few corners, but it never gets very far. This time, consistency might be more important.
Okay, it looks like the AI has no intention of playing any safer. It's just driving faster than before. But it's not that bad, it can go further and further.
Actually, the AI isn't really exploring the map as it goes. For the clarity of the video, I'm only showing attempts from the start block, but in reality, the AI starts from a random location on each new training attempt. This way, it can practice all possible scenarios, without focusing excessively on the first few turns. And if you look closely, there's one area where I made the AI spawn more frequently. Right before the finish.
This particular section is quite different from the rest. To reach the finish line, you need to build up speed and jump from the last corner. It's a tricky jump, even for experienced players. But for the AI, the difficulty wasn't the main issue. It was more a problem of understanding what to do. To guide the AI, I had to adapt the reward signal a bit. When the AI enters the finish area, it knows it thanks to this input. From there, it's rewarded based on how close it gets to the finish, regardless of whether it's following the path or not. And if it ever crosses the finish line, it receives a massive bonus reward.
With that, the AI quickly understood that jumping would bring more rewards. But it wasn't jumping from the right spot. It took the AI many hours to rectify its approach. From there, it started to look interesting. And after many attempts, the AI got its first success. It then continued to improve its approach.
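The reward adaptation for the finish area described above can be sketched roughly as follows. This is only a minimal illustration, not the code used in the video: the `Step` fields, the `step_reward` function, the zero reward on a fall and the `FINISH_BONUS` value are all assumptions made for the example.

```python
from dataclasses import dataclass

# Assumed value: the video only says the bonus is "massive".
FINISH_BONUS = 100.0


@dataclass
class Step:
    """Hypothetical summary of one decision step (about 0.1 s of driving)."""
    progress_delta: float         # distance gained along the pipe path
    in_finish_area: bool          # the extra input that flags the finish area
    dist_to_finish_before: float  # straight-line distance to the finish, previous step
    dist_to_finish_after: float   # straight-line distance to the finish, this step
    crossed_finish: bool
    fell_off: bool


def step_reward(s: Step) -> float:
    """Reward for a single step, following the rules stated in the captions."""
    if s.fell_off:
        return 0.0  # assumption: a fall simply earns nothing
    if s.crossed_finish:
        return FINISH_BONUS
    if s.in_finish_area:
        # Getting closer to the finish pays off, whether or not the car
        # is still following the pipe path.
        return s.dist_to_finish_before - s.dist_to_finish_after
    # Everywhere else: the faster the progress along the pipe, the higher the reward.
    return s.progress_delta


print(step_reward(Step(2.0, False, 50.0, 49.0, False, False)))
print(step_reward(Step(0.0, True, 50.0, 47.5, False, False)))
```

The point of the design is that inside the finish area the path-following term is switched off, so leaving the pipe to jump is no longer discouraged.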
Eventually, it became quite consistent. Now there is a good chance that if the AI ever reaches this area in a real run, it could conclude it with a successful jump.
During this time, I've kept an eye on its training on the rest of the map. Its driving looks insane now. All this time, the AI has continually improved its pace. Now we can stop the training and keep the final version of this AI. If this one maintains that pace up to the finish, it could set an insane time. We just have to repeat what we did on the first track: make it attempt the track many times and see how far it can push the limit.
Among all these attempts, the AI didn't reach the finish once. Most of the time, the AI can survive for one or several minutes. But for some reason, it always ends up falling off the pipe. That sounds familiar. I could train the AI longer, but I don't think it's gonna make any difference. Its consistency has remained almost unchanged for many hours now. If we want the AI to beat the human record, that's a problem.
But if the AI understands so well how to go fast on a pipe, why doesn't it also understand how to avoid falling off it? Why do these mistakes even happen? I mean, it's strange to observe this behavior in a robot. It's not like a human who would make an inattention error after some time. Why is it that even a robot can't consistently repeat the same strategy without failing?
These are the questions that have obsessed me over the last few months. I've conducted dozens of training sessions, with various training settings. It never fixed the problem. I've tried to modify the reward signal, to further punish the AI when it falls off. It didn't fix the problem. Then I've tried to increase its action frequency. Maybe its reaction time is too slow to recover from small accidents. It didn't fix the problem. Honestly, I think I'm a bit lost here.
But I still have one thing to investigate.
For that, I need to tell you about a small detail I haven't mentioned yet. I said that all these cars are controlled by the exact same AI. And you might have been wondering... Why is there so much disorder among these different attempts? Why are these runs even different? Since the AI has a fixed decision-making process, all its runs should be identical, as Trackmania's physics are deterministic. The same action in the same state of the game will always have the same consequences.
But to counter this, I'm using a small trick. In the first tenth of a second, the AI normally decides to go straight ahead. Instead, I'm forcing it to turn very slightly, using a different steering value for each run. In the next tenth of a second, the AI takes back control. This initial perturbation is so tiny that it's not even visible on screen. However, after a few seconds, you can see that the actions and trajectories in each run become desynchronized. To the point of generating completely different runs. With this simple trick, we can get the same AI to drive many different runs.
What's surprising about these runs is that there doesn't seem to be any obvious pattern to the AI's falls. It's almost as if these mistakes occur at random. But what I find most disconcerting is how even the slightest change in one single action can yield widely diverging outcomes. And it's not specific to the start of the track. This kind of perturbation can be generated anywhere. Like here. Just by applying any tiny random change to one action. Every time, a seemingly insignificant change in one decision will have massive consequences later on.
This gave me an idea. Let's take a look at one of the AI's attempts, right around the spot where it ends up falling off. What if we generate the same type of perturbation on this run a few seconds before the fall? Right here. Now it's getting really disturbing. Almost every time, the mistake disappears.
So if the AI had changed its steering by almost any tiny amount here, it would not have failed in the next corner. I've repeated the same experiment a little closer to the fall point. Here. This time, the outcome is a little more uncertain. What's even more disconcerting is that there doesn't seem to be any apparent logic between the AI's action here and whether or not it falls 2 seconds later. The outcome looks completely random. So is it the AI that plays randomly here?
Let's repeat this experiment one final time. But this time, after the perturbation, each car will be forced to maintain the same actions as the reference run. Well, even with consistent actions, the outcome still varies a lot. It fluctuates between different patterns, in a way that once again just seems unpredictable. It can't just be the AI that's inconsistent. Instead, it might be the game's physics that behave randomly. But Trackmania is deterministic, so... it's not random. Right?
Well, if you ask any experienced player, they will probably tell you that there are quite a few situations where this game feels random. Situations where you try to repeat the same lines and actions, but the game just seems to react differently. Situations where the game's behavior seems unpredictable. There's one track from the official campaign that's particularly known for that: E03. This track is a nightmare. On almost every jump, you can feel that sense of randomness. Especially at the landing, where the car's behavior is highly sensitive to any small change in its state. Even for professional players, it's super hard to predict these behaviors. Such irregularities are commonly referred to as bugs, and they constitute a severe obstacle to consistency. As far as I know, no one has ever managed to complete a run without landing bugs on E03, 15 years after its release.
So is it the same thing that limits the AI's consistency on pipes, despite its insane driving skills?
I'm not sure we can call that bugs, but I feel there's a fair amount of randomness whenever a car drives over a pipe in Trackmania. At high speeds, these complex behaviors might become particularly punishing. Unrecoverable for anyone, even an AI. But this is just a guess, there might be other reasons. This AI is far from perfect. Maybe these pipes are not random, just too complex for the AI. We might get a better answer with the final level, when we get there.
But for now, let's refocus on our initial target: Wirtual's record. So how can we beat that? Actually, I've no idea how to fix the AI's mistakes. But what I do know is that even if the AI makes mistakes, it did have a few promising attempts. Sometimes completing more than half of the map. So maybe the AI just needs a bit more luck? If we make the AI drive the map many more times, it should end up getting a lucky attempt. One that reaches the finish line, and finally beats the human record.
In the world's most competitive racing game, driving a perfect speedrun often implies taking risky lines. Trajectories where disastrous mistakes can be inevitable. Apparently, this is no different for our AI, which often loses its balance in corners. In fact, based on its numerous attempts, the AI has about a 97.3% chance of passing a corner. In a way, that's a pretty good number. But this infamous track contains many corners. And the probability of completing all of them without failing becomes pretty low. But it's possible, the AI just needs one good run. One single run where everything goes well. A run which would finally tell us how far the AI can push the limit on this map.
And finally, after more than a thousand additional attempts, the AI got this run. 8 minutes faster than the human record. That's pretty strong! And the AI still has some room for improvement. Already in the first corners, its run was well behind some of its other attempts.
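To make the corner arithmetic above concrete, here is a small sketch of how a 97.3% per-corner success rate compounds over a whole run. It assumes every corner is passed independently with the same probability, which is a simplification. The corner counts in the loop are hypothetical, since the captions don't say how many corners the map has, and the function names are made up for the example.

```python
P_CORNER = 0.973  # per-corner success rate quoted in the video


def completion_probability(n_corners: int, p: float = P_CORNER) -> float:
    """Chance of passing n_corners in a row, assuming independent corners."""
    return p ** n_corners


def expected_attempts(n_corners: int, p: float = P_CORNER) -> float:
    """Average number of runs needed before one complete run (geometric law)."""
    return 1.0 / completion_probability(n_corners, p)


# Hypothetical corner counts, for illustration only.
for n in (50, 100, 200):
    prob = completion_probability(n)
    print(f"{n} corners: P(finish) = {prob:.4f}, about {expected_attempts(n):.0f} attempts on average")
```

Under these assumptions the success chance shrinks exponentially with the number of corners, which fits the idea that a long map takes a very large number of attempts even when each individual corner is almost always passed.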
With this new success, I was already thinking about putting the AI on the final track. But before going any further, I think Wirtual deserves a second chance. Honestly, we could argue that the AI had obvious advantages in this battle. Compared to a human, it doesn't risk losing its focus on this long track. It can make quick decisions, which is ideal on pipes. And it can't lose its patience either. Even if it fails, it can just try again and again. So I've been thinking about ways of penalizing the AI, to try to make the competition with humans a bit more even. And I came up with this question: what if we force the AI to drive backwards?
So I've driven the first few seconds of a run myself, to make the car land on the pipe backwards. From there, the AI is gonna take control. I'm gonna start a fresh training session with this new starting point, and we'll see if the AI can still beat humans.
Even backwards, the AI keeps very good control on the pipe. I was quite amazed when I first saw this. Again, the AI's efficiency and precision look inhuman. But still, the AI is not very intelligent, because it could have done better. It could have cheated. Although it's difficult, it's actually possible to turn around anywhere on the pipe. But I'm not too surprised that the AI didn't discover any of this. For that to happen, the AI would need to perform a precise sequence of actions by chance, without immediate positive feedback. The payoff would only come a long time after. This is quite unlikely to happen without clear and guided indications. Anyway, it's a good thing the AI didn't cheat. Driving backwards was supposed to be a handicap for the whole race.
Now it's time to find out. In these conditions, can the AI still beat Wirtual? OK, so the AI is definitely faster than the record pace. Even backwards. But it looks like we have a major problem: is it still possible to reach the finish line?
During its training, the AI never cleared the jump. But over time, it made good progress. It even got a few very promising attempts. It's at this stage of the training that the AI stopped improving on the rest of the track. Yet I still have the feeling that it can do better here. So let's try to continue training, but with even more focus on the jump. From now on, the AI will spend 90% of its training in the finish area.
OK, now we have confirmation that it's possible. But let's keep training a little more, the AI is still hesitant. And actually, I think it's quite fun to watch. Seeing the AI gradually master the jump, the different strategies it puts in place... Here, for example, the AI has turned its car around by accident. Now, it just has to go forwards to the left. But the AI doesn't understand this new situation. During its training, it was punished again and again every time it tried to drive forwards. So now, it's just trying to go backwards as usual. Until it's able to finish.
After a while, the AI's strategy has stabilized. At this point, it's still unable to complete the jump consistently, only about 20% of the time. That's quite low compared with the previous AI. But on the rest of the map, it's surprisingly consistent. It makes almost no mistakes compared with the one driving forwards. So this time, it didn't take the AI that many tries to get everything right in a single run. Honestly, I wasn't expecting such a good time. It appears that driving backwards wasn't such a big disadvantage for the AI.
So again, I've been wondering how to handicap the AI further. What if we force the AI to drive upside down? No, I'm kidding, let's move on to something more interesting. I've tried though! But let's forget that, and finally focus on the final level.
This is by far the hardest one. And probably the most random. This track was released 12 years ago. And since then, only one person has ever finished it: a player named Unnamed.
Most of the track consists of the same repetitive jump from one pipe to another. So initially, to simplify training, we're gonna focus on that part, making the AI start from here. Again, I had to slightly adapt the AI's inputs for this track. But overall, the training method remains the same.
After 100 hours of play, the AI's behavior doesn't look super promising. Let's take a closer look at the world record. As you can see, Unnamed is constantly maneuvering at moderate speed, to ensure he lands correctly after each jump. That's quite different from what the AI chose to do. It's maintaining speed to jump in one go, without using the intermediate platforms. Obviously, the AI's strategy seems faster. But it can't pass these jumps consistently.
If the AI never tried the world record strategy, I think it's for the same reason that it never flipped its car when it was driving backwards. For the same reason I had to force it not to brake on the first level. When you see all this, this artificial intelligence doesn't seem so intelligent. It seems to lack creativity. There might be a link with the AI's falls. We've observed them on every level, and I said it's because of the game's randomness. But maybe the AI just lacks the creativity to deal with that?
Still, on the last level, even if it's not consistent, I believe the AI found the fastest strategy. So to better understand its mistakes, I've tried to play the same way for about an hour. And I couldn't do much better than the AI. Maybe some other players could, I don't know. But I'm not sure creativity is an issue here. Actually, the one thing I remember about this experience is just how random these jumps feel. On this track, the pipes look more random than ever.
For the final time, is it really because Trackmania is random, or are we missing something? Here is a last experiment I made. Let's say we wait at the start of this simple track, without pressing anything.
During this time, the car remains stationary. But not entirely. With external tools, we can observe imperceptible variations in the car's state. So if we start to accelerate at different times, the car will start with small differences in initial conditions. Extremely small differences. Yet it's enough to induce a completely different behavior on the pipe. A behavior that looks as random as the roll of a die. Like, for example, if you wait exactly 7.65s before accelerating, you always reach the finish.
Here are the same runs again, visualized as if they were starting at the same time. Can we predict anything in this mess? Can we really predict how a car is gonna land on a pipe? It just looks like complete chaos.
Since I've been confronted with these irregular behaviors in the game, I've been wondering if there's a connection with chaos. It's a field I don't know really well, so I'd be interested to have your opinion on this. Basically, chaos theory deals with deterministic systems where small differences in initial conditions can lead to vastly different outcomes. That sounds pretty close to what we've observed so far. And this theory states that the deterministic nature of these systems does not make them predictable. So if there is chaos in this, even if both the AI and the game are deterministic, the consequences of certain actions could be impossible to predict. The whole point of this AI is to make this kind of prediction. So maybe that's the reason some of its mistakes seem inevitable.
I won't go any further on that. Again, it's totally outside what I know. But if there are any specialists in this field watching this, please don't hesitate to contact me. I'd like to understand this better, maybe talk about it in a future video.
What I know for sure is that if we want this AI to finish the track, it's gonna need to be very lucky. I've tried to train the AI longer, but I haven't observed any progress.
As if it had given up trying to understand the map. Actually, I've tested way too many things on this track already. I think I'm getting tired of these pipes. I don't want to dream about these pipes anymore. And this video is getting quite long too, it already took 5 months to get there. So I'm gonna keep the AI as it is, and hope it's good enough. I just gave it a bit of extra training to practice the start. It caused additional problems, but anyway, it's time to test it on the whole map. Once again, it's gonna play it many times. And we just have to hope it gets one lucky attempt.
That's it, the AI did it! And with that, it's done: it managed to break the human world record on each of the three levels.
A big thanks to all my Patreon members, who helped finance this video. Making this kind of project takes a lot of time, so any support on Patreon is a great reward for me, and it will help me to spend more time on upcoming videos.
I'd like to end with a shoutout to the players mentioned in this video. The tracks I chose were quite specific and repetitive, clearly in favor of the AI. It didn't fully showcase the incredible skill, patience, adaptability and intelligence of such Trackmania players.
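The chaos idea raised in the captions (fully deterministic rules, yet tiny differences in the starting state leading to very different outcomes) can be illustrated with a standard toy system. The sketch below uses the logistic map, a textbook example of deterministic chaos. It is a stand-in chosen for illustration and has nothing to do with Trackmania's actual physics. The function name, the starting value and the size of the perturbation are all assumptions.

```python
def logistic_trajectory(x0: float, steps: int, r: float = 4.0) -> list:
    """Iterate the logistic map x -> r * x * (1 - x), a fully deterministic rule."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# Two "runs" whose starting states differ by a tiny, invisible amount,
# loosely analogous to the small steering perturbation described in the video.
a = logistic_trajectory(0.2, 60)
b = logistic_trajectory(0.2 + 1e-10, 60)
gaps = [abs(x - y) for x, y in zip(a, b)]

print("initial gap:", gaps[0])
print("largest gap over 60 steps:", max(gaps))

# Replaying exactly the same start always gives exactly the same run.
print("replay identical:", logistic_trajectory(0.2, 60) == a)
```

The same start always reproduces the same trajectory, but the two nearly identical starts drift apart until they are unrelated, which is the kind of behavior the captions describe for cars on pipes.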
Info
Channel: Yosh
Views: 2,184,901
Id: kojH8a7BW04
Length: 37min 18sec (2238 seconds)
Published: Wed Mar 13 2024