Each of these cars is controlled by an Artificial
Intelligence (AI) in the racing game Trackmania. And this AI is designed to improve over time
through trial and error. The longer it trains, the better it gets. With enough training,
it should be able to find the best racing lines, to drift perfectly; it might even become unbeatable. At least, that's the theory. Actually, I've already tried to build such an AI several times, and it could drive, but I've played this game for years and that wasn't enough to beat me. Still, I think it has some potential. So about six months ago, I decided to give this
project one last chance, and to offer the AI an opportunity for revenge. This video is the
conclusion of a three-year journey to make an AI that could beat me in Trackmania. And this
time it got much better than I had expected. But first let's start with this very simple
track, to better visualize what this AI does. The AI uses something called an artificial neural network. Basically, it's a kind of mathematical tool that roughly models how a brain works. Every tenth of a second, this neural network receives a few numbers describing what's happening in the game. In response, it outputs new numbers specifying the action to perform. Hopefully, the network will select actions that let the AI finish this track as quickly as possible. But that will only happen if the network is configured correctly, and that's the tricky part.
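To make this more concrete, here's a minimal sketch of what such a network could look like in Python with PyTorch; the input size, layer sizes, and action encoding are illustrative assumptions, not the exact setup used here.

```python
# Illustrative sketch only: the exact inputs, sizes and action encoding are assumptions.
import torch
import torch.nn as nn

class PolicyNetwork(nn.Module):
    def __init__(self, n_inputs: int = 16, n_actions: int = 6):
        super().__init__()
        # A small fully connected network: game state in, one score per action out.
        self.layers = nn.Sequential(
            nn.Linear(n_inputs, 128),
            nn.ReLU(),
            nn.Linear(128, 128),
            nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, observation: torch.Tensor) -> torch.Tensor:
        return self.layers(observation)

# Every tenth of a second: turn the game state into numbers,
# feed them to the network, and pick the action with the highest score.
policy = PolicyNetwork()
observation = torch.randn(16)              # placeholder for real game metrics
action_scores = policy(observation)
action = int(torch.argmax(action_scores))  # e.g. 0 = straight, 1 = steer left, ...
```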
To configure it, I'm using a method called Reinforcement Learning. Here's how it works. The AI starts from scratch, with zero prior knowledge about anything, so at first its decisions are quite random. But for every action it takes, it receives a reward depending on how good that action was: the faster the AI progresses along the track, the higher the reward. So with each new attempt, the AI explores the game and gathers data from it. Now, the idea of reinforcement learning is to use this data to progressively tweak the neural network, in a way that reinforces the actions that lead to more reward.
Actually, all the cars I'm showing are controlled by slightly different versions of the neural network, and they represent successive stages of the AI as it learns. In each new attempt, the AI tries actions based on some of its current knowledge. Sometimes it goes well, sometimes not. But whatever happens, the AI can use this fresh knowledge to update its decision process. And it's through this trial-and-error loop that the AI gradually learns the game, all by itself, until it has mastered it. At least, that's the theory! In practice, it's been a real nightmare to get this thing working properly...
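As a rough sketch, that trial-and-error loop could be written like this, continuing the network sketch above. It uses a simple policy-gradient (REINFORCE-style) update, which is just one of many possible RL algorithms and not necessarily the one used here; `TrackmaniaEnv` is a hypothetical wrapper around the game, and the reward is assumed to be the progress made along the track.

```python
import torch

env = TrackmaniaEnv()  # hypothetical wrapper exposing reset() / step()
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)  # `policy` from the sketch above
gamma = 0.99           # discount factor: how much future reward matters

for attempt in range(10_000):
    observation = env.reset()
    log_probs, rewards = [], []
    done = False
    while not done:    # one attempt on the track
        obs = torch.as_tensor(observation, dtype=torch.float32)
        dist = torch.distributions.Categorical(logits=policy(obs))
        action = dist.sample()                  # partly random: exploration
        log_probs.append(dist.log_prob(action))
        observation, reward, done = env.step(int(action))
        rewards.append(reward)

    # Discounted return for each step of the attempt.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)

    # Nudge the network so that actions followed by high returns become more likely.
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```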
Let's go back a few months in this project. These were the AI's first decent attempts on this track, compared with my personal best. It wasn't my first time with reinforcement learning, and the
AI seemed to pretty much understand what it had to do. But it would often get stuck in a sub-optimal
strategy, unable to come up with anything better. In particular, the AI loved to hit the walls.
And in a way, it makes sense. When the AI decides to smash into a wall at full speed, it actually collects more reward at first, and it's only later that this turns out to be a bad decision. This conflict between short-term and long-term reward is one of the many things that make reinforcement learning difficult to get right.
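As a toy illustration with made-up numbers: smashing into the wall earns more reward over the first moments, but a slower, cleaner entry wins once the reward is summed over the whole attempt.

```python
# Toy numbers, purely illustrative: reward collected every tenth of a second
# on the approach to a turn.
smash_into_wall = [3, 3, 3, 0, 0, 0, 0, 0]  # full speed at first, then stuck against the wall
slow_down_first = [2, 2, 2, 2, 2, 2, 2, 2]  # slower entry, but the car keeps progressing

print(sum(smash_into_wall))  # 9
print(sum(slow_down_first))  # 16 -> better over the whole attempt
```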
And when you observe the AI doing weird things, it's quite tricky to figure out where the problem lies. Maybe there is a bug in my code? Or maybe it's the method itself: is the reward signal clear enough? I could let it train a little longer to see, but I don't know... It could be that the AI doesn't see enough to locate the walls. And did I configure the learning algorithm properly? Should I try a different algorithm? This one seems interesting... Let's check out the comments on my last video, I might find some help there. Oh, maybe it's because the AI can't brake? No.
That, I'm sure, is not the issue. But for the rest, well, it's hard to know. So, like my AI, I entered a trial-and-error loop of guessing what to fix, re-running the training, and waiting to see if it got better. Usually it didn't. This was a painful process, especially because the training part
needs to run for a long time before producing any results. It's hard to learn on your own from
scratch. An AI like this usually needs to gather a lot of data to even begin to understand what's
happening. So each time I want to test an idea, I need to let these training sessions run for hours on my laptop before getting any useful feedback. That's why I went back to this project
with such a simple track and without enabling the brake. Every simplification of the decision-making space tends to make the problem easier and quicker to solve, and I found it easier to progress
this way. And eventually, as the days passed, all these efforts started to pay off. After many
small adjustments in my code, the AI finally stopped hitting the walls and it was getting
closer and closer to my time. [Music] Until finally, it happened. And this
was only the beginning. [Music] Three years after starting this project, I had finally trained an AI that I would probably never be able to beat. And I have to admit: after playing this game for so many years, it was a strange feeling to be outmatched like that by a computer program! But this track is quite simplistic. It was
time to challenge the AI one step further. Last year, I spent a few hours trying
to get a good time on this map. I then trained an AI on the same
map, without much success. Now it's time to try again and see if
the AI can get its revenge. [Music] Overall, the training method remains the same as on the first track. The main difference lies in the observation input the AI gets when it drives. Just like on the first track, the AI gets a few car metrics, such as its current speed. It also gets its position relative to the road centerline. But on this track, the road layout is no longer repetitive, so the AI must anticipate the upcoming turns. That's why I added new inputs that encode the path of the map for the next three corners.
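Roughly speaking, the observation could be assembled like this; the exact features and their encoding are illustrative guesses, and `car` and `track` are hypothetical helper objects, not actual game APIs.

```python
import numpy as np

def build_observation(car, track) -> np.ndarray:
    """Assemble the numbers the network sees (hypothetical helpers)."""
    features = [
        car.speed,                     # a few car metrics...
        car.distance_from_centerline,  # ...and the position relative to the road center
    ]
    # Encode the upcoming layout: for each of the next three corners,
    # how far away it is and how sharply it turns (signed angle).
    for corner in track.next_corners(car.position, count=3):
        features.append(corner.distance)
        features.append(corner.angle)
    return np.array(features, dtype=np.float32)
```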
To make the video easier to follow, I'm only showing the attempts from the starting line. But in reality, the AI regularly spawns anywhere on the map during its training, which prevents it from focusing too much on the first few turns. Here's an overview of the first 5 hours of
training. Some attempts are already ahead of my personal best in the first few turns.
This looks much more promising than last year. But the AI is not sufficiently robust,
and it never maintains its lead for long. At least for now. One thing that surprised me
is that the AI takes most turns very tight, hugging the inside. It appears that the game's physics are slightly more complex around these road edges. For this reason, I had to include a few more inputs to make sure the AI understood everything that was going on. Adding these two inputs gives the full orientation of the car, and these inputs indicate which wheels are in contact with the road, and whether they are sliding.
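Continuing the earlier observation sketch, these extra inputs would simply be appended to the feature vector; the field names are hypothetical, and splitting the orientation into pitch and roll is a guess.

```python
def extended_observation(car, track) -> np.ndarray:
    """The earlier observation plus the new inputs (hypothetical field names)."""
    features = list(build_observation(car, track))
    features += [car.pitch, car.roll]   # two more values for the car's full orientation
    for wheel in car.wheels:            # four wheels
        features.append(1.0 if wheel.has_ground_contact else 0.0)  # touching the road?
        features.append(1.0 if wheel.is_sliding else 0.0)          # sliding?
    return np.array(features, dtype=np.float32)
```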
[Music] After nearly 9 hours of training, the AI managed to complete the map at a pretty good pace. It completely crushed the AI from last year, finishing only 13 seconds behind my personal best. And it continued to complete the map on subsequent attempts; it was becoming very consistent. But could it also drive faster? [Music] That's it. After playing this game for 35 hours,
the AI was already faster than me on this map. But I wasn't ready to give up yet. Now it was
my turn to train. This time, I will try with the brake. Braking doesn't add a huge advantage on
this map. Still, it makes it possible to drift, which can save a few tenths of a second here
and there. This time, I was able to finish the map in 4min36. Of course, this gives me an unfair advantage over the AI, since I don't allow it to brake yet. But I was still curious
to see how close it would get to this new time. Well, not only did the AI beat this new time, but
it wasn't even close. More than 5 seconds ahead. Now, it seemed it was time to admit the
superiority of the AI on this track. But only on this track, right? Well, I tested the AI on another map where it had never trained before, and it was quite good! I also tried to make it drive the training track in reverse, and it adapted pretty well to this new context. But overall, on these unseen tracks, the AI is less precise, it makes more mistakes, and sometimes it just gets completely confused, especially when approaching a long straight line, for some reason. All this relates to the question of generalization.
There would be much more to say on this subject: I've tested a whole bunch of other things, like modifying the road surface, adding obstacles, and even changing the car's physics. But let's forget all that for now; I'd prefer to return to the training map. Because even there, there is something
about this AI that doesn't fully convince me. Of course, I agree, the AI completely crushed me on this map. But does it really drive faster than me? I mean, my personal best run contains quite a few mistakes. Trust me, it's super hard to get through these 230 corners at a good pace without failing. The AI, on the other hand, is extremely consistent, and I think that's what makes it so strong in this kind of endurance scenario. However, if I focus on just one part of the map, until I drive a run with no errors that I'm fully satisfied with, will the AI still be faster than me? This is the third and final map where I will face the AI, and if it beats me here, then I will be fully convinced. [Music] This time, the AI didn't beat me once.
It's becoming clear that without brakes, the AI is badly disadvantaged. So far, I've
always disabled the brake on the AI. Braking makes the game much more complicated to understand
and master. I was afraid it would make AI training too complex. Maybe I was wrong. Anyway, I
think it's now time to make the AI drift. Here's the plan: I'm going to retrain the AI
on the long map with the brake available. Then, I will compare this new AI with my best run on
the shorter map. This will be our final duel, and it will determine, once and for
all, which of us is driving faster. [Music] Thanks to the brake, the AI is
now slightly faster than before. But for some reason, it doesn't drift. That's
a surprising choice. Just out of curiosity, I also tried to train it with the brake on
the first track, and the result is the same: it's faster, but it doesn't drift. Though I'm
not sure drifting is useful on the first track, I'm pretty confident it saves time on the other
maps. And yet, the AI chose not to drift. Actually, I found a few rare cases where the AI does some kind of drift. It appears to deliberately clip the road corners to unbalance the car and initiate a drift. Definitely not the most straightforward approach! It's not so surprising
to see the AI struggling. The road is very narrow, and the many curves prevent high speeds. And
in Trackmania, there is usually only one way out in such cases: by using a trick known as the
neo-drift. By applying this precise sequence of actions correctly, it's possible to trigger a
drift, even at relatively low speeds. It's the kind of trick that's pretty hard to discover
on your own. I think the AI must have stumbled across it at least once during its many hours
of trial and error, but it probably didn't have enough driving experience to explore further
in that direction. Such a trick won't bring any advantage if it's not properly mastered. So, can
an AI learn to neo-drift from scratch? Probably, but maybe we could also help it a little. From now on, the AI will get a big reward bonus whenever it's drifting. To detect whether the car is drifting, I'm looking at this input. When it's high enough, it basically means that the car is pointing in a different direction from the one it's actually moving in, which usually means that it's drifting. OK, let's restart the training. [Music] Well, I guess the AI outsmarted me! Apparently, it found a way to constantly trigger the reward bonus just by spamming these weird action patterns at low speed. OK, new rule: the reward bonus now only applies when the car's speed is high enough. Let's restart the training again.
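In code, that kind of reward shaping could look something like this sketch, assuming the drift signal is the angle between where the car points and where it's actually moving; the thresholds and field names are made up and would need tuning.

```python
import math

DRIFT_BONUS = 2.0        # extra reward while drifting (arbitrary value)
MIN_DRIFT_SPEED = 100.0  # km/h: the "new rule", no bonus at low speed

def angle_difference(a: float, b: float) -> float:
    """Smallest signed difference between two angles, in radians."""
    return math.atan2(math.sin(a - b), math.cos(a - b))

def shaped_reward(progress_reward: float, car) -> float:
    """Base reward (progress along the track) plus a drift bonus."""
    # The drift signal: the car points in a different direction
    # from the one it's actually moving in.
    slip_angle = abs(angle_difference(car.heading, car.velocity_direction))
    is_drifting = slip_angle > math.radians(15)       # threshold is a guess
    if is_drifting and car.speed > MIN_DRIFT_SPEED:   # speed gate added after the exploit
        return progress_reward + DRIFT_BONUS
    return progress_reward
```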
[Music] Now this looks good. The AI has clearly mastered the neo-drift. So well, in fact, that it even chains multiple drifts on straight lines to get more rewards. Obviously, in terms of speed alone, its current strategy isn't effective. So let's continue the training without the bonus. Now that it has discovered how to drift, the AI shouldn't forget it. [Music] Over the next few hours, the AI
learned to drift more wisely, only where it saves time. Its
driving looks cleaner than ever, and it destroyed its previous record on the
endurance map. It's now 16 seconds ahead of me. I think it's time to answer our question: aside
from its endurance skills, is the AI still faster on a shorter map? To answer that, I made the
AI drive this final level many times and I selected its best run. This one. This run is the
culmination of a three-year journey to create an AI that can beat me in Trackmania. But I also did
my best to drive a challenging run myself, putting into practice my years of experience in this
game. Now it's time to find out who is faster. [Music] After showing that it was more precise and more consistent, the AI again outpaced me in this final test. And it seems it's time
to accept that I can't beat it anymore. But is this AI truly unbeatable? On this last level, I'm sure it's not. Some of its lines are still quite far from optimal, and I'm certain that many players better than me could easily set a faster time. But on the first two levels, I'd be quite surprised if anyone could beat the AI. If you want to try it for yourself, the game is
free and these tracks can be downloaded online. Of course, all these maps are quite simple,
and we've only just scratched the surface of this complex and beautiful game. Now
that the AI is working pretty well, it deserves to be challenged a little more. But
that's for another time. This video already took way too long to make. If you'd like to see more,
and if you want to support this YouTube channel, I just opened a Patreon page to which you
can subscribe. This would help me to continue spending time on this project in the future,
and to make more videos. I promise, I will not make it another year and a half before the next
one. But I still need a few days' break first! [Music]