Each of these cars is controlled by an Artificial Intelligence (AI) in the racing game Trackmania. This AI is not very intelligent yet. But that's normal: it has just started to learn. In fact, I want to use a method called Reinforcement Learning to make this AI learn by itself how to drive as fast as possible. I also want it to become intelligent enough to master various combinations of turns without ever falling off the road. And to put this to the test, the AI will have to pass a final challenge: completing this giant track. But first of all, how is a simple computer program supposed to learn things? This isn't the first time I've experimented with AI in Trackmania, and to achieve this, I'm using a method called Machine Learning. First, I'm running a program that controls the car in-game, making it turn and accelerate. The AI can choose between 6 different actions. But how can it decide which action to take?
The AI needs to get information about the game. It receives it in the form of numbers called inputs. Some inputs describe the state of the car, such as its current speed and acceleration. Others indicate how the car is positioned on the road section it's currently crossing. And the last inputs describe what's further ahead. This is now what the AI sees when playing. But how can it interpret that? It needs to use this data in an intelligent way. To link inputs to the desired action, the AI is going to use a neural network, which basically acts like a brain.
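As a rough illustration, here is a minimal sketch of such a network in Python with PyTorch. The input size, layer sizes and activations are assumptions of mine, not details from the video; the only point taken from the explanation above is that a vector of inputs goes in and one value per possible action comes out.

```python
import torch
import torch.nn as nn

N_INPUTS = 19    # assumed size: car state + position on the road + what's further ahead
N_ACTIONS = 6    # the 6 possible actions mentioned above

# A small fully connected network: the input vector goes in,
# one score per possible action comes out.
net = nn.Sequential(
    nn.Linear(N_INPUTS, 64),
    nn.ReLU(),
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, N_ACTIONS),
)

# Example forward pass on a dummy input vector.
inputs = torch.zeros(1, N_INPUTS)
action_scores = net(inputs)   # shape: (1, 6)
```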
Now, all that remains is to parameterize the neural network so that it results in fast driving. And that's where Machine Learning comes into play. As I said earlier, the objective is for the AI to learn to drive by itself. So it will have to experiment with different strategies, through trial and error, to progressively select the neural network that leads to the best driving. One way to do this would be to use a genetic algorithm. I've already tried that in Trackmania, and it works fairly well. Basically, the idea is to start with a population of several AIs, each with its own neural network. All the AIs compete on the same map, and the best ones are selected and recombined through a process similar to natural selection. This can be repeated for many generations to get a better and better neural network. One problem with this method is that you only compare the different AIs based on their end result. To make an AI progress, it might be better to give it feedback on what it did well or not so well during the race.
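For reference, this is roughly what the genetic-algorithm approach looks like in code. It is only a sketch: the population size, mutation scale and fitness placeholder are made up, and the real version would evaluate each weight vector by actually driving the car on the map.

```python
import numpy as np

POP_SIZE = 20       # assumed population size
N_WEIGHTS = 1000    # assumed number of neural-network weights per AI

def fitness(weights):
    """Placeholder: in practice this would measure how far/fast the car drives."""
    return float(np.random.rand())   # stand-in value so the sketch runs

# Start from a population of random neural networks (flattened weight vectors).
population = [np.random.randn(N_WEIGHTS) for _ in range(POP_SIZE)]

for generation in range(50):
    scores = [fitness(w) for w in population]
    # Selection: keep the best half of the population.
    order = np.argsort(scores)[::-1]
    parents = [population[i] for i in order[: POP_SIZE // 2]]
    # Recombination + mutation: rebuild the rest of the population from the parents.
    children = []
    while len(parents) + len(children) < POP_SIZE:
        a, b = np.random.choice(len(parents), size=2, replace=False)
        crossover = np.where(np.random.rand(N_WEIGHTS) < 0.5, parents[a], parents[b])
        children.append(crossover + 0.02 * np.random.randn(N_WEIGHTS))
    population = parents + children
```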
So it's time to try something else: Reinforcement Learning. And this comes with a crucial idea: the concept of reward. This time, the AI has only one goal in mind: to get as many rewards as possible. The idea of reinforcement learning is to learn to pick the action that brings the most reward, in any situation. In fact, it's a bit like training a pet, which interprets pleasure or food as positive reinforcement. But in Trackmania, there is no food. So how can we define rewards? The AI can take 10 actions per second. Each action is associated with a reward equal to the distance traveled up to the next action, so the faster the AI goes, the more reward it gets. If the AI ever tries to go the wrong way, it receives a punishment, which is actually just a negative reward. And if the AI falls off the road, it is punished directly with a zero reward, but also indirectly by the run being stopped, which means no more rewards.
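A minimal sketch of this reward scheme, assuming a helper that measures how far along the track the car is at each step (the function and variable names are illustrative, not taken from the video):

```python
def step_reward(progress_before, progress_after, fell_off_road):
    """Reward for one time step (the AI acts 10 times per second).

    Returns (reward, run_over)."""
    if fell_off_road:
        return 0.0, True                               # zero reward and the run ends: no future rewards
    distance = progress_after - progress_before        # meters gained along the track this step
    return distance, False                             # negative if the car went the wrong way
```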
Now, it's time to start training. To learn which inputs and actions lead to which reward, the AI must first gather information about the game. This is the exploration phase: the AI simply takes random actions and doesn't use its neural network for the moment. The runs are driven one by one, and after a thousand of them, here is what the AI has explored of the map so far. Each line corresponds to one race trajectory. The AI has already collected plenty of data about the rewards it can expect for various sets of inputs and actions. Now, it's time to use this data to train its neural network.
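The data collected this way is usually stored as (inputs, action, reward, next inputs) transitions. Here is a minimal sketch of that exploration loop, where the environment wrapper `env` is a hypothetical stand-in for the actual game tooling:

```python
import random

replay_buffer = []   # entries: (inputs, action, reward, next_inputs, run_over)

def explore_one_run(env, n_actions=6):
    """Drive one run with purely random actions and store everything that happened."""
    inputs = env.reset()                              # hypothetical environment wrapper
    run_over = False
    while not run_over:
        action = random.randrange(n_actions)          # pure exploration: random action
        next_inputs, reward, run_over = env.step(action)
        replay_buffer.append((inputs, action, reward, next_inputs, run_over))
        inputs = next_inputs
```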
This is the role of the reinforcement learning algorithm. There are many variants of this method, and here I chose one called Deep Q-Learning. Basically, for a given set of inputs, the role of the neural network is to predict the expected reward for each possible action. But which reward are we talking about? Is it an immediate one? In Trackmania, although some actions may result in an immediate positive reward, they may have negative consequences in the long run. Sometimes it can be useful to sacrifice short-term gains, for example by slowing down when approaching a turn, in order to collect more reward in the long term. The AI therefore needs to consider the long-term consequences of each action. To achieve this, the AI tries to estimate the cumulative reward it is most likely to obtain in the future. Although the long term is important, an action still has more impact in the short term, so events in the immediate future are weighted more. Each time the AI gets inputs, its neural network predicts the expected cumulative reward for each possible action, and the AI simply selects the one with the highest value.
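In Deep Q-Learning terms, that "expected cumulative reward with the near future weighted more" is the discounted return, and the network is trained toward it with a temporal-difference target. The sketch below shows a generic version of that update, not the exact code behind the video; the discount factor, batch size and use of a single network (rather than a separate target network) are simplifications and assumptions of mine.

```python
import random
import torch
import torch.nn.functional as F

GAMMA = 0.99   # discount factor: immediate rewards count more than distant ones

def dqn_training_step(q_net, optimizer, replay_buffer, batch_size=64):
    """One Deep Q-Learning update on a random batch of stored transitions."""
    batch = random.sample(replay_buffer, batch_size)
    inputs, actions, rewards, next_inputs, run_over = map(
        lambda xs: torch.as_tensor(xs, dtype=torch.float32), zip(*batch)
    )
    actions = actions.long()

    # Q-value the network currently predicts for the action that was actually taken.
    q_pred = q_net(inputs).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Target: immediate reward + discounted best Q-value of the next state
    # (no future reward if the run ended, e.g. the car fell off the road).
    with torch.no_grad():
        q_next = q_net(next_inputs).max(dim=1).values
    target = rewards + GAMMA * q_next * (1.0 - run_over)

    loss = F.mse_loss(q_pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def pick_best_action(q_net, inputs):
    """At play time: choose the action with the highest predicted cumulative reward."""
    with torch.no_grad():
        return int(torch.argmax(q_net(torch.as_tensor(inputs, dtype=torch.float32))))
```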
Let's resume training where we left off. In parallel with driving, the AI continuously tries to improve its neural network with the data it collects. But by only doing random exploration, the AI ends up not having much new to learn. Instead of just exploring, it's time for the AI to also start exploiting the knowledge it has acquired, meaning using its neural network instead of just acting randomly. The AI is still a bit too immature, though, to rely only on its neural network. If it does too much exploitation, it will just experience the same things over and over again, which won't teach it much. For now, I'm setting the proportion of exploration at 90%, and I'll decrease it progressively during training.
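This exploration/exploitation mix is typically implemented as an epsilon-greedy policy. A small sketch, reusing pick_best_action from the earlier Deep Q-Learning sketch; the starting value matches the 90% mentioned above, the 5% floor is reached later in the video, and the decay rate is an assumption:

```python
import random

epsilon = 0.90        # start: 90% random exploration
EPSILON_MIN = 0.05    # floor reached later in training
EPSILON_DECAY = 0.999 # assumed per-run decay rate

def choose_action(q_net, inputs, n_actions=6):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit the network."""
    if random.random() < epsilon:
        return random.randrange(n_actions)       # exploration: random action
    return pick_best_action(q_net, inputs)       # exploitation: highest predicted value

def decay_epsilon():
    global epsilon
    epsilon = max(EPSILON_MIN, epsilon * EPSILON_DECAY)
```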
After more than 20,000 attempts on this map, here is the best run the AI has done so far. The AI drives quite carefully, and it's not too bad for a start! It has definitely learned something. Going further into the map, things seem a bit more complicated, and the AI ends up falling. Time to get back to training! At this point, you might think that the AI hasn't learned much after training on the same map for so many hours. But I think that's quite normal: reinforcement learning is known to require a large number of iterations to work. The time displayed here is in-game time. Fortunately, training is faster in practice, since I can increase the game speed using a tool called TMInterface. This project would probably not have been possible without this tool, so a big thanks to Donadigo, its developer. The AI has made some nice progress. The driving style it learned in the first turns seems to apply well to the following ones, which shows a good capacity for generalization. The AI has now reached 5% exploration, which I will not decrease further. It seems that the AI is stuck and can no longer progress.
Here is its current personal best. In the first part of the map, the AI shows very little hesitation. This first portion has a lot of turns and short straights. But then the AI arrives in a new section with mainly long straight lines, and its driving becomes a little sketchy. At one point, it even stops, as if it's afraid to continue. After a long minute, it finally decides to continue, and dies. The AI seems to have difficulty adapting to this new type of road. Or maybe it just needs more time. To be sure, I decided to push the training a little longer. After 10,000 more attempts, the AI hasn't made much progress. It still has a lot of trouble with long straight lines. There may be several reasons for this, but I think the main one is overfitting, which is common in machine learning. In the exploration phase, the AI practiced the same first few turns over and over again. Its neural network became a specialist in this kind of trajectory, learning it almost by heart, as if nothing else existed. But when the AI faces a new situation, the driving style it learned in the past is no longer appropriate: it needs to adapt. In a way, adapting means questioning everything it has learned in the past. If the AI tries to drastically change its strategy to adapt to these new roads, it risks breaking everything that was working for the first few turns. When there is overfitting, there is no generalization. So what's the solution? Maybe the AI could drive each run on a different map, to constantly learn new things. But at this point, I really don't want to spend hours building dozens of different maps. So I'm going to do things differently.
I'm going to restart training from the beginning. But now, each time the AI starts a new run, it will spawn at a random location on the map, with a random speed and a random orientation. This should limit overfitting, since the AI will be forced to consider many different situations from the beginning.
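A minimal sketch of this randomized-start idea, again with a hypothetical `env` wrapper; the value ranges are illustrative, and the ability to reset the car with a given position, speed and orientation is assumed:

```python
import random

def reset_to_random_start(env):
    """Start a training run from a random point on the map, with random speed and heading."""
    start_fraction = random.uniform(0.0, 1.0)      # anywhere along the track
    speed = random.uniform(0.0, 400.0)             # km/h, assumed range
    heading_offset = random.uniform(-30.0, 30.0)   # degrees away from the road axis
    return env.reset(track_fraction=start_fraction,
                     speed=speed,
                     heading_offset=heading_offset)
```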
This time, the AI is learning much faster. However, perhaps the AI managed to cover long distances just because it spawned in easy sections of the map. The real challenge is still to complete the track from start to finish. From now on, I will regularly test the AI outside of training, in a normal race. Outside of training, I remove all exploration to optimize the AI's performance, and I also increase the action frequency from 10 to 30 per second.
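In code, these evaluation runs just mean always taking the network's best action, with no exploration at all, at a higher control rate. A simplified sketch, reusing pick_best_action from earlier and the same hypothetical `env` wrapper:

```python
import time

def evaluate_run(env, q_net, actions_per_second=30):
    """Race mode: no exploration, and a finer control rate than during training."""
    inputs = env.reset()
    run_over = False
    while not run_over:
        action = pick_best_action(q_net, inputs)   # epsilon is effectively 0 here
        inputs, _, run_over = env.step(action)
        time.sleep(1.0 / actions_per_second)       # simplified pacing
```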
The AI is able to drive in all sections of the map, so there is clearly less overfitting this time! Now, the AI only has to combine everything in one run. In this attempt, the AI manages to surpass its previous record, going further than ever. But it fails within 500 meters of the finish. It has never been so close to finishing this map. And finally, a few attempts later, and after 53 hours of training, the AI gets this run. The AI was able to complete 230 turns without ever falling. Sounds good, but is the AI fast? Now it's my turn to drive, to compare. After a few attempts, I made a run of 4 minutes and 44 seconds, without using the brake of course, for a fair comparison. So yeah, the AI is not very fast. But training is not over! Now, the AI has one goal: to finish this map as fast as possible. 6 minutes and 28 seconds.
After this run, I continued training, and the AI kept getting slightly faster on average, and more consistent too, but it never managed to beat its personal best. With this version of its neural network, the AI drives quite aggressively and takes most turns very sharply. It's quite surprising to see it survive the whole race with such a driving style, but it's the best the AI has found. Perhaps there is still a way to improve the AI's record one last time, still with the same neural network. If I randomly force some of the AI's actions at the beginning, here, the AI will have to adapt to this small perturbation, and this becomes the start of a completely different run. Now, I can repeat this a few hundred times to see what happens.
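Schematically, this trick just overrides the network's choice for the first few steps of a run; the number of forced steps is an assumption, and pick_best_action comes from the earlier sketch:

```python
import random

def drive_with_initial_perturbation(env, q_net, forced_steps=5, n_actions=6):
    """Force a few random actions at the start, then let the trained network drive."""
    inputs = env.reset()
    run_over, step = False, 0
    while not run_over:
        if step < forced_steps:
            action = random.randrange(n_actions)      # forced perturbation
        else:
            action = pick_best_action(q_net, inputs)  # normal greedy driving
        inputs, _, run_over = env.step(action)
        step += 1
```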
And here is the final improvement of the AI's record. Not a big improvement, but it was visually worth it! There is still a big gap with human performance, but I'm still very happy with the result. Trackmania is a game that requires a lot of practice, even for humans, and from my experience I'm pretty sure this AI could beat a good number of beginners. If there's one thing this AI does well, it's generalization: it can adapt to any new map with a similar road structure. I even tried changing the road surface to see if it could drive on grass, and the AI does quite well! Same thing on dirt, even though the AI never experienced these surfaces during training. But can it still survive on a new map, with a mix of road, dirt and grass surfaces, and a few slopes and obstacles? So yes, of course there is room to improve this AI. But with reinforcement learning, it seems the main limitation is always the same: training time, even with a tool to increase the game speed. That's why I never ventured into more complex maps, and why I tried to limit complexity in general: few inputs, no brakes, not too many actions per second, and so on. Anyway, for now, the AI has earned some rest after those long hours of training. And maybe it will be back one day, with new surprises!
It surprises me that, even after being able to regularly reach the end with 0% exploration, it still has a high failure rate.
Super cool, I love to see ML in video games. I’m still not totally convinced he’s not overfitting on the original map, as the test map at the end has a really high failure rate.
I think some type of cross-validation would be great. I.e., the AI’s training should be evaluated and chosen based on segments of the course they haven’t seen yet.
I understand, though, that he has practical limitations. Looks like the training for this video took 200-300 hours.
Really interesting video. Shows how far AI has to go still. Really requires a person constantly changing the weighting, disturbance and reward parameters to get a usable solution. Even then it seems to often converge on a poor solution. It still seems like a brute force approach.
Is there a method where you can "teach" the A.I. to drive? Like doing some runs yourself and then feeding those runs into the neural network with heavy weights. I feel like this is how humans and other animals learn so fast, by observing others and adapting similar behaviors, so they don't have to go through trial and error several thousands of times.
Also, I was surprised that the A.I. didn't learn, after thousands of runs, that a long straight = go straight. I don't know anything about A.I., but I feel like that should be something it should be able to learn fairly quickly. Maybe training it on technique first would help? Such as: give it a thousand runs on a long straight, give it a thousand runs on some simple left/right turns, then maybe on some s-turns and u-turns. Get to the point where it recognizes these types of turns, and then put it on a unique track where it has to apply its learnings.
I guess that is kinda what you did with the random start point, but I think because the track continued after each segment, the A.I. generalized the run rather than learning important information from a single segment... Like, in general you will get more reward for going back and forth slowly than for driving straight off a cliff. But if it can recognize when it doesn't need to apply the general rule, then it could gain a lot of speed.
Wasn't this done before in Trackmania? Still love the video, but I swear I've seen this concept before.
What did he use to get information from the game? Is it a mod?
So is the AI getting better at following a track, or is it just getting better at following this track? Will it have to start from scratch again if it was given a new track?
Not gonna lie, it would take me the same amount of time to finish this track, so not bad for a machine I guess.
Interesting video, by the way!