Reddit Comments

It surprises me that even after being able to regularly reach the end with 0% exploration, it still has a high rate of failure.

44 points · u/UnoriginalStanger · Mar 12 2022

Super cool, I love to see ML in video games. I’m still not totally convinced he’s not overfitting on the original map, as the test map at the end has a really high failure rate.

I think some type of cross-validation would be great. I.e., the AI's training should be evaluated and chosen based on segments of the course it hasn't seen yet.

I understand, though, that he has practical limitations. Looks like the training for this video took 200-300 hours.

85 points · u/CostlyOpportunities · Mar 12 2022
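The commenter's suggestion is to pick a checkpoint by its score on track segments held out from training. It could look roughly like the sketch below. The sketch assumes the track can be cut into numbered segments. It also assumes a hypothetical `evaluate(checkpoint, segment)` callback, which is not part of any real Trackmania tool, that runs one greedy attempt and returns a score such as distance covered.

```python
import random


def split_segments(segment_ids, holdout_fraction, seed=0):
    """Shuffle track segments and hold a fraction out for validation only."""
    rng = random.Random(seed)
    ids = list(segment_ids)
    rng.shuffle(ids)
    n_holdout = max(1, int(len(ids) * holdout_fraction))
    # Returns (training segments, validation segments).
    return ids[n_holdout:], ids[:n_holdout]


def select_checkpoint(checkpoints, validation_segments, evaluate):
    """Pick the checkpoint with the best mean score on segments it never trained on.

    `evaluate(checkpoint, segment)` is a hypothetical callback, e.g. one greedy
    run on that segment that returns the distance covered.
    """
    def mean_score(checkpoint):
        scores = [evaluate(checkpoint, seg) for seg in validation_segments]
        return sum(scores) / len(scores)

    return max(checkpoints, key=mean_score)
```

Training would then spawn the car only on the training segments, while `select_checkpoint` looks only at the held-out ones. A network that memorized the training turns gets no credit for that.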

Really interesting video. Shows how far AI has to go still. Really requires a person constantly changing the weighting, disturbance and reward parameters to get a usable solution. Even then it seems to often converge on a poor solution. It still seems like a brute force approach.

111 points · u/DexicJ · Mar 12 2022

Is there a method where you can "teach" the A.I. to drive? Like doing some runs yourself and then feeding those runs into the neural network with heavy weights. I feel like this is how humans and other animals learn so fast, by observing others and adapting similar behaviors, so they don't have to go through trial and error several thousands of times.

Also, I was surprised that the A.I. didn't learn, after thousands of runs, that a long straight = go straight. I don't know anything about A.I., but I feel like that should be something it could learn fairly quickly. Maybe training it on technique first would help? For example, give it a thousand runs on a long straight, give it a thousand runs on some simple left/right turns, then maybe on some S-turns and U-turns. Get to the point where it recognizes these types of turns and then put it on a unique track where it has to apply its learnings.

I guess that is kinda what you did with the random start point, but I think because the track continued after each segment, the A.I. generalized the run rather than learning important information from a single segment. Like, in general, you will get more reward for going back and forth slowly than for driving straight off a cliff. But if it can recognize when it doesn't need to apply the general rule, then it could gain a lot of speed.

9 points · u/kdramaaccount · Mar 12 2022
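This commenter is describing what is usually called curriculum learning: train on easy, isolated situations first and on harder ones later. A minimal schedule might look like the sketch below. The segment types and run counts are hypothetical, taken loosely from the comment, and nothing here reflects what the video actually did.

```python
# Hypothetical curriculum following the comment: straights first, then simple
# turns, then S- and U-turns, then a full track.
CURRICULUM = [
    ("long_straight", 1000),
    ("simple_turns", 1000),
    ("s_and_u_turns", 1000),
    ("full_track", None),  # None: stay on this stage for the rest of training
]


def stage_for_run(run_index, curriculum=CURRICULUM):
    """Return which kind of segment a given training run should use."""
    start = 0
    for name, n_runs in curriculum:
        if n_runs is None or run_index < start + n_runs:
            return name
        start += n_runs
    raise ValueError("the curriculum must end with an open-ended stage")
```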

Wasn't this done before in Trackmania? Still love the video, but I swear I've seen this concept before.

5 points · u/MisterSnippy · Mar 13 2022

What did he use to get information from the game? Is it a mod?

3 points · u/Spam_The_Chat · Mar 13 2022

So is the AI getting better at following a track, or is it just getting better at following this track? Will it have to start from scratch again if it was given a new track?

3 points · u/Supertrinko · Mar 13 2022

Not gonna lie, it would take me the same amount of time to finish this track, so not bad for a machine I guess.
Interesting video, by the way!

1 point · u/Francisjohnson84 · Mar 14 2022

Transcript

Each of these cars is controlled by an Artificial Intelligence (AI) in the racing game Trackmania. This AI is not very intelligent yet. But that's normal: it has just started to learn. In fact, I want to use a method called Reinforcement Learning to make this AI learn by itself how to drive as fast as possible. I also want it to become intelligent enough to master various combinations of turns without ever falling off the road. To ensure this, the AI will have to pass a final challenge: completing this giant track.

But first of all, how is a simple computer program supposed to learn things? It's not the first time I'm experimenting with AI in Trackmania, and to make it learn, I'm using a method called Machine Learning.

First, I'm running a program that controls the car in-game to make it turn and accelerate. So the AI can choose between 6 different actions. But how can it decide which action to take? The AI needs to get information about the game. It receives that information in the form of numbers called inputs. Some inputs describe the state of the car, such as its current speed and acceleration. Others indicate how the car is positioned on the road section it's currently crossing. And the last inputs indicate what's further ahead. This is what the AI now sees when playing.

But how can it interpret that? It needs to use this data in an intelligent way. To link inputs to the desired action, the AI is going to use a neural network, which basically acts like a brain. Now, all that remains is to parameterize the neural network so that it results in fast driving. And that's where Machine Learning comes into play. As I said earlier, the objective here is that the AI learns to drive by itself. So it will have to experiment with different strategies, through trial and error, to progressively select the neural network that leads to the best driving. One way to do this would be to use a genetic algorithm.
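The input-to-action mapping described above can be sketched as a tiny neural network. It turns the input numbers into one score per action, and the AI takes the highest-scoring action. The 6-action layout (throttle on/off times steer left/straight/right) and the example inputs are assumptions. The video only says there are 6 actions and, later, that braking is not used. A real project would use a library such as PyTorch rather than plain lists.

```python
import random

# Hypothetical action set: throttle off/on x steer left/none/right = 6 actions.
ACTIONS = [(throttle, steer) for throttle in (0, 1) for steer in (-1, 0, 1)]


class TinyNetwork:
    """One hidden ReLU layer mapping an input vector to one score per action."""

    def __init__(self, n_inputs, n_hidden, n_outputs, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.gauss(0.0, 0.5) for _ in range(n_hidden)] for _ in range(n_outputs)]
        self.b2 = [0.0] * n_outputs

    def forward(self, inputs):
        hidden = [max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
                  for row, b in zip(self.w1, self.b1)]
        return [sum(w * h for w, h in zip(row, hidden)) + b
                for row, b in zip(self.w2, self.b2)]


def choose_action(network, inputs):
    """Return the action whose score is highest for these inputs."""
    scores = network.forward(inputs)
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return ACTIONS[best_index]


# Hypothetical normalized inputs: speed, acceleration, lateral offset,
# heading angle, distance to the next turn.
example_inputs = [0.5, 0.1, -0.2, 0.0, 1.0]
network = TinyNetwork(n_inputs=len(example_inputs), n_hidden=8, n_outputs=len(ACTIONS))
print(choose_action(network, example_inputs))
```

Learning then means finding values for `w1`, `b1`, `w2` and `b2` that produce fast driving. A genetic algorithm and reinforcement learning are two different ways of searching for them.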
I've already tried a genetic algorithm in Trackmania, and it works fairly well. Basically, the idea is to start with a population of several AIs, each with its own neural network. All AIs compete on the same map, and the best ones are selected and recombined through a process similar to natural selection. This can be repeated for many generations to get a better and better neural network. One problem with this method is that you only compare the different AIs based on their end result. To make an AI progress, it might be better to give it feedback on what it did well or not so well during the race. So it's time to try something else: Reinforcement Learning. And this comes with a crucial idea: the concept of reward.

This time, the AI has only one goal in mind: to get as many rewards as possible. The idea of reinforcement learning is to learn to pick the action that brings the most reward, in any situation. In fact, this is quite like a pet being trained, which will interpret pleasure or food intake as positive reinforcement. But in Trackmania, there is no food. So how can we define rewards? The AI can take 10 actions per second. Each action will be associated with a reward equal to the distance traveled up to the next action. So the faster the AI goes, the more rewards it gets. If the AI ever tries to go the wrong way, it will receive a punishment, which is actually just a negative reward. And if the AI falls off the road, it will be punished directly by a zero reward, but also indirectly by stopping the race, which means no more rewards.

Now, it's time to start training. To learn which inputs and actions lead to which reward, the AI must first gather information about the game. This is the exploration phase. The AI simply takes random actions and doesn't use its neural network for the moment. The runs are driven one by one. And after a thousand of them, here is what the AI has explored of the map so far.
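The reward scheme described above can be sketched as a per-step function. It assumes the car's progress is measured as distance along the track centerline, because the video doesn't say how distance is computed. Under that assumption, driving the wrong way naturally produces a negative reward. The `progress_before` and `progress_after` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StepOutcome:
    reward: float
    done: bool  # True when the run is over


def step_reward(progress_before, progress_after, fell_off):
    """Reward for one action (the AI acts 10 times per second).

    progress_* is a hypothetical measure: distance along the track centerline.
    """
    if fell_off:
        # Direct punishment: zero reward. Indirect punishment: the run stops,
        # so no further rewards can be collected.
        return StepOutcome(reward=0.0, done=True)
    # Distance traveled until the next action; negative if the car went backwards.
    return StepOutcome(reward=progress_after - progress_before, done=False)
```

At 10 actions per second, a car holding 100 km/h (about 27.8 m/s) would earn roughly 2.78 per step, so the reward per step is simply proportional to speed.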
In this view of the explored map, each line corresponds to one race trajectory. The AI has already collected plenty of data about the rewards it can expect to get for various sets of inputs and actions. Now, it's time to use this data to train its neural network. This is the role of the reinforcement learning algorithm. There are many different variants of this method, and here I chose to use one called Deep Q-Learning.

Basically, for a given set of inputs, the role of the neural network is to predict the expected reward for each possible action. But which reward are we talking about? Is it an immediate one? In Trackmania, although some actions may result in an immediate positive reward, they may have negative consequences in the long run. Sometimes it may be useful to sacrifice short-term income, for example by slowing down when approaching a turn, in order to gain more reward in the long term. The AI therefore needs to consider the long-term consequences of each action. To achieve this, the AI tries to estimate the cumulative reward that it's most likely to obtain in the future. Although the long term is important, an action still has more impact in the short term. Thus, events in the immediate future are weighted more heavily. So each time the AI gets inputs, its neural network tries to predict the expected cumulative reward for each possible action. And the AI just selects the one with the highest value.

Let's resume training where we left off. In parallel to driving, the AI is continuously trying to improve its neural network with the data it collects. But by only doing random exploration, the AI ends up not having much new to learn. Instead of just exploring, it's time for the AI to also start exploiting the knowledge it has acquired, meaning using its neural network instead of just acting randomly. The AI is still a bit too immature, though, to rely only on its neural network.
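In standard Deep Q-Learning, the "expected cumulative reward" is a discounted return: G = r_t + γ·r_(t+1) + γ²·r_(t+2) + ..., with a discount factor 0 < γ < 1 that makes near-term rewards count more. The network's prediction Q(s, a) is trained toward the one-step target r + γ·max Q(s', a'), taken over the next state's actions. The sketch below shows these two quantities plus the explore/exploit choice. The video does not state the value of γ, so 0.9 here is only an example.

```python
import random


def discounted_return(rewards, gamma):
    """Cumulative reward where a reward k steps ahead is weighted by gamma**k."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def q_target(reward, next_q_values, done, gamma):
    """One-step Deep Q-Learning target for the network's prediction Q(s, a)."""
    if done:
        # Falling off ends the run: there is no future reward to add.
        return reward
    return reward + gamma * max(next_q_values)


def epsilon_greedy(q_values, epsilon, rng):
    """Explore (random action) with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


# Slowing down before a turn versus rushing into it and falling off.
gamma = 0.9
careful = discounted_return([2.0] * 5, gamma)
reckless = discounted_return([3.0, 3.0, 0.0], gamma)  # zero reward, then the run ends
print(careful > reckless)  # True: the short-term sacrifice pays off

rng = random.Random(0)
print(epsilon_greedy([0.1, 0.7, 0.3], epsilon=0.05, rng=rng))
```

In a full DQN, these targets are computed on batches of stored transitions, and the network is updated by gradient descent toward them. That part is omitted here.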
If the AI does too much exploitation, it will just experience the same things over and over again, which will not teach it much. For now, I'm setting the proportion of exploration at 90%, and I'll decrease it progressively during training.

After more than 20,000 attempts on this map, here is the best run the AI has done so far. The AI drives quite carefully, and it's not too bad for a start! It has definitely learned something. Going further into the map, it seems a bit more complicated, and the AI ends up falling. Time to get back to training!

At this point, you might think that the AI hasn't learned much after training on the same map for so many hours. But I think it's quite normal. Reinforcement learning is known to require a large number of iterations to work. The time displayed here is in-game time. Fortunately, training is faster in practice, since I can increase the game speed using a tool called TMInterface. This project would probably not have been possible without this tool, so a big thanks to Donadigo, its developer.

The AI has made some nice progress. The driving style it learned in the first turns seems to apply well to the following ones, which shows a good capacity for generalization. The AI has now reached 5% exploration, which I will not decrease further.

It seems that the AI is stuck and can no longer progress. Here is its current personal best. In the first part of the map, the AI shows very little hesitation. This first portion has a lot of turns and short straights. But then the AI arrives in a new section with mainly long straight lines. Its driving becomes a little sketchy. At one point, it even stops, as if it's afraid to continue. After a long minute, it finally decides to continue, and dies. The AI seems to have difficulty adapting to this new type of road. Or maybe it just needs more time. To be sure, I decided to push the training a little longer.
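The exploration schedule described above starts at 90%, decreases progressively, and is then held at 5%. It might be implemented as a simple linear decay. The linear shape and the `decay_steps` value are assumptions; the video gives only the start and end values.

```python
def exploration_rate(step, start=0.9, end=0.05, decay_steps=500_000):
    """Probability of taking a random action at a given training step.

    Decays linearly from `start` to `end` over `decay_steps`, then stays at `end`.
    """
    if step >= decay_steps:
        return end
    return start + (step / decay_steps) * (end - start)


for step in (0, 250_000, 500_000, 2_000_000):
    print(step, round(exploration_rate(step), 3))
```

Holding the rate at a small floor keeps some exploration going, so the AI can still discover better options late in training instead of repeating exactly the same runs.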
After 10,000 more attempts, the AI hasn't made much progress. It still has a lot of trouble with long straight lines. There may be several reasons for this, but I think the main one is overfitting, which is common in machine learning. In the exploration phase, the AI practiced the same first few turns over and over again. Its neural network became a specialist in this kind of trajectory, learning these turns almost by heart, as if nothing else existed. But when the AI faces a new situation, the driving style it learned in the past is no longer appropriate: it needs to adapt. In a way, adapting means questioning everything it has learned in the past. If the AI tries to drastically change its strategy to adapt to these new roads, it risks breaking everything that was working for the first few turns. When there is overfitting, there is little generalization.

So what's the solution? Maybe the AI could drive each run on a different map, to constantly learn new things. But at this point, I really don't want to spend hours building dozens of different maps. So I'm going to do things differently. I'm going to restart training from the beginning. But now, each time the AI starts a new run, it will spawn at a random location on the map, with a random speed and a random orientation. This should limit overfitting, since the AI will be forced to consider many different situations from the beginning.

This time, the AI is learning way faster. However, perhaps the AI managed to cover long distances just because it spawned in easy sections of the map. The real challenge is still to complete the track from start to finish. From now on, I will regularly test the AI outside of training, on a normal race. Outside of training, I remove any exploration to optimize the AI's performance. I also increase the action frequency from 10 to 30 per second.
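The random-spawn trick can be sketched as sampling a start state at the beginning of each training run. The ranges below are hypothetical: speeds up to 200 km/h, a heading within 45 degrees of the road direction, and a 5 km track. The video only says that position, speed and orientation are random. The two settings dictionaries summarize the training/evaluation differences that the video does state.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class SpawnState:
    track_position: float  # meters along the track centerline
    speed: float           # km/h
    heading_offset: float  # radians relative to the road direction


def random_spawn(track_length, rng, max_speed=200.0, max_heading_offset=math.pi / 4):
    """Start a training run anywhere on the map, with random speed and orientation."""
    return SpawnState(
        track_position=rng.uniform(0.0, track_length),
        speed=rng.uniform(0.0, max_speed),
        heading_offset=rng.uniform(-max_heading_offset, max_heading_offset),
    )


# Differences between training runs and test races, as described in the video.
TRAINING_SETTINGS = {"random_spawn": True, "actions_per_second": 10}
EVALUATION_SETTINGS = {"random_spawn": False, "exploration": 0.0, "actions_per_second": 30}

rng = random.Random(42)
print(random_spawn(track_length=5000.0, rng=rng))
```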
The AI is able to drive in all sections of the map, so there is clearly less overfitting this time! Now, the AI only has to combine everything in one run.

In this attempt, the AI manages to surpass its previous record, going further than ever. But it fails within 500 meters of the finish. It has never been so close to finishing this map. And finally, a few attempts later, after 53 hours of training, the AI gets this run.

The AI was able to complete 230 turns without ever falling. Sounds good, but is the AI fast? Now it's my turn to drive, to compare. After a few attempts, I made a run of 4 minutes and 44 seconds, without using the brake, of course, for a fair comparison. So yeah, the AI is not very fast. But training is not over! Now, the AI has one goal: to finish this map as fast as possible.

6 minutes and 28 seconds. After this run, I continued training, and the AI kept getting slightly faster on average, and more consistent too, but it never managed to beat its personal best. With this version of its neural network, the AI drives quite aggressively and takes most turns very sharply. It's quite surprising to see it survive the whole race with such a driving style. But it's the best the AI has found.

Perhaps there is still a way to improve the AI's record one last time, still with the same neural network. If I randomly force some of the AI's actions at the beginning, here, the AI will have to adapt to this small perturbation. And this is the start of a completely different run. Now, I can repeat this a few hundred times to see what happens.

And here is the final improvement of the AI's record. Not a big improvement, but it was visually worth it! There is still a big gap with human performance, but I'm still very happy with the result. Trackmania is a game that requires a lot of practice, even for humans, and from my experience I'm pretty sure this AI could beat a good number of beginners.
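The final trick forces a few random actions at the start and lets the unchanged network drive the rest. That amounts to a random search over short action prefixes. The sketch assumes a hypothetical `run_with_prefix` callback that plays the forced actions, hands control to the trained network, and returns the finish time in seconds (or `None` after a fall). The prefix length and number of trials are free parameters, not values from the video. `toy_run` is a made-up stand-in for the game.

```python
import random


def perturbation_search(run_with_prefix, n_actions, n_forced, n_trials, seed=0):
    """Try `n_trials` random action prefixes and keep the fastest finished run.

    `run_with_prefix(prefix)` is hypothetical: it plays `prefix`, lets the
    trained network drive the rest of the race, and returns the finish time
    in seconds, or None if the car fell off.
    """
    rng = random.Random(seed)
    best_time, best_prefix = None, None
    for _ in range(n_trials):
        prefix = [rng.randrange(n_actions) for _ in range(n_forced)]
        finish_time = run_with_prefix(prefix)
        if finish_time is not None and (best_time is None or finish_time < best_time):
            best_time, best_prefix = finish_time, prefix
    return best_time, best_prefix


# Toy stand-in for the game: some prefixes crash, others shift the finish time.
def toy_run(prefix):
    if prefix[0] == 0:
        return None
    return 390.0 - sum(prefix)


print(perturbation_search(toy_run, n_actions=6, n_forced=3, n_trials=300))
```

Because the network itself does not change, this only samples different trajectories of the same driving policy. It can find a lucky run, but it does not make the AI faster on average.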
If there's one thing the AI does well, it's generalization. It can adapt to a new map with a similar road structure. I even tried changing the road surface to see if it could drive on grass, and the AI does quite well! Same thing on dirt, even though the AI never experienced these surfaces during training. But can it still survive on a new map, with a mix of road, dirt and grass surfaces, and a few slopes and obstacles?

So yeah, of course there is room to improve this AI. But with reinforcement learning, it seems that the main limitation is always the same: training time, even with a tool to increase game speed. That's why I never venture into more complex maps, and that's why I try to limit complexity in general: few inputs, no brakes, not too many actions per second, and so on. Anyway, for now, the AI has earned a rest after those long hours of training. And maybe it will be back one day, with new surprises!
