Training an AI for WIPEOUT (MLAgents Unity Reinforcement Learning)

Video Statistics and Information

Captions
Hello! In this video, I train an AI for Total Wipeout. The goal is simple: there are three obstacles (the balls, the sweeper, and the swing), and the AI will have to learn from its mistakes to control his 40 muscles in a perfect manner, a bit like this. "Hello, I am under the water!" Welcome to Training an AI for Total Wipeout.

So how do you teach an AI to walk, run, and jump over obstacles? This video is once again focused on reinforcement learning, a topic I already covered in my drone video. The basic idea of reinforcement learning is that an agent observes his environment through sensors, takes actions, and gets a reward that we define for him. The agent then trains on the memory of these sensor-action-reward triplets to find better actions.

Reinforcement learning only works with careful environment shaping: we have to carefully define and iterate on what the agent sees, what he can do, and how he is rewarded. The agent should have all the information he needs to succeed in the environment, but not too much, as that would confuse him. Here we give him his body positions, and we cast rays from his eyes to the ground to help him know where the obstacles are. The agent also needs a large enough action space to succeed at his task, so we give him control over the 40 muscles of his body. Positive and negative rewards are given to the agent to explain the objective, but also to give some pointers on which behaviors help towards it. Here we start by rewarding the agent for moving forward, proportionally to his speed.

After defining all this, we can finally train. To accelerate the training, we train on these eight parallel environments. In the beginning, the agents try random actions, hoping to reach rewards. Training on such a complex task is a long process, so I'm going to go to sleep and let him train for another 8 hours.

[Music]

Today I woke up with great hope: the training went well and the rewards increased a lot. But when I tested the agent, this happened. The agent learned that the best behavior was going off course and jumping as fast as he could into the abyss.

So why did this happen? In AI, this is called a local optimum: the neural network converges to a behavior that looks optimal locally but is not the best behavior. Since the reward function is currently all about speed, he understood that going over those balls is hard and slows him down, while jumping here is way faster and gets him more reward.

To fix this, we have to do another iteration of environment shaping. Action and input shaping are generally a last resort: since the neural network's input and output sizes are fixed, changing them means either retraining the agent from scratch or doing some complicated brain surgery. An easy way to get out of a local optimum is to change the rewards. Here we punish the agent for going off course, as we think that staying in the center lane is the way to go.

Now we wait for another night of sleep. This morning is a bit better: our agent does stay on course this time and even passes a few balls, but he still falls before completing the course. Observing him, it seems that he gets a bit too ambitious with speed and loses his balance at the last obstacles. We need to slow him down, so we punish him for going too fast. Then another night of sleep, and finally we have an agent that completes the course. A few more nights of training for the other obstacles, and here are the results.

[Music]

Thank you! Okay, bye.
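The three reward iterations described above (reward forward speed, punish leaving the center lane, punish going too fast) can be sketched as a single per-step reward function. This is a minimal Python sketch under stated assumptions, not the code used in the video: the project is built with Unity ML-Agents, where rewards are normally assigned from the C# agent script. The names `shaped_reward` and `RewardConfig`, the lane-width test for "off course", and every coefficient value are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardConfig:
    # All fields and values are hypothetical; the video gives no numbers.
    speed_coef: float = 0.01         # reward per unit of forward speed
    lane_half_width: float = 2.0     # max distance from center still "on course"
    off_course_penalty: float = 1.0  # penalty when the agent leaves the lane
    max_speed: float = 5.0           # speed above which the agent is punished
    overspeed_coef: float = 0.05     # penalty per unit of speed above max_speed


def shaped_reward(forward_speed: float, lateral_offset: float,
                  cfg: RewardConfig = RewardConfig()) -> float:
    """Per-step reward combining the three shaping iterations.

    forward_speed: velocity along the course direction (negative = backwards).
    lateral_offset: signed distance from the center line of the course.
    """
    # Iteration 1: reward forward progress proportionally to speed.
    reward = cfg.speed_coef * forward_speed

    # Iteration 2: punish leaving the center lane, which removes the
    # "jump into the abyss" shortcut found by the first agent.
    if abs(lateral_offset) > cfg.lane_half_width:
        reward -= cfg.off_course_penalty

    # Iteration 3: punish only the speed above the threshold, so the
    # agent is pushed to stay balanced rather than to stop moving.
    excess = forward_speed - cfg.max_speed
    if excess > 0:
        reward -= cfg.overspeed_coef * excess

    return reward
```

Setting `off_course_penalty` and `overspeed_coef` to zero reproduces the first, speed-only reward. Under it, a fast jump off the course (for example `shaped_reward(8.0, 3.0, speed_only_cfg)`) out-scores a slower run down the center lane (`shaped_reward(4.0, 0.0, speed_only_cfg)`). With the penalties enabled, that ordering flips, which is the local-optimum fix described in the transcript.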
Info
Channel: Alexandre Sajus
Views: 2,122
Id: _YXOLM2a41Q
Length: 5min 11sec (311 seconds)
Published: Mon Jun 12 2023