AI Learns to Park - Deep Reinforcement Learning

Video Statistics and Information

Reddit Comments

Video Description for more detail:

"An AI learns to park a car in a parking lot in a 3D physics simulation. The simulation was implemented using Unity's ML-Agents framework (https://unity3d.com/machine-learning). The AI consists of a deep Neural Network with 3 hidden layers of 128 neurons each. It is trained with the Proximal Policy Optimization (PPO) algorithm.

The inputs to the Neural Network are the readings of eight depth sensors, the car's current speed and position, as well as its position relative to the target. The outputs of the Neural Network are interpreted as engine force, braking force and turning force (continuous values). These outputs can be seen at the top right corner of the zoomed-out camera shots.

The AI starts off with random behaviour, i.e. the Neural Network is initialized with random weights. It then gradually learns to solve the task by reacting to environment feedback accordingly.

The AI is rewarded with small positive signals for getting closer to the parking spot, which is outlined in red, and gets a larger reward when it actually reaches the parking spot and stops there. The final reward for reaching the parking spot depends on how parallel the car stops in relation to the actual parking direction. If the car stops at a 90° angle to the actual parking direction, for instance, the AI will only be rewarded a very small amount, relative to the amount it would get for stopping completely parallel to the actual direction. The AI is penalized with a negative reward signal when it either drives further away from the parking spot or crashes into any obstacles."

👍︎︎ 41 👤︎︎ u/SamuelArzt 📅︎︎ Sep 05 2019 🗫︎ replies
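
The network and reward scheme described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the author's code: the observation size (13 values), the tanh activation, the weight initialization, and every reward constant are assumptions, since the description gives none of them. PPO training itself is not shown; only a forward pass and the reward terms are.

```python
import math
import random

# Assumed observation layout (exact sizes are not given in the description):
# 8 depth readings + speed (1) + car position (2) + relative target position (2).
NUM_INPUTS = 8 + 1 + 2 + 2
HIDDEN_SIZES = [128, 128, 128]  # 3 hidden layers of 128 neurons, as described
NUM_OUTPUTS = 3                 # engine force, braking force, turning force


def init_network(rng):
    """Create random weights and zero biases for every layer."""
    sizes = [NUM_INPUTS] + HIDDEN_SIZES + [NUM_OUTPUTS]
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)  # assumed initialization range
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def forward(layers, observation):
    """Run one forward pass; tanh (an assumption) keeps outputs in [-1, 1]."""
    activation = list(observation)
    for weights, biases in layers:
        activation = [
            math.tanh(sum(w * a for w, a in zip(row, activation)) + b)
            for row, b in zip(weights, biases)
        ]
    return activation


def step_reward(prev_distance, distance, crashed,
                progress_scale=0.01, crash_penalty=-1.0):
    """Per-step signal: positive when closer, negative when farther or crashed.

    progress_scale and crash_penalty are hypothetical values.
    """
    if crashed:
        return crash_penalty
    return progress_scale * (prev_distance - distance)


def parking_reward(heading_error_deg, max_reward=1.0, min_reward=0.1):
    """Final reward, scaled by how parallel the car is to the parking direction.

    Uses |cos(angle)|, so 0 and 180 degrees both count as parallel (an assumption).
    """
    alignment = abs(math.cos(math.radians(heading_error_deg)))
    return min_reward + (max_reward - min_reward) * alignment


layers = init_network(random.Random(0))
actions = forward(layers, [0.0] * NUM_INPUTS)  # three continuous action values
```

With these assumed constants, a car that stops at 90° to the parking direction receives about a tenth of the reward it would get for stopping parallel, which matches the "very small amount" in the description only qualitatively.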

This is extraordinarily cool. Unlike most videos on the internet, I actually watched this in its entirety!

👍︎︎ 15 👤︎︎ u/rhacer 📅︎︎ Sep 05 2019 🗫︎ replies

I love stuff like this. How much time did the 320K+ attempts take?

👍︎︎ 14 👤︎︎ u/Xbotr 📅︎︎ Sep 05 2019 🗫︎ replies

I’d love to see a reward timeline chart showing the density of success vs failure over the course of attempts. Do you have the data to make something like this?

👍︎︎ 4 👤︎︎ u/gtzpower 📅︎︎ Sep 05 2019 🗫︎ replies
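
One way to get the numbers behind such a chart, assuming each attempt was logged as a simple success/failure flag (the video does not say whether this data exists), is to bin the attempts and compute the success rate per bin. The function name and the sample data below are hypothetical.

```python
def success_rate_per_bin(outcomes, bin_size):
    """Return the fraction of successful attempts in each consecutive bin.

    outcomes: list of booleans, one per attempt, in chronological order.
    The last bin may be shorter than bin_size.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    rates = []
    for start in range(0, len(outcomes), bin_size):
        chunk = outcomes[start:start + bin_size]
        rates.append(sum(chunk) / len(chunk))
    return rates


# Made-up outcomes for six attempts, grouped into bins of two.
example = [False, False, True, False, True, True]
rates = success_rate_per_bin(example, 2)
```

Plotting the returned rates against the bin index would give the timeline; a larger bin size smooths the curve at the cost of resolution.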

Nice job! I am working on an ML-Agents demo myself. I started out wanting some cubes to play soccer against each other, but I had to dumb it down to a simple task, shooting the ball with the right force, and they fail to learn even that after multiple runs.

I confirmed they are training, but TensorBoard shows a reward graph that is just a bunch of spikes, as if they learn, then suddenly forget and have to start over. I haven't been able to figure it out yet; perhaps you have some tips for me? And if possible, I would be very interested in your agent code.

👍︎︎ 5 👤︎︎ u/Munsis 📅︎︎ Sep 05 2019 🗫︎ replies

I think it didn't have enough data. It would make more sense for the inputs to be wider, even if they were less precise. For example, a depth camera that calculates the distance to the closest object within -30 to +30 degrees.

Could you make the same experiment but with virtual bumpers - colliders at 1, 2 and 3 meters in front of the car? This might be much more efficient with lower resolution but higher quality.

👍︎︎ 4 👤︎︎ u/[deleted] 📅︎︎ Sep 05 2019 🗫︎ replies
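
The two sensor ideas in this comment could look roughly like the sketch below. It assumes raycast results are available as (angle, distance) pairs relative to the car's heading; the function names, the 10 m maximum range, and the sample readings are all hypothetical and not taken from the video.

```python
def sector_distance(ray_hits, half_angle_deg=30.0, max_range=10.0):
    """Distance to the closest object within +/- half_angle_deg of the heading.

    ray_hits: iterable of (angle_deg, distance) pairs.
    Returns max_range when nothing is detected inside the sector.
    """
    in_sector = [dist for angle, dist in ray_hits if abs(angle) <= half_angle_deg]
    return min(in_sector, default=max_range)


def bumper_flags(distance, ranges=(1.0, 2.0, 3.0)):
    """Virtual bumpers: one flag per range, True if an obstacle is within it."""
    return [distance <= r for r in ranges]


# Made-up readings: the hit at -45 degrees lies outside the sector and is ignored.
hits = [(-45.0, 0.5), (-10.0, 4.0), (20.0, 2.5)]
closest = sector_distance(hits)
flags = bumper_flags(closest)
```

The sector reading trades angular precision for coverage, while the bumper flags reduce it further to three coarse binary inputs.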

I like how at 1:43 it was like "you know what? Screw this!"

👍︎︎ 3 👤︎︎ u/[deleted] 📅︎︎ Sep 05 2019 🗫︎ replies

It parks about as well as the people in my apartment complex at 30k.

👍︎︎ 3 👤︎︎ u/Mike312 📅︎︎ Sep 05 2019 🗫︎ replies

Fantabulous! I messed around with ML-Agents and I could see some progress on certain tasks but it always seemed like it was taking too long :D I'm happy to see someone else pull off such an incredible feat. Makes me want to get back to it because it was a load of fun but I just have too much on my plate.

👍︎︎ 2 👤︎︎ u/OldNewbProg 📅︎︎ Sep 05 2019 🗫︎ replies

Info

Channel: Samuel Arzt

Views: 2,310,389

Keywords: AI, Neural Networks, Deep Learning, Reinforcement Learning, Deep Reinforcement Learning, Car, Simulation, 3D, Unity, Unity3D, madewithunity, Machine Learning, ML, ANNs, ML-Agents, Unity ML-Agents, PPO, Proximal Policy Optimization, RL, Parking

Id: VMp6pq6_QjI

Length: 11min 5sec (665 seconds)

Published: Fri Aug 23 2019