AI Learns to Park - Deep Reinforcement Learning

Video Statistics and Information

Reddit Comments

Video Description for more detail:

"An AI learns to park a car in a parking lot in a 3D physics simulation. The simulation was implemented using Unity's ML-Agents framework (https://unity3d.com/machine-learning). The AI consists of a deep Neural Network with 3 hidden layers of 128 neurons each. It is trained with the Proximal Policy Optimization (PPO) algorithm.

The inputs to the Neural Network are the readings of eight depth sensors, the car's current speed and position, and its position relative to the target. The outputs of the Neural Network are interpreted as engine force, braking force and turning force (continuous values). These outputs can be seen in the top right corner of the zoomed-out camera shots.

The AI starts off with random behaviour, i.e. the Neural Network is initialized with random weights. It then gradually learns to solve the task by reacting to environment feedback accordingly.

The AI is rewarded with small positive signals for getting closer to the parking spot, which is outlined in red, and gets a larger reward when it actually reaches the parking spot and stops there. The final reward for reaching the parking spot depends on how parallel the car is to the actual parking direction when it stops. If the car stops at a 90° angle to the parking direction, for instance, the AI receives only a very small reward compared to what it would get for stopping completely parallel. The AI is penalized with a negative reward signal when it either drives further away from the parking spot or crashes into an obstacle."

41 points | u/SamuelArzt | Sep 05 2019
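
The reward scheme in the description above can be sketched as a single per-step function. This is only an illustration under stated assumptions, not the author's code: the description gives no magnitudes, so every constant, the function names, and the choice of |cos| as the "how parallel" measure are hypothetical.

```python
import math

# All constants are illustrative assumptions; the description gives no magnitudes.
STEP_SCALE = 0.01       # reward per unit of distance gained toward the spot
CRASH_PENALTY = -1.0    # negative signal for hitting an obstacle
PARK_REWARD_MAX = 1.0   # final reward when stopped parallel to the spot
PARK_REWARD_MIN = 0.05  # small final reward when stopped at 90 degrees


def alignment_factor(car_heading_deg, spot_heading_deg):
    """Return 1.0 when the car is parallel to the spot and 0.0 at 90 degrees.

    Assumption: facing either way along the spot counts as parallel.
    """
    diff = math.radians(car_heading_deg - spot_heading_deg)
    return abs(math.cos(diff))


def step_reward(prev_dist, dist, crashed, parked, car_heading_deg, spot_heading_deg):
    """Reward for one simulation step (hypothetical signature)."""
    if crashed:
        return CRASH_PENALTY
    if parked:
        # Larger final reward, scaled by how parallel the car stopped.
        a = alignment_factor(car_heading_deg, spot_heading_deg)
        return PARK_REWARD_MIN + (PARK_REWARD_MAX - PARK_REWARD_MIN) * a
    # Positive when the car got closer, negative when it drove further away.
    return STEP_SCALE * (prev_dist - dist)
```

With these made-up constants, moving one unit closer yields a small positive value, moving one unit away yields the same value negated, and a perfectly parallel stop yields the maximum final reward.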

This is extraordinarily cool. Unlike most videos on the internet, I actually watched this in its entirety!

15 points | u/rhacer | Sep 05 2019

I love stuff like this. How much time did the 320K+ attempts take?

14 points | u/Xbotr | Sep 05 2019

I'd love to see a reward timeline chart showing the density of success vs failure over the course of attempts. Do you have the data to make something like this?

4 points | u/gtzpower | Sep 05 2019
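
One way to get the data for such a chart is a rolling success rate over the attempt log. The sketch below assumes a hypothetical list of per-attempt outcomes (True for a successful park, False otherwise); no such log is published with the video, and the window size is arbitrary.

```python
from collections import deque


def rolling_success_rate(outcomes, window=1000):
    """Return, for each attempt, the success fraction over the last `window` attempts."""
    recent = deque(maxlen=window)
    successes = 0
    rates = []
    for ok in outcomes:
        if len(recent) == window:
            # The oldest outcome is about to be evicted by the append below.
            successes -= recent[0]
        recent.append(1 if ok else 0)
        successes += recent[-1]
        rates.append(successes / len(recent))
    return rates
```

Plotting the returned list against the attempt index would show how the density of successes changes during training.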

Nice job! I am working on an ml-agents demo myself. I started out wanting some cubes to play soccer against each other, but I had to dumb it down to a simple task, shooting the ball with the right force, and they fail to learn even that after multiple runs.

I confirmed they are training, but TensorBoard just shows a bunch of spikes in the reward graph, as if they learn, then suddenly forget and have to start over. I haven't been able to figure it out yet; perhaps you have some tips for me? And if possible, I would be very interested in your agent code.

5 points | u/Munsis | Sep 05 2019

I think it didn't have enough data. It would make more sense for the inputs to be wider, even if they were less precise. For example, a depth camera that calculates the distance to the closest object within -30 to +30 degrees.

Could you make the same experiment but with virtual bumpers - colliders at 1, 2 and 3 meters in front of the car? This might be much more efficient with lower resolution but higher quality.

4 points | u/[deleted] | Sep 05 2019
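
The wider, coarser sensor suggested in this comment could be approximated by taking the minimum distance over all rays that fall inside a cone. This is a rough sketch of the commenter's idea, not something from the video: the function name and the (bearing, distance) input format are assumptions.

```python
def closest_in_cone(readings, half_angle_deg=30.0):
    """Return the smallest distance among rays within +/- half_angle_deg.

    `readings` is an iterable of (bearing_deg, distance) pairs, with bearings
    measured relative to the car's heading (hypothetical format).
    Returns None when no ray falls inside the cone.
    """
    in_cone = [dist for bearing, dist in readings if abs(bearing) <= half_angle_deg]
    return min(in_cone) if in_cone else None


# Example: only the rays at -20, 0 and 25 degrees lie inside the +/-30 degree cone.
rays = [(-45, 1.0), (-20, 3.5), (0, 2.0), (25, 4.0), (60, 0.5)]
nearest = closest_in_cone(rays)
```

A single value like this trades angular resolution for coverage, which is the trade-off the comment proposes.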

I like how at 1:43 it was like "you know what? Screw this!"

3 points | u/[deleted] | Sep 05 2019

It parks about as well as the people in my apartment complex at 30k.

3 points | u/Mike312 | Sep 05 2019

Fantabulous! I messed around with ML-Agents and I could see some progress on certain tasks but it always seemed like it was taking too long :D I'm happy to see someone else pull off such an incredible feat. Makes me want to get back to it because it was a load of fun but I just have too much on my plate.

2 points | u/OldNewbProg | Sep 05 2019
Captions
[Music]
Info
Channel: Samuel Arzt
Views: 2,310,389
Keywords: AI, Neural Networks, Deep Learning, Reinforcement Learning, Deep Reinforcement Learning, Car, Simulation, 3D, Unity, Unity3D, madewithunity, Machine Learning, ML, ANNs, ML-Agents, Unity ML-Agents, PPO, Proximal Policy Optimization, RL, Parking
Id: VMp6pq6_QjI
Length: 11min 5sec (665 seconds)
Published: Fri Aug 23 2019