Controlling Drones with AI (Python Reinforcement Learning Quadcopter)

Captions

Hello! In this video, I create an AI that pilots 2D quadcopters. This is a terrible idea, since it's already hard for a human to control these fully manually: quadcopter control is a very unstable environment, and there already are very robust mathematical methods for piloting drones. But I'm still gonna do it. Welcome to Controlling Drones with AI.

Quadcopters are very agile unmanned vehicles that use four propellers, which turn really fast and therefore push against the air and create lift. In 2D, only two actions matter when controlling a quadcopter: the lift amplitude and the lift difference. The lift amplitude is the power of both rotors; we use it to fight gravity and control altitude. The lift difference is the difference between the powers of the two rotors: if the left rotor turns faster than the right one, the drone tilts right and moves to the right. From this, we can derive the physics equations and create an environment where we can control a drone using the keyboard. The goal in this environment is to pop as many balloons as possible within the time limit.

Before we use AI models on this task, we should probably look at the current state of the art in drone control. Currently, most drones use control theory to move to a waypoint. Control theory is a field of mathematics that uses the error signal between the drone's position and the target position to derive the drone's actions. A great tool in control theory is the PID controller, which outputs the weighted sum of the proportional, integral, and derivative terms of the error signal. Using PIDs, I can turn the position error into an optimal angle and speed, and then derive the rotor powers from these. This is already pretty good: the drone follows waypoints in a robust and careful manner and never crashes. The good part is that this method only takes a few minutes of tuning to work. The bad part is that it's not fast enough: since the PID values are constants, the drone always has the same strategy
regardless of how far away the target is. As a result, the drone doesn't tilt enough when the target is far away. That's where AI comes in.

In this part, we'll use a subfield of AI called reinforcement learning. In reinforcement learning, an agent learns by observing the state of the environment, taking an action, and observing the reward it gets. After a lot of training, the agent knows which actions result in the best rewards and chooses accordingly. But it's not that simple: to make sure the agent learns the task, we can't just let it play the game like a human. We have to make training as easy as possible through environment shaping, which comes in three steps.

Action shaping: we choose which actions the agent can take. The action set should be complete enough that the agent has full capability, but the actions shouldn't be too complex to use either. Here, the action space will be two floats: one for the thrust amplitude and one for the thrust difference.

Observation shaping: we also choose what the agent sees at each step of the game. Learning from images of the game would be very difficult for the agent, because it would first have to work out its own position and attitude and the target position from the image. Instead, we only give it the float values it needs to play: the distance and angle to the target, its attitude, and its velocity.

Reward shaping: choosing when to reward and punish the agent is also important. We want to help the agent understand winning behaviors early on, but we don't want to bias it either. First, we should punish the agent heavily if it crashes, since not crashing will be the first lesson to learn. We can also reward it for surviving. Then we should punish the agent for being far from the targets, so it naturally tries to reach them. Finally, we should reward it when it reaches a target, since that is the goal of the game.

We can finally start training. In the beginning, the agent tries random actions, hoping to score some points. Unfortunately, crashing means a 1,000-point
punishment. It quickly learns that staying alive is better than dying, and then learns to hover. Next, it learns to reach one or two balloons, but at a very slow pace. Finally, after many, many episodes, it learns to chase multiple targets at a fast rate. After all this training, it's finally time to see which is better: human, control theory, or reinforcement learning. [Music]

As we can see, the RL agent absolutely obliterates both me and the control theory agent. That's quite impressive, especially considering how unstable the environment is. Anyway, if you want to know more, I'm linking the GitHub repo in the description. I even wrote a paper about the project, but since it's not peer-reviewed, take everything in it with a huge grain of salt. Okay, bye!
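The transcript describes a 2D quadcopter driven by only two commands, a lift amplitude and a lift difference, where a faster left rotor tilts the drone right. Below is a minimal rigid-body sketch of that idea. It is not the video's actual environment. The mass, arm length, and inertia are assumptions chosen for illustration, and so are the `mix` split (each rotor gets the amplitude plus or minus half the difference) and the semi-implicit Euler integrator.

```python
import math
from dataclasses import dataclass

# Hypothetical constants, chosen only for illustration (not from the video).
GRAVITY = 9.81      # m/s^2
MASS = 1.0          # kg
ARM_LENGTH = 0.25   # m, from the centre of mass to each rotor
INERTIA = 0.01      # kg*m^2, about the axis perpendicular to the 2D plane


@dataclass
class DroneState:
    x: float = 0.0      # horizontal position, positive to the right
    y: float = 0.0      # altitude, positive up
    vx: float = 0.0
    vy: float = 0.0
    theta: float = 0.0  # tilt from vertical, positive = tilted right
    omega: float = 0.0  # angular velocity


def mix(amplitude: float, difference: float) -> tuple[float, float]:
    """Hypothetical split of (lift amplitude, lift difference) into (left, right) thrusts."""
    return amplitude + difference / 2, amplitude - difference / 2


def step(s: DroneState, amplitude: float, difference: float, dt: float = 0.01) -> DroneState:
    """Advance the 2D dynamics by one semi-implicit Euler step."""
    left, right = mix(amplitude, difference)
    thrust = left + right
    # The total thrust points along the body's up axis, rotated by theta from vertical.
    ax = thrust * math.sin(s.theta) / MASS
    ay = thrust * math.cos(s.theta) / MASS - GRAVITY
    # A faster left rotor gives a positive torque, tilting the drone to the right.
    alpha = (left - right) * ARM_LENGTH / INERTIA
    # Update velocities first, then positions using the new velocities.
    vx, vy = s.vx + ax * dt, s.vy + ay * dt
    omega = s.omega + alpha * dt
    return DroneState(s.x + vx * dt, s.y + vy * dt, vx, vy, s.theta + omega * dt, omega)
```

Calling `step(state, MASS * GRAVITY / 2, 0.0)` repeatedly keeps the drone hovering in place. A small positive difference tilts it right, and it starts drifting right, which matches the behavior described in the video.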
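The control-theory baseline in the transcript chains PIDs: the position error gives a desired angle and speed, which in turn give the rotor powers. Here is a minimal sketch of such a cascade. It assumes the same two commands (amplitude, difference) and the convention that a positive difference tilts the drone right. `CascadeController`, its gains, and the tilt limit are hypothetical and untuned.

```python
from dataclasses import dataclass
from typing import Optional


def clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


@dataclass
class PID:
    """Textbook PID: output = kp*e + ki*integral(e) + kd*de/dt."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, error: float, dt: float) -> float:
        self.integral += error * dt
        # Skip the derivative on the first call to avoid a spike from an unknown previous error.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


class CascadeController:
    """Hypothetical cascade: position error -> angle/speed setpoints -> rotor commands.

    Sign convention (assumed): positive theta = tilted right, positive
    difference = left rotor faster = tilts right. Gains are illustrative only.
    """

    MAX_TILT = 0.5  # rad, hypothetical limit on the requested tilt

    def __init__(self, hover_amplitude: float):
        self.hover_amplitude = hover_amplitude
        # Outer loops turn position errors into setpoints.
        self.x_pid = PID(kp=0.2, ki=0.0, kd=0.2)      # horizontal error -> target tilt
        self.y_pid = PID(kp=1.0, ki=0.0, kd=0.0)      # altitude error -> target vertical speed
        # Inner loops turn setpoint errors into the two rotor commands.
        self.theta_pid = PID(kp=2.0, ki=0.0, kd=0.5)  # tilt error -> lift difference
        self.vy_pid = PID(kp=1.5, ki=0.5, kd=0.0)     # vertical-speed error -> extra lift

    def act(self, x: float, y: float, vy: float, theta: float,
            target_x: float, target_y: float, dt: float) -> tuple[float, float]:
        target_theta = clamp(self.x_pid.update(target_x - x, dt), -self.MAX_TILT, self.MAX_TILT)
        target_vy = self.y_pid.update(target_y - y, dt)
        difference = self.theta_pid.update(target_theta - theta, dt)
        amplitude = self.hover_amplitude + self.vy_pid.update(target_vy - vy, dt)
        return amplitude, difference
```

Every gain here is a constant, so a given error always yields the same response no matter how far away the target is. This is the limitation the transcript points out.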
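The observation- and reward-shaping steps in the transcript can be sketched as two small functions. Only the 1,000-point crash penalty comes from the video. The survival bonus, distance weight, target bonus, the world-frame angle, and the feature order returned by `observation` are assumptions for illustration.

```python
import math

# Only the crash penalty is stated in the video; the other weights are hypothetical.
CRASH_PENALTY = -1000.0
SURVIVAL_BONUS = 0.1
DISTANCE_WEIGHT = 0.01
TARGET_BONUS = 100.0


def observation(x: float, y: float, vx: float, vy: float, theta: float, omega: float,
                target_x: float, target_y: float) -> list[float]:
    """Shaped observation: distance and angle to the target, attitude, and velocity."""
    dx, dy = target_x - x, target_y - y
    distance = math.hypot(dx, dy)
    angle_to_target = math.atan2(dy, dx)  # world-frame angle (an assumption)
    return [distance, angle_to_target, theta, omega, vx, vy]


def reward(distance: float, crashed: bool, reached_target: bool) -> float:
    """Shaped reward: heavy crash penalty, small survival bonus, distance penalty, target bonus."""
    if crashed:
        return CRASH_PENALTY
    r = SURVIVAL_BONUS - DISTANCE_WEIGHT * distance
    if reached_target:
        r += TARGET_BONUS
    return r
```

The crash penalty dominates every other term, so "don't crash" is the first signal the agent can pick up. The distance term then gives a dense gradient toward the balloons before the agent ever reaches one.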

Info

Channel: Alexandre Sajus

Views: 25,634

Id: J1hv0MJghag

Length: 5min 0sec (300 seconds)

Published: Wed Feb 01 2023