Watch this A.I. learn to fly like Ironman

Captions
[Music]

It's been a while since I've made a coding video. I haven't told you guys this, but the reason is that my last coding video depleted so many of my brain cells that I took a whole year just to recover. I mean, look at this. How do you even code that? Like, the only way to code that is by sacrificing your hairline plus half of your sanity. So that's right: for the past year I've been meditating in the rainforests of Bhutan to allow my brain cells to regenerate, and the videos in between are actually not me. They are AI deepfakes that I bought from some guy on 4chan to make sure I had some videos prepared in advance. God, those deepfakes cost me my left kidney. I've definitely not broken even with the ad revenue. Yeah, that was a mistake.

Anyways, I've started coding a bit again. Recently I learned some machine learning stuff, you know, out of my intrinsic passion for computer science, obviously, and definitely not some other reason. But here's the thing: everyone starts off with digit recognition when they do machine learning, but that's kind of boring for a YouTube video. Like, this is what it would look like: 90% accuracy. So we all know how a video about that would end up: "So we compute the partial derivative for the weights and biases, starting at the loss function and then using the chain rule to backpropagate through the hidden layer. Hey, are you still listening? Stop falling asleep, this is important. I'm going to backpropagate the gradient..." Yeah, not very interesting. So I thought, you know, why not make an AI that interacts with a game-like environment, where it flies around or something? Iron Man scared you'd actually beat him. So let's just jump into it.

Of course we've got to start by finding a good Iron Man model, and here's one on Sketchfab by XX boom XX. It is 142,000 triangles. My computer's gonna explode. It's all good, I got a plan. Look, we open Blender, right, and we use a tool called Decimate. What this tool does is it basically Thanos-snaps triangles out of 
existence on any 3D model, you know, to save everyone's computers. So, uh, check this out: look at the number of triangles over there and... perfectly balanced, as all things should be. Let's check out the effect. Oh dear.

Anyways, now we can start working on the ragdoll in Unity. We just attach some joints and collision and rigid bodies and, uh... oh, nope, that's not it. Uh, that's not it either. I really should have researched how to do this, huh. Oh, okay, I think we got it, we got it. Let's see how it... damn, I got moves. And now we add propellers to its arms and legs so that it can fly. [Music]

Okay, now let me explain how this AI is gonna work. So in machine learning there are three main types: supervised learning, unsupervised learning, and reinforcement learning. Supervised learning is when you give the neural network labeled data, like pictures of digits along with what that digit actually is. Then, when the network sees enough examples with the correct answer attached, it learns to find the correct answer itself for an unknown example that doesn't have an attached label. Now, unsupervised learning is when you give the network a bunch of data that doesn't have labels and you're like, okay, don't worry about figuring out what each thing actually is, just group the ones that look similar together, and then once you do that we can get a human to manually label the whole group at once.

But for video games and stuff we need to use the third type, which is reinforcement learning, which is where an agent, in this case the Iron Man, learns what actions to take in an environment. There is no label or correct answer like in supervised learning. In this case I cannot show the AI how to fly around because I don't know how to control it. I got dropped on the head as a child and now I have a 0.6 KD in Fortnite. You think I have the dexterity to control four limbs in eight different directions to fly perfectly? No shot. So reinforcement learning is kind of like telling the network: look, I don't know how to do the thing, but 
you try to do the thing, and if you succeed I'll give you a reward of, like, five dollars. So basically like a father that failed at life and pushes his kid way too hard in an attempt to live out his dreams through his child. That got depressing.

Now, reinforcement learning comes in a lot of different flavors. Just look at this whole classification chart. God damn. But for purposes like what we're trying to do, it's generally agreed upon that PPO, or Proximal Policy Optimization, is the best one. Now, I'm not going to explain PPO, because look at all these equations. You guys want to sit through me explaining all these equations? I thought not. So yeah, I'm gonna skip the explanation for the viewers, and definitely not because, like, I don't understand it either. What a dumb idea. To use PPO, we can just use an ancient technique known colloquially as "other people's code". Like Unity: Unity provides PPO, so thanks, Unity.

Okay, one last thing we need to cover before training the AI. I know I'm dragging this out pretty long, but I promise this is important. We need to first get the inputs right. What do I mean by that? Well, a neural network basically takes an input of a bunch of numbers and gives an output of a bunch of numbers. The numbers that we're going to give the network are here. I've listed 33 of them, but we're only going to use the first 27. Notice how we're not using Euler angles, which is the typical, you know, degrees-around-each-axis type of rotation. This is because Euler angles are fine until you reach a full revolution of 360 degrees, and then it suddenly resets back to zero degrees. This means it's discontinuous, and the neural network is going to freak out when you give it Euler angles. Well, what about quaternions, are they continuous? I don't have enough brain cells to comprehend how quaternions work, but I searched it up on Google and someone said they're not continuous, so I'm just going to take their word for it, I don't know. So yeah, instead we take the up vector and the forward vector of 
the object, and any combination of these two vectors is always going to give us a unique rotation. It does mean that instead of three numbers for Euler rotations or four numbers for quaternions we need six numbers just for one rotation, but better safe than sorry, I guess.

Now you'll notice here, instead of just the position of the Iron Man, we have relative positions to these three targets. We're only going to use one of them, so we'll have 27 inputs out of our 33. But basically, moving around this target is how we'll get it to fly around.

So first we have to make sure that it can at least hover around a fixed target. So we'll define the training reward system like this: for 20 seconds, every frame the Iron Man will measure its distance from the target and then add a reward of negative that distance. So if it perfectly hovers at the target, by the end of the 20 seconds it will have a reward of zero. Why am I only giving negative reward and not positive reward? Well, I guess I'm an Asian parent.

All right, time to set up the training ground and run the machine learning. Hmm, what are we going to get for the average reward? Like negative 5,000, negative 10,000, I'm guessing. Oh. Oh. Well, this is going to take a while, isn't it? [Music]

Okay, we're getting somewhere. The average reward is actually, like, pretty good now. Seems like they're pretty good at staying around a single point in space. Let's zoom in on one single example. I don't know why it's spinning so much, but I'm not going to question it. Uh, yeah, well then we can check off our first milestone: staying at point A. We're making progress. And now we have to move on to our second milestone, which is moving from point A to point B. Now we change the training environment to be this whole area, and every time the training restarts, the Iron Man's position gets reset and the target gets moved to a random new position. This way it's incentivized to fly towards the target, because then the distance is smaller and the reward that it gets won't be so negative.
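
The reward scheme described above can be sketched in a few lines of Python. This is only an illustration, not the code used in the video: the frame rate, the size of the training area, and all function names are assumptions, and the real training ran through the PPO implementation that Unity provides.

```python
import math
import random

FRAMES_PER_SECOND = 50  # assumption: the video does not state the frame rate
EPISODE_SECONDS = 20    # from the video: each episode lasts 20 seconds


def frame_reward(agent_pos, target_pos):
    """Reward for a single frame: the negative distance to the target."""
    return -math.dist(agent_pos, target_pos)


def episode_reward(positions, target_pos):
    """Sum the per-frame rewards over one episode of agent positions."""
    return sum(frame_reward(p, target_pos) for p in positions)


def random_target(half_extent=10.0, rng=random):
    """Second milestone: place the target at a random point in the area.

    The cube-shaped area and its half_extent are hypothetical.
    """
    return tuple(rng.uniform(-half_extent, half_extent) for _ in range(3))


frames = FRAMES_PER_SECOND * EPISODE_SECONDS
target = (0.0, 5.0, 0.0)

# An agent that sits exactly on the target for the whole episode.
perfect_hover = [target] * frames
# An agent that hovers one unit below the target for the whole episode.
hover_one_unit_off = [(0.0, 4.0, 0.0)] * frames

print(episode_reward(perfect_hover, target))
print(episode_reward(hover_one_unit_off, target))
```

Because the reward can never be positive, zero is the ceiling: the only way for the agent to improve its total is to get close to the target quickly and stay there.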
Well, let's have a guess: if we just use the current brain of the Iron Man that we've trained to hover around a single point, and then we put it into this new training situation where that point moves around to different locations, is it gonna be able to adapt? Well, let's see. Oh. [Music] [Applause] Nope, it cannot. Oh geez, it is really bad at this. Oh, this learning's gonna take a while. You see, being an Asian parent is not that easy. We're back to, like, a negative 500,000 reward. God damn it.

All right, okay, it seems like the rewards are getting pretty good again now. Uh, let's see how it's doing. Oh hey, let's go, it's doing a thing! It's moving to the target. [Music] And again, and again, and, uh, yeah, this is actually pretty reliable. [Music] Oh, it failed one time, but I'll let that slide. I do notice that whenever it starts above the target it always dips down below it first and then comes back up, which is quite interesting.

So we do have a third stage to the training process. Up till now we've been resetting the Iron Man to the middle every time, because it kept losing its balance and falling down into the void, so we had to reset it. But as it gets better at flying, I'm gonna stop resetting it, so that it learns to deal with its existing momentum as it switches targets. So yeah, that's the third milestone. It's a relatively minor one; I don't even know if it's going to make that big of a difference. Okay, the reward did dip a little bit as I added the change and then came back up, so I guess it helps a little bit.

But yeah, it's time to let our AI fly a full path and see what it can do. So it's time for map design, and map design obviously just means going to the Unity Asset Store, clicking "free", and then downloading a bunch of random models. So here's a relatively entertaining part of this whole process: just me dragging garbage mismatched assets around to hack together something that kind of resembles an obstacle course. Visually, I think this time-lapse is kind of fun to watch, so I'll just take 
this as an opportunity to take a break from the actual topic of the video and go on a tangent. I mean, gosh, I guess the brain can only take so much neural-network, machine-learning, algorithm, training, reward-system knowledge before it needs a break. I'm trying to make this video longer; well, one, for the ad revenue, obviously, but also to fight against the shortening average attention span nowadays, basically ever since TikTok got popular. I mean, short videos are fine, but nothing pisses me off more than when they put a clip of, like, GTA 5 drifting on the bottom, or Subway Surfers, or, I don't know, scraping paint from a bucket, or some other, like, slime montage, just so people keep watching. Like, if you've got some Joe Rogan highlight, that's cool, but I'm not a three-year-old. I don't need big colorful visual movement happening on half the screen to keep me entertained. I feel like this sort of practice in short-form videos is literally accelerating the progress of humanity backwards. We're going backwards. We can't even look at a 10-second video without Minecraft parkour footage on it. Ah.

Yeah, anyways, rant over, and time-lapse over too. And now I'm going to play footage of the AI flying, and let me just preface this by saying it's really bad at flying. It is so slow. When I first started on this video idea I was expecting it to be able to zip around like in the movies, but now I'm just happy if it doesn't fall off the map. And it does fall off the map, that's the thing. Like, matter of fact, it's so bad at flying that afterwards I decided to train another agent, a quadcopter, to see if it can do any better. So yeah, stick around to the end to see how that compares. But I'll just let it play. I might speed up some of it so it doesn't take up an entire 10 minutes of the video, but here we go. [Music]
So with that in mind, I had to think: is this Iron Man model just a terrible design for flying around? Would a quadcopter be any better? So I designed a quadcopter simulation and trained it the exact same way that I did for the Iron Man. Here's how that went. [Music] Oh. [Music]

Well, it seems I'm just bad at reinforcement learning. But hey, I guess it's a learning experience. It feels like with each video I make I try to tackle a new topic, and with each new topic I just get barely good enough at it to scrape together a video. If I wanted to perfectly succeed at what I'm doing in each video, then I guess I'd have to upload once a year. So maybe that's reasonable. But anyway, AI is hard. I'll catch you in the next one. [Music]
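
The rotation inputs discussed earlier in the video (an up vector and a forward vector instead of Euler angles) can be illustrated with a small Python sketch. The axis convention, the restriction to yaw and pitch, and the function names are all assumptions made for illustration; none of this is taken from the video's project.

```python
import math


def rotation_inputs(yaw_deg, pitch_deg=0.0):
    """Return the forward and up vectors as six numbers.

    Assumed convention: y is up, z is forward, yaw turns about the
    y axis, and positive pitch tilts the nose down about the local x axis.
    """
    y = math.radians(yaw_deg)
    p = math.radians(pitch_deg)
    forward = (math.cos(p) * math.sin(y), -math.sin(p), math.cos(p) * math.cos(y))
    up = (math.sin(p) * math.sin(y), math.cos(p), math.sin(p) * math.cos(y))
    return forward + up


def max_abs_diff(a, b):
    """Largest component-wise difference between two input tuples."""
    return max(abs(x - y) for x, y in zip(a, b))


# Two headings that are physically 0.2 degrees apart, on either side of
# the 360 -> 0 wrap-around.
euler_jump = abs(359.9 - 0.1)
vector_jump = max_abs_diff(rotation_inputs(359.9), rotation_inputs(0.1))

print(euler_jump)
print(vector_jump)
```

Crossing the wrap-around changes the Euler input by almost 360, while the six vector components barely move, which is the continuity the video is after. As for quaternions, one known wrinkle is that q and -q describe the same rotation, which may be what the search result mentioned in the video was referring to.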
Info
Channel: Gonkee
Views: 362,955
Id: Zi677y6bg2Y
Length: 17min 55sec (1075 seconds)
Published: Fri Mar 24 2023