How to use Machine Learning AI in Unity! (ML-Agents)

Video Statistics and Information

Captions
Hello there, I'm your Code Monkey, and let's learn how to use machine learning and ML-Agents in Unity! This is a very powerful toolkit that lets you create some extremely intelligent AI. It helps you solve tons of problems that would simply be impossible to solve using classic code, so there's immense potential in this toolkit, and you should know how to use it, so you know how it can help you and when to apply it.

This is a long video, but it's the only video you're going to need in order to learn how to get started working with machine learning in Unity. We're going to start completely from scratch and go through the entire installation process, then learn how to use the toolkit by setting up a scene to train an AI using reinforcement learning, and finally look at the results to see the AI in action using our trained brain model. So make sure you watch the video until the end to understand the whole process. This video is meant to help you get started; after watching it, go check out the playlist linked in the description, where I will be adding videos covering interesting use cases made with machine learning in Unity. For example, I'm currently working on a specific match-3 use case, and many other different ones, so stay tuned for that.

Now, the way machine learning works in Unity is through the ML-Agents toolkit, which combines several tools. First you have the mlagents Python package, which runs the machine learning algorithm; then you have your learning environment, which is your Unity scene with the game running; and then you have the ML-Agents C# package, which lets you define the data that you feed into the algorithm, as well as use the resulting brain. So let's go through that whole process, starting from scratch.

First, over here is the GitHub page for the ML-Agents package; there's a link in the description. You can find tons of documentation here, so definitely give it a look. You have a quick readme talking about how the whole thing works, all the features, the releases, the documentation and so on. You've got the docs folder with all the documentation, so tons of topics on installation, getting started, making some environments and so on, and there are also lots of awesome examples which you can browse to see how they work.

Now, the first thing we need to do is actually install Python, and as of the time of this recording the recommended Python version is either 3.6 or 3.7. So over here on the Python website I'm going to go ahead and download 3.7.9; again, if you're watching this in the future, make sure you check the official docs to see which version you should install. So go ahead, just download and install it.

After installing Python, open up the command prompt: just click the Start button and type cmd, so here it is. Now, there is actually one quirky thing about Windows 10: in theory you should be able to run Python by just typing python; however, if you do that on Windows 10, it opens up the Microsoft Store instead of actually running Python. If you see this behavior, the solution is to type py instead of python. So over here, just py, hit Enter, and there you go, now I'm inside Python. Here you can verify that, first of all, Python is running, and that you have the correct version, which in this case is 3.7.9. Okay, so far so good; now let's just exit out of Python.

Back in the command line, the next step is to change the directory to our Unity project. Over here is the Unity project I'm going to use, so just go ahead and copy the entire project path.
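On my machine that whole check looks roughly like this; the paths are just placeholders for your own user folder and project:

```
C:\Users\You> py
Python 3.7.9 [...] on win32
>>> exit()
C:\Users\You> cd C:\Projects\MyUnityProject
```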
Then, on the command prompt, just change directory into that directory. Now, in here, what we're going to do is create a Python virtual environment. This will help us by keeping all of our projects separate: each virtual environment is completely isolated from the others, meaning that we can have multiple projects on the same machine, each of them using their own Python packages, and they will not cause conflicts with each other.

So again, first go into your Unity project directory, and then in here we're going to type the command python -m, which means we're going to run a module; the one we want to run is called venv, to create a virtual environment. Afterwards, this requires a folder name where the environment will be created, so to keep things nice and organized, give it the exact same name, just venv. This will create the virtual environment inside a folder named venv. Now, if you're on Linux or Mac the commands are slightly different, so check the official docs; and again, like I said previously, if you have the issue with Python not running when you type python, then here, instead of python, just type py, so py -m venv venv. Go ahead, hit Enter, and yep, now it's creating the virtual environment.

All right, it's done, and you can verify that it worked by opening up your file explorer: yep, over here there's a folder called venv, and in there we have our virtual environment. You see some folders, including this one with a bunch of scripts, and in here we see an activate script; this is how we're actually going to activate the virtual environment. So back in the command line, we go inside the venv folder, access the Scripts folder, and run activate. When you do, yep, the command prompt changes: over there it says (venv), so we are now inside the Python virtual environment, and any changes that you make here will not impact any other Python projects you have on your machine, like, for example, any other Unity projects with other Python libraries.

Now, before we install our Python packages, let's make sure our installer is updated. The Python package installer is named pip, so to make sure we're using the latest version, let's run the command python -m pip install --upgrade pip, which upgrades the pip package itself. Just go ahead, hit Enter, and there you go, it has successfully installed the latest pip package.

Okay, so far so good; now we can begin installing our packages, and the first one we're going to need is a package called PyTorch. This is an open-source library for performing computations using data flow graphs; it's the underlying representation of the deep learning models. For that, let's run the command pip install, and here we need a specific version, so here it is, installing torch version 1.7, downloaded from the URL given in the docs. Now, if you have issues, or you're watching this many months in the future, check the GitHub installation docs to see which version to use. So go ahead, run this, and now we wait for it to complete. Okay, PyTorch is now installed.

Next up, we install the ML-Agents package, so we just do pip install mlagents. However, just running it like this may give you some compatibility errors, so let's try it and see... and yep, here we see an error saying we have an incorrect, incompatible numpy version. If you didn't get that error, then it's fine, just keep going; but if you did get the error, just like I did, then the solution is to use a different package resolver: do pip install mlagents and then --use-feature, where the feature is the 2020 resolver.
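Put together, the Python setup from inside the project folder looks something like this; the exact torch pin and wheel URL are assumptions based on what the installation docs pointed to at the time, so double-check them against the docs for your release:

```
py -m venv venv
venv\Scripts\activate
python -m pip install --upgrade pip
pip install torch==1.7.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install mlagents
rem only needed if pip reported the numpy version conflict:
pip install mlagents --use-feature=2020-resolver
```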
So if you run it with the new resolver, there you go: now you can see it's uninstalling the incorrect version and installing the correct one. All right, now we have all the correct versions, and we can verify that the mlagents package was correctly installed by running the command mlagents-learn --help and hitting Enter. Yep, if everything went correctly, you should be able to see the help text for the mlagents-learn command; so here it is, everything is installed correctly. As of the time of this recording I'm using ML-Agents Release 10 with the Python package 0.22, so again, if you have any issues, or you're watching this many months in the future, check the official docs for any version changes. Okay, so far so good; with that, the Python side is all installed correctly.

Now, there's actually one more optional step here. If you look in the console, you might be seeing some warning messages. These are not directly related to the mlagents package itself, rather they're due to one of its dependencies, so if you're watching this in the future with another ML-Agents release, it might not show these warnings. It's saying it could not load a dynamic library with the name cudart64_101; if you see something like that, it's telling you that it cannot find the CUDA libraries. Now, this is optional: everything will run just fine without them, so if you have no GPU, you can skip this step and it will use your CPU instead of your GPU. But if you do have an NVIDIA GPU, you can optionally install CUDA.

If you see that message, pay attention to the name of the missing library. In my case, I was seeing a missing library ending with _101.dll, which means it requires CUDA version 10.1. So just go to NVIDIA's website and download CUDA; however, again, pay attention to the version. As of the time of this recording the latest CUDA version is actually version 11, but the missing library is version 10.1, so when you go to the download page, don't download version 11.
Instead, go into the archive; in this case we're looking for version 10.1, so go ahead and download that one. After installing it, if you once again run mlagents-learn --help, you should see that those warnings are gone, so it now finds the CUDA libraries. However, you might be seeing another warning; again, check the library name. You might see another missing library named cudnn64_7: this is cuDNN, the CUDA Deep Neural Network library. So once again, just go to NVIDIA's website and search for cuDNN, and go ahead and download it, but again, pay attention to the library name. In my case it's missing cudnn64_7, meaning it needs version 7, and again, the latest one is actually version 8, so when you download it, make sure you download the correct version 7.

When you download it, you get a zip file, and inside you see a cuda folder with a bunch of files. In order to install it, you just go into your CUDA installation folder: in my case I installed to the default location, so under Program Files, NVIDIA GPU Computing Toolkit; inside there's the CUDA folder, and inside that we go into v10.1, where we see the various folders. So just go ahead and copy all of these folders from the zip, the include folder, the lib and the bin, and drag them all in there. After doing that, you can verify by going inside the bin folder, where you should be able to find all the DLLs, in my case cudart64_101 and the other one, cudnn64_7. Now, if we run mlagents-learn --help again, you should see everything run without any warnings: the command ran, and nope, no warnings. Okay, so far so good; with this we have all the setup done for the Python side, including the optional CUDA libraries.
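As a recap of that copy step, with the default install location it ends up looking like this; the drive letter and versions may differ on your machine:

```
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\
    bin\      <- merge in the zip's cuda\bin (contains cudnn64_7.dll)
    include\  <- merge in the zip's cuda\include
    lib\      <- merge in the zip's cuda\lib
```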
Now let's go into our Unity project. Over here I have a project that's pretty much brand new, just a simple demo I have prepared. The ML-Agents package works with any Unity version starting from 2018.4. Now, I want to make sure this video stays relevant for as long as possible, so in this project I'm currently using 2020.2, but everything works exactly the same if you're using 2019.4, or, if you're in the future, the 2020 LTS version. To install the ML-Agents package, just go ahead and open up the Package Manager, select Packages, make sure you're looking at the Unity Registry, and in here scroll down and find the ML Agents package. Here you can see the latest stable package, which at the time of this recording is version 1.0.6; again, I want this video to be relevant for a long time, so I'm going to instead install the latest preview package. For that, I click the gear icon, go into the Advanced Project Settings, and enable the preview packages; and yep, I understand. So over here on ML Agents I can expand it, see the other versions, and here is the latest preview package, which at the time of this recording is 1.6. But again, if you're in the production stage of your development and you want maximum stability, then go with the stable Unity LTS version as well as the stable ML-Agents package. So select your choice and click Install. Okay, it's done, and you can verify that everything installed correctly by creating an empty game object: if you go into Add Component, you should now see a group for ML Agents, with the various scripts inside. All right, so we have everything correctly installed.

Now, over here for testing I have this demo, which is pretty much taken from the official examples: it's just a nice character, and over there is the goal. The objective is to teach this character to move towards the goal and not fall off the map. So let's see how we actually use ML-Agents.

For that, first we need to create an agent. An agent is what's going to run our AI, both for training and then for playing, and to make an agent we just make a normal C# script: right-click over here, create a new C# script, and let's name it MoveToGoalAgent. Go ahead and open up the script; now, in here, we need to go up top and add a using for Unity.MLAgents. Then let's get rid of the default methods, we don't need them right now, and instead of inheriting from MonoBehaviour, we're going to inherit from the Agent class. Over here you can right-click on Agent and go to its definition, and we see the definition of that class; as you can see, it has a whole bunch of methods, all of them related to machine learning.

Now, the way the agent learns is through reinforcement learning, which is based on a relatively simple loop: first observation, where the agent gathers data from its environment; then it makes a decision based on the data it has; then it takes an action; and if it takes the right action, it gets a reward. This is a continuous cycle where the agent gradually learns, based on its observations, which actions lead to the highest rewards.

Okay, so let's see how to implement this cycle. Again, here is the Agent class, and we're going to need to override two functions: CollectObservations, in order to give the agent some observations, and OnActionReceived, which receives a buffer with all of our actions. So let's go back into our script, our MoveToGoalAgent, and first look at how the AI takes actions. We do a public override of OnActionReceived, which takes an ActionBuffers parameter; this buffer contains our actions, as either floats or ints. Now, one thing to keep in mind is that the machine learning algorithm only works with numbers: it has no understanding of what exactly a player object is, or what it means to move to the right; all it knows is numbers. It's easier to understand this if we see it in action, so for now let's go back into the editor.
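Before we head over, here's roughly what the script looks like at this stage, a minimal sketch assuming the release-10 namespaces; the empty method gets filled in as we go:

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;

// Inheriting from Agent (instead of MonoBehaviour) is what lets
// ML-Agents drive this object during training and inference.
public class MoveToGoalAgent : Agent {

    // Called every time the agent receives actions, either from the
    // training process or from a trained brain model.
    public override void OnActionReceived(ActionBuffers actions) {
        // We'll map these numbers to movement in a later step.
    }
}
```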
Over here, let's select our agent: I have my nice agent in here, and I'm just going to drag the MoveToGoalAgent script onto it to attach it. Yep, there's our MoveToGoalAgent script, and when we added it, it also added the Behavior Parameters script; these are the various parameters that our AI uses. First we have the behavior name, so let's rename it to MoveToGoal, giving this agent a proper name. Then let's look at the vector action settings and learn what all of these mean. First of all, you've got the Space Type, where you can choose between Discrete and Continuous. Essentially, Discrete means whole numbers, so you can have 0, 1, 2, 3 and so on, and Continuous means floats, going between -1 and +1 and all the numbers in between, so 0.2, 0.3, -0.4 and so on. We're going to do a quick test to see the difference in a bit; let's just learn about the other parameters first. If you select Continuous, you see the Space Size, which is how many actions you get in the vector. For example, if you put a 2 here, then in the code, if we inspect our ActionBuffers, we see that it contains two action segments: one for the continuous actions and one for the discrete actions. An action segment is essentially an array, so when you set the Space Size, you are defining the size of the array of the type that you selected. If you set a Space Size of 2, then the array has two positions with two values: one on index 0 and another on index 1. If you choose Discrete instead, the size works the same way, it's how many values you get in that array, but you also have a second parameter, which is the maximum value for each branch. Like I said, discrete means integers, or whole numbers, so if you put a 1 in here, you will only ever get an action value of 0; if you put a 2, you're going to get an action value of either 0 or 1; and if you put a 5, you can get a 0, 1, 2, 3 or 4. Each branch can have its own size: for example, if you were making a car AI, you could make the first branch refer to accelerating and braking, so you would give it two values, and the second branch, let's say, would represent turning, so you would give it three values: turn left, turn right, and don't turn.

Okay, now, before we go further and look into how to define training, let's just test these actions to get a better understanding of how all of this works. First, let's use Discrete with just one branch, to keep it simple, and give it a size of 5. Over here in the code, we simply do a Debug.Log: we go inside the ActionBuffers, access our DiscreteActions, and print out what's on index 0. Since we have just one branch, we have one value in this array, and that value is on index 0.

Now, before we can test this AI, we need to add one more thing. On our agent, let's add a component, go into ML Agents, and add a Decision Requester. Like I said previously, reinforcement learning works through a cycle of observation, decision, action and reward, so in order to take an action, we first need to request a decision; what this script does is simply request a decision every certain amount of time, after which actions are taken. There are other ways of requesting decisions, but for now let's just use this simple script.

Okay, so we are ready to begin training and run our test to see what the AI will output. For that, go back into the command prompt, and make sure you are inside the virtual environment. In order to train, it's very easy: we just run the command mlagents-learn. Hit Enter, and yep, we see a nice ASCII Unity logo and a message telling us that we can start training by pressing the Play button. So just do that, press Play, and yep, we now have training running: we can check in the command prompt, it's listening and running our training. Over in the console, we can now verify what the action vector actually contains. Let's hit Collapse, and there you can see: we set just one branch with a branch size of 5, and we do indeed get values of 0, 1, 2, 3 and 4, so five values, 0 to 4. This is what it means to have a discrete vector with a branch size of 5.

Now let's test the continuous type to see what actions we get: just swap it from Discrete to Continuous, with a Space Size of just 1.
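Side by side, the two logging tests I'm describing look something like this sketch:

```csharp
using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Actuators;

public class MoveToGoalAgent : Agent {

    public override void OnActionReceived(ActionBuffers actions) {
        // Discrete test: one branch with a branch size of 5
        // logs whole numbers from 0 up to 4.
        Debug.Log(actions.DiscreteActions[0]);

        // Continuous test: with a Space Size of 1 this logs floats
        // roughly between -1 and +1 (swap the comments to try it).
        // Debug.Log(actions.ContinuousActions[0]);
    }
}
```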
Here in the code it's pretty much the same; the only difference is we access the action segment for the ContinuousActions instead of the DiscreteActions, and the position is the same, index 0. Now we want to run this test, so back in the command line, first of all, we can see that the previous test completed correctly, so we have our training, we have our model and so on. But if we run the exact same command, mlagents-learn, just like this... there it is, we get an error, because we're trying to run training again using the same default id. We have two options here: we can call mlagents-learn with the --force flag, which will overwrite the previous data, or we can specify a different id. Let's try that: --run-id= and then some name, say test2. Now hit Enter, and yep, there we have it, we are listening on the port, so start training, hit Play, and let's see: yep, training is running, and now we can see what a continuous action looks like. Over here we are getting values pretty much between -1 and +1, and everything in between. All right, so now you should have a better understanding of exactly how the actions work: Discrete gives you integers, and with Continuous you get floats from -1 to +1. As you can see, these really are just numbers, so it's up to you to decide what they represent.
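As a quick recap, the two ways around the duplicate run id look roughly like this:

```
rem overwrite the previous run's data:
mlagents-learn --force

rem or keep the old run and start a new one under a different id:
mlagents-learn --run-id=test2
```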
Now let's look at another part of the reinforcement learning cycle: observations. Back in the code, the way we collect observations is by overriding another function: we do a public override of CollectObservations, which takes a VectorSensor, and as soon as you do, up top it adds the using for Unity.MLAgents.Sensors, which is where this sensor type lives. So we have our CollectObservations function, and observations are how the agent observes its environment; think of them kind of like the inputs for the AI. Obviously this differs based on what problem you're trying to solve, so essentially you need to think about what data the AI needs in order to solve the problem you're giving it. Our goal in this example is: we have a character, we have a goal, and we want to move the character towards the goal. So if you think about it, if you were controlling the player, what information would you need? Well, first of all, obviously, you need to know where you are, so we should pass in the player position; and you also need to know where the target is, so we need to pass in that position too.

In the script, the way we pass that to the AI is very simple: we just go into the sensor and call the function AddObservation. First, let's pass in transform.position, the player position; with this, the AI has the data for the player position. Then let's also pass in the target position, so up top add a serialized field for a reference, a Transform for the targetTransform; back in the editor we have our field, so just drag the goal's transform onto it. And in the code we do the same thing: sensor, AddObservation, and pass in targetTransform.position. All right, with these two positions, the AI should have enough data, taken from observations of its environment, to be able to complete its task.

So we're passing in these two observations; now, back in the editor, let's look at the vector observation parameters. We see a Space Size: this is how many inputs we're going to give the AI, and back in our code you might think that we're sending in two inputs; however, we're actually sending in two positions, and you have to remember that a position is really a Vector3, which is composed of three floats, for the x, y and z. So each of these two positions passes in three values, meaning that with two positions we're actually passing in six values, or six floats; so for the observation Space Size, we set it to 6. The other parameter is Stacked Vectors. This is for more advanced use cases where you need the AI to have some sort of memory: if you set it to 1, it just takes one observation, grabs all of its six values, and makes its decision; if you set it to 2, it takes one observation plus the previous one, and uses both to make its decision. For example, if you use a stacked vector of more than one and you use the position as the observation, the AI could then infer the direction an object is moving in; but like I said, that's for more advanced use cases, so here let's keep it simple and leave it at 1.

All right, with this we have our observations taken care of, and we already saw how the actions work, so now let's actually use those actions. Again, the goal in this test is to move the character towards the goal, so let's set the action space to Continuous, so we get floats, and set it to receive two: one for the x movement of our character and another for the z movement. Back in our code, let's grab our actions. We define the first position as the x: a float moveX, we go into our actions, in this case the ContinuousActions, and grab the one on index 0; this is our moveX. The other one, on index 1, is our moveZ. Again, like I said, the AI only works with numbers, so these are just floats, and it's up to you to define what they represent: here I am saying that the float on index 0 refers to the x movement, and the one on index 1 refers to the z movement. Then let's do a very basic transform move: just take transform.position and increase it by a new Vector3 with moveX, a 0 on the y, since we don't want to move on the y, and moveZ; then multiply by Time.deltaTime and by a certain moveSpeed, so up top a float for moveSpeed, and for now let's leave it at just 1. Okay, with this very basic logic, the AI should be able to move the character.
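With observations and movement in place, the agent looks something like this sketch; targetTransform and moveSpeed are the fields set up in the inspector:

```csharp
using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;

public class MoveToGoalAgent : Agent {

    [SerializeField] private Transform targetTransform;
    [SerializeField] private float moveSpeed = 1f;

    public override void CollectObservations(VectorSensor sensor) {
        // Two Vector3 positions = 6 floats total,
        // matching a vector observation Space Size of 6.
        sensor.AddObservation(transform.position);
        sensor.AddObservation(targetTransform.position);
    }

    public override void OnActionReceived(ActionBuffers actions) {
        // Continuous Space Size of 2: index 0 drives x, index 1 drives z.
        float moveX = actions.ContinuousActions[0];
        float moveZ = actions.ContinuousActions[1];
        transform.position += new Vector3(moveX, 0f, moveZ) * Time.deltaTime * moveSpeed;
    }
}
```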
Now, once again, back to the reinforcement learning cycle: we've taken care of observation, decision and action, so all that's left is to add a reward. Our goal here is to have the character hit the target; the character has a Rigidbody as well as a box collider, and the target itself also has a collider set to trigger, so we can easily test for this collision. On the agent, we just add a very basic private void OnTriggerEnter, and when we enter the trigger, we have reached our goal. There are two ways we can give a reward: we can call the function SetReward, which sets the reward to a specific amount, and there's also AddReward, which increments the current reward. For example, when making an AI kart driver, you would increment on every checkpoint it hits; but here we just have a single goal, so SetReward is perfect. Just call SetReward and set it to, let's say, 1f. The specific value you choose here doesn't really matter: it can be 1, or 10, or 0.3, or pretty much anything; it only matters relative to your other rewards, like, for example, when we hit a wall we should give a large penalty. Okay, so we are setting the reward when we hit the collider.

Another thing about how ML-Agents works is the concept of episodes. One episode is essentially one run, and the episode should end when the character either achieves the final goal or loses. So in here, after setting the reward, let's end the episode: we just call the function EndEpisode. Then, when the episode ends, the game doesn't actually quit, but we need some way of resetting the state so we can train again. For that we override another function: a public override void of OnEpisodeBegin. This one is called as soon as the episode begins, and here we can reset everything back to normal. In this very simple example, we just need to reset the character position back to its starting state, which, for the simple demo I have here, is just 0, 0, 0. Later on we're going to add some randomness, but for now let's keep it simple and reset it back to the exact same point: just take transform.position and set it to Vector3.zero. This correctly resets the state so that it can train again.

Okay, we have almost everything ready to train; the last thing we need is a penalty. To make our training more effective, let's add some colliders on the edges, so we can give the agent a negative reward there and end the episode. Just make a new 3D cube, name it Wall, and put it on the edges. Okay, so I added some walls, just some basic colliders, and let's also make them triggers. Now we need to identify whether the player collided with the goal or with a wall, so let's make some basic tag components: one for the goal and another for the wall, just empty components to serve as tags, so Wall and Goal. Now, in OnTriggerEnter, we can go into the other collider and use TryGetComponent: first try to get the Goal component, and if it has a Goal, we give a positive reward and end the episode; then we check if it has a Wall instead, and if so, we give a negative reward and also end the episode. All right, that's it, everything should be almost done.

Now, before we actually start training, the first thing we should do is validate, to make sure that everything is indeed working. For testing, there's another thing we can do, which is drive the actions ourselves. Let's override another function: this one is called Heuristic, and it takes an actionsOut parameter of type ActionBuffers. In here we can essentially modify the actions that will then be received by the OnActionReceived function. In this case we are using continuous actions, so we go into actionsOut and access the ContinuousActions; this is of type ActionSegment<float>, so we grab those and can easily modify them. Let's use the input to move the character with the arrow keys: on index 0 we've got the moveX, so let's use Input.GetAxisRaw for the Horizontal axis, and on index 1, the Vertical axis. Okay, that's it; this is just for testing.
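Putting the reward, episode reset and heuristic together, it looks something like this sketch. Goal and Wall are the empty "tag" MonoBehaviours (each in its own file in Unity), and the -1f penalty value is my assumption, since the transcript only says "a negative reward":

```csharp
using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Actuators;

// Empty "tag" components sitting on the goal and wall objects.
public class Goal : MonoBehaviour { }
public class Wall : MonoBehaviour { }

public class MoveToGoalAgent : Agent {

    public override void OnEpisodeBegin() {
        // Reset the character to its starting point so training can continue.
        transform.position = Vector3.zero;
    }

    private void OnTriggerEnter(Collider other) {
        if (other.TryGetComponent<Goal>(out Goal goal)) {
            SetReward(1f);   // reached the goal
            EndEpisode();
        }
        if (other.TryGetComponent<Wall>(out Wall wall)) {
            SetReward(-1f);  // hit a wall (assumed penalty value)
            EndEpisode();
        }
    }

    // Drives the actions manually, so we can validate with the arrow keys.
    public override void Heuristic(in ActionBuffers actionsOut) {
        ActionSegment<float> continuousActions = actionsOut.ContinuousActions;
        continuousActions[0] = Input.GetAxisRaw("Horizontal");
        continuousActions[1] = Input.GetAxisRaw("Vertical");
    }
}
```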
Now, back in the editor, select the agent, and over here we have a field for the Behavior Type: we have Default, Heuristic Only and Inference Only. In this case, we can manually set it to Heuristic Only, which forces it to use heuristics; or you can leave it as Default, and as long as you don't have Python with ML-Agents running and you have no model selected, it will automatically use heuristics. So if we do it like this and we run, here is the game running, and if I use the arrow keys, yep, now I can move the character. Let's just increase the speed by a tiny bit; okay, I upped the speed, now let's make sure everything is working. First of all, movement is working, so we are correctly passing in the actions and mapping those actions to movement. Next, let's try hitting a wall: go up there, hit a wall, and yep, there you go, it ends the episode; as you can see, it reset back to zero. And now, if we hit the goal, yep, that also works. All right, so we can verify that everything is working perfectly, and we have everything ready for training.

In order to train, it's the exact same thing that we saw previously: just make sure the Behavior Type is set back to Default, then open up the command prompt and run the same command as before; let's give it a different id and say this is test3. Hit Enter, and yep, now it's listening, so start training: press Play, and there it is, there we have our agent, and it's correctly working. You can see it is indeed going through the training process, trying all kinds of values until it finds something that might give it a positive reward. Now all we have to do is wait; however, there's one thing we can do to massively speed up training, and let's also solve one potential issue.

The issue is that if the AI never touches the goal, it might simply learn to avoid the walls and just stay in place forever. We can make sure that doesn't happen by setting a max step: here on the agent there's a field for Max Step. A step is kind of like an update on the training; by default, it runs 50 times per second, exactly the same as the physics update. So let's give it a Max Step of something like 1000, just to make sure the episode ends and doesn't run forever. Okay, that's one problem solved; and here, let's just visually hide the walls, so it looks a bit better.

Now, in order to speed up training, it's very simple: we can just use more than one agent. Let's take our entire training environment and put it in a parent object, just a container, name it Environment, and drag the entire environment inside it; then we drag that onto our project files to turn it into a prefab. So we have our prefab, and now we simply copy-paste it several times: just duplicate, put one there, another one there, and again, you can put in as many as you want in order to train quite a bit faster than with just one at once. All right, there it is: here we have 20 environments, all of them ready for training.

There's one very important thing when using this method: we are duplicating and moving our environments, so if you take this approach to speed up training, you need to make sure that all of your logic works based on local position, not global position. For example, this character here is indeed on a local position of 0, but it's on a global position of 13; so if you reset it back to a global position of 0, it's going to go over there, and not where it should actually go. So here, in our logic, we're using position.
Let's just replace all instances of position with localPosition. Okay, everything should now be working; and here, just to make this easier to visualize, I'm going to add something. On the script I'm going to add two more fields, just some references to a win material and a lose material, plus a reference to the floor's MeshRenderer, purely so we can visualize the training; obviously this is not necessary. Just go down here: when we win, set the floor MeshRenderer's material to the win material, and when we lose, set it to the lose material. Back in the editor, open up the prefab, select the agent, and here we have our fields: pass in the reference for the platform's mesh renderer and the win and lose materials. Again, this is just visual, just to make it easier to see the training happening in the video; it's obviously not necessary to actually train the agent.

Okay, now, before we start mass training, let's make sure everything is working, so once again validate with Heuristic Only, and let's see: here are all our agents, and yep, they all move, and it works; good. If we go towards a wall, yep, the floor turns red, so we can easily visualize that the run failed, and on hitting the goal, it turns green. Okay, the logic is working and we can visualize the training.
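With the local-position fix and the visualization fields, the relevant parts of the agent look something like this sketch; winMaterial, loseMaterial and floorMeshRenderer are the inspector references just described, and the other methods stay as before:

```csharp
using UnityEngine;
using Unity.MLAgents;

public class MoveToGoalAgent : Agent {

    [SerializeField] private Material winMaterial;
    [SerializeField] private Material loseMaterial;
    [SerializeField] private MeshRenderer floorMeshRenderer;

    public override void OnEpisodeBegin() {
        // localPosition, not position: each duplicated environment
        // must reset relative to its own parent container.
        transform.localPosition = Vector3.zero;
    }

    private void OnTriggerEnter(Collider other) {
        if (other.TryGetComponent<Goal>(out Goal goal)) {
            SetReward(1f);
            floorMeshRenderer.material = winMaterial;   // green floor on a win
            EndEpisode();
        }
        if (other.TryGetComponent<Wall>(out Wall wall)) {
            SetReward(-1f);
            floorMeshRenderer.material = loseMaterial;  // red floor on a loss
            EndEpisode();
        }
    }
}
```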
Now we're ready to do some mass training: go into your agent, make sure the Behavior Type is on Default, and in the command prompt run mlagents-learn; for the run id, let's give it a proper name this time, so let's name it MoveToGoal. Okay, run it, it's ready, so hit the Play button, and here we can see all the agents going: some reds, some greens, and yep, it's actually learning quite quickly. You see some reds happening, and soon it's really just mostly green; over time, the agent is learning and constantly getting better, and with this very simple example, after just a tiny bit, yep, everything is working and we can see pretty much all of them in green. So here we have an AI that correctly learned how to move towards the target goal. Okay, that's awesome, so let's just stop training: stop the editor, and over in the command prompt you can see it saved the model; the brain is this .onnx file, and it copied the results to results\MoveToGoal\MoveToGoal.onnx. So open up the file explorer, go into your project folder, go inside results, in this case MoveToGoal, and here we have MoveToGoal.onnx: this is our brain, our neural network model. Copy it and paste it into our normal assets, and up here we can see MoveToGoal; we have our nice brain. Now, in order to use this brain, select the environment; for now, disable all the others, just to see this one in action. Select the agent, click, drag and assign our neural network model; then, for the Behavior Type, we can leave it as Default, or directly set it to Inference Only. Inference means it uses the brain model rather than training. Okay, let's test like this, and we should see our character using this brain to achieve the goal... and yep, there it is, our character is correctly using our brain to achieve our goal. All right, congratulations, you've just trained your very first machine learning AI! Awesome.

Now, the real challenge in machine learning is how to do training effectively, and the design of your training scenario matters a lot. For example, over here we tested the simplest possible setup: we're just getting a character to move from here all the way over to there, and that's exactly what the AI learned. However, if I now take the goal and just move it down here... yep, there you go, all of a sudden the character does not know what to do. With the way we set up our training, our AI only learned that moving to the right gets it a reward; by moving the transform, we can see it didn't actually learn how to go to the actual goal position. It's a simple example of the AI not knowing what to do, because it wasn't trained for a moving goal. This is why, when training, you usually want to add some randomness, to prevent the AI from being trained on just one very specific scenario. So there's a lot you can do to define a proper training scenario, and there are also tons of parameters you can play around with.

The parameters for the algorithm are stored in a configuration file. If you go into the GitHub page, onto the docs for creating a new learning environment, and scroll all the way down, you can see the format for the training YAML file. Here, I'm just going to go ahead and copy all of it; then, in the project folder, let's make a new folder to keep things nice and organized, name it config, and in there create a brand-new text file named moveToGoal.yaml. Open it with Notepad or any text editor and paste in those parameters. I won't go into too much detail on every single one of these parameters; if you want, you can go into the GitHub docs to see what each one does. There's pretty much just one thing we need to change, which is this name here: this is the name of the behavior we want to train, and in our agent we gave it the behavior name MoveToGoal, so that's what we need to put here; instead of RollerBall, use that name. Okay, save the file, so here it is, in the config folder, moveToGoal.yaml.

Now, once you have this file, you can run training using these parameters: open up the command prompt, run mlagents-learn, pass in the config, which is config/moveToGoal.yaml, and give it a run id, let's name it TestParameters. It's the same as previously: hit Enter, and now it's ready to run, so in the engine set the Behavior Type back to Default so that it runs training, and run it; yep, now the agent is training, using those custom parameters. Again, like I said, go check out that docs page to see what they all do.
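The video doesn't read the file out, but based on the RollerBall example in the docs it looks something like this sketch after renaming the behavior; treat the exact fields and values as version-dependent and copy yours from the docs for your release:

```yaml
behaviors:
  MoveToGoal:
    trainer_type: ppo
    hyperparameters:
      batch_size: 10
      buffer_size: 100
      learning_rate: 3.0e-4
    network_settings:
      hidden_units: 128
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 500000
    time_horizon: 64
    summary_freq: 10000
```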
Now, with this, one more thing we need to learn is how to improve upon a model. Previously we made this model, which works pretty well: the character goes towards the target; but, as we saw, if we suddenly move the goal, the character completely fails. So let's take that model and improve upon it. First of all, let's improve the actual training scenario by adding some randomness to the start positions of both the goal and the character. In OnEpisodeBegin, take transform.localPosition and add some randomness: a new Vector3 using Random.Range, and let's pick the random values. The agent starts at a local position of 0, so let's go from around there: for the x, from -3f up to maybe +1f; for the y, leave it at 0; and for the z, from -2 all the way to +2. So now the character spawns at a random position. Then let's also move the target transform: take the goal and pick its randomness. On the x, start from its current spot, 2.4, and randomize up from there, so between 2.4f and 5f; and for the z, the same as before, from -2 to +2. With this, every time we start a new episode, we select different random positions, which lets the model actually learn how to go towards the target, rather than towards one specific position.

Once again, before we do anything, let's validate to make sure everything is working: choose Heuristic Only, and yep, it spawned at a random position, and if I end the episode, it's at a different position, different, different, and so on; okay, so both the character and the goal spawn at random positions. Now let's run training and improve upon the previous model. For that, run the same thing, mlagents-learn, pass in the config, and then the way we learn from a previous brain is with --initialize-from, passing in the run id we previously used, which was MoveToGoal; so it's going to load up that brain. Then give it another id, so --run-id=MoveToGoal2. Press Enter, now it's ready to learn; in here, enable all the other environments, make sure the agent is set to Default so it learns, and hit Play. Yep, the training is at work, you can see they are indeed spawning at random positions, we've got some reds and some greens, and yep, it seems to be working.
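For reference, the randomized reset described above looks something like this sketch, using the ranges picked in the video; keeping y at 0 assumes both objects sit on the floor, and the other methods stay unchanged:

```csharp
using UnityEngine;
using Unity.MLAgents;

public class MoveToGoalAgent : Agent {

    [SerializeField] private Transform targetTransform;

    public override void OnEpisodeBegin() {
        // Random spawn points each episode, so the brain learns to chase
        // the goal rather than memorize one fixed path.
        transform.localPosition = new Vector3(Random.Range(-3f, +1f), 0f, Random.Range(-2f, +2f));
        targetTransform.localPosition = new Vector3(Random.Range(+2.4f, +5f), 0f, Random.Range(-2f, +2f));
    }
}
```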
All right, now there's one last thing related to machine learning, which is a nice visualization, so let's look at that while our training is running. Open up a brand-new command prompt, go into the project folder, and once again go inside the virtual environment, so venv\Scripts\activate. Now let's run the command tensorboard; TensorBoard is the name of the utility that visualizes our results, and we pass it the folder with our results, which by default is named results, so --logdir results. Hit Enter, and yep, we see a message saying TensorBoard is running on this URL, localhost on port 6006. So just open up a browser and go to localhost:6006, and yep, here it is, and we can now visualize everything.

Most importantly, we see our cumulative reward: we gave it a reward of 1 when it hits the target, so we should see this constantly increasing as the brain gets better. The episode length is also going down, meaning the AI is learning how to get to the goal faster, and in the command prompt you can see every time it updates; right now it's updating the graph every 10,000 steps. Just click Refresh, and over there we see the run we're currently training, and as you can see, it started off down there and climbed all the way up. If we look at our Unity build, we can see it is indeed working; we've got pretty much a sea of green, so even with the random positions, it seems that our AI has already learned how to go towards the target. Down here you can also visualize the policy: these are its inner workings, so you've got tons of things like the beta, the entropy, the reward estimate and so on; tons of graphs for you to analyze in order to improve your training and your AI.

Back in here, we can see that the training went very well, so let's stop training, and once again it saved the model to that location. Go into results, this time into MoveToGoal2, copy the brain, paste it into our assets, and do the same thing as before to use this brain: hide the other environments and leave just this one on, select it, assign the brain, and set it to Inference Only. And if we run... there it is, we can verify that our training went very well indeed: even with random positions, the AI is smart enough to know that the goal is not just to move to the right, but rather to move towards the goal. So here we have fully trained our AI from scratch, without giving it any specific commands. Remember, all we did was give it the current position and the target position: we did not tell it how to move, we did not tell it what "move" means, we did not tell it any of that. The AI learned to take those values and figure out what it needed to do in order to gain a reward. It almost feels like magic; that is the awesome power of machine learning.

All right, so now you know everything you need in order to get started working with machine learning and ML-Agents in Unity. Machine learning is some really exciting tech with tons of potential applications, so definitely stay tuned for some more awesome videos. You can explore the official examples, which have tons of awesome use cases, and if you have a specific scenario you'd like to see, let me know in the comments. Also, I'm currently working on a match-3 use case, so definitely stay tuned for that, and like I said, there's a playlist in the description that I will keep updated as I explore ML-Agents more and more, so if you're watching this in the future, check that link to see all the videos. All right, this video was a ton of work to make, but I really hope you learned a lot; if you did, please hit the like button and consider subscribing. This video is made possible thanks to these awesome supporters: go to patreon.com/unitycodemonkey to get some perks and help keep the videos free for everyone. As always, post any questions you have in the comments, and I'll see you next time!
Info
Channel: Code Monkey
Views: 105,518
Rating: 4.9664159 out of 5
Keywords: unity machine learning, ml agents unity, game machine learning, unity ai, machine learning, ml agents tutorial, ml agents, code monkey, ml-agents unity, ml-agents, machine learning ppo, python, tensorflow, unity 3d, unity, neural network, deep learning, artificial intelligence, game development, game dev, game development unity, game dev machine learning, game dev ai, how to unity machine learning, how to unity mlagents
Id: zPFU30tbyKs
Length: 44min 50sec (2690 seconds)
Published: Sun Nov 29 2020