How to train simple AIs

Video Statistics and Information

Captions
Machine learning is a hot topic, and for good reason: it has unlocked the ability for our computers to perform a whole host of tasks that we simply didn't know how to program manually. This fascinating discipline is also a lot of fun to try out on your own, but it can be quite daunting due to its many mathematical and programming prerequisites, especially if you want to write everything from scratch yourself. A branch of machine learning called reinforcement learning is particularly appealing: it promises to teach a program to perform complex tasks by letting it carry out its own experiments in a given environment. However, this set of techniques requires an excellent understanding of the machine learning principles on which it is based. In this video I'm going to present a very simple method which you can implement yourself from scratch and which requires much less theoretical knowledge. Naturally, this method is also far less powerful than the current state of the art, but it does at least allow you to build up an intuition about the fundamental principles while playing around with some amusing results. To give a concrete example of how to use this method, I'm going to apply it to a classic case: stabilizing an inverted pendulum on a cart.

This technique is heavily inspired by the famous NEAT algorithm (NeuroEvolution of Augmenting Topologies), but in a simplified version. The principle of this algorithm is to evolve neural networks both in terms of their architecture and their weights. The advantage of building the neural network as the training progresses is that you start with very few parameters to optimize at a time, this aspect being a notorious limitation of evolutionary algorithms. The type of network used is a little different from traditional models based on interconnected layers, since, unlike the latter, its architecture is designed to evolve as training progresses. Here we use DAGs (directed acyclic graphs) to represent neural networks; DAGs are a kind of graph in which there are no cycles.
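A DAG-based genome like the one described can be represented with a handful of small records. This is a minimal sketch, not the video's actual implementation; the names `Node`, `Connection`, and `Network` are illustrative assumptions.

```python
# Minimal sketch of a DAG-based network genome: nodes carry a kind and a bias,
# connections carry a weight. All class and field names are assumptions.
from dataclasses import dataclass, field

@dataclass
class Node:
    id: int
    kind: str        # "input", "hidden", or "output"
    bias: float = 0.0

@dataclass
class Connection:
    src: int         # id of the source node
    dst: int         # id of the destination node
    weight: float

@dataclass
class Network:
    nodes: dict = field(default_factory=dict)        # id -> Node
    connections: list = field(default_factory=list)  # list of Connection

# A tiny example: one input feeding one output through a hidden node.
net = Network()
for i, kind in enumerate(["input", "hidden", "output"]):
    net.nodes[i] = Node(id=i, kind=kind)
net.connections.append(Connection(src=0, dst=1, weight=0.5))
net.connections.append(Connection(src=1, dst=2, weight=-1.2))
```

Because the graph is acyclic by construction, any mutation that adds a connection only needs to check that it doesn't create a cycle.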
Networks are made up of three types of nodes: inputs, whose value comes from a measurement of the environment; hidden nodes, whose value is not accessed directly but is used for the internal workings of the network; and outputs, whose value will be used to interact with the environment.

The first thing to do in order to execute a network of this type is to determine the order in which to perform the operations. To do this, we can use topological sorting, which orders nodes according to their ancestors so that a descendant is always placed after its parents. To sort the nodes, we start by creating a list containing all nodes without incoming connections. Then, as long as this list is not empty, we extract the nodes from it one by one; for each one, we delete its outgoing connections and add it to the list of sorted nodes, and if one of its descendants has no more incoming connections, we add that descendant to the list of nodes to be processed. At the end of this process, we obtain the list of sorted nodes.

Next, we set the value of the inputs with the data coming from the environment and, following the previously defined order, apply the operations one after the other until we reach the outputs. For each neuron there are three operations to perform: add the bias to the current sum, apply the activation function to this result, and, for each outgoing connection, multiply this value by the connection weight and add the result to the sum of the destination neuron. In this example, the activation function is the identity function for the inputs and the hyperbolic tangent for the other neurons. The peculiar nature of this type of network makes it slow to run, unlike layered networks, which exploit matrix multiplications that are massively parallelizable.

Now that we've seen how the neural networks used in this approach work, let's see how they can be trained to perform the required task. Training is carried out in successive iterations, each of which takes place in three stages: evaluation,
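The two execution steps just described — Kahn's topological sort, then a forward pass that adds the bias, applies the activation, and propagates weighted values — can be sketched as follows. The function names and the tuple encoding of connections are assumptions, not the video's code.

```python
# Sketch of DAG network execution: topological sort, then forward propagation.
# Connections are (src, dst, weight) tuples; names are illustrative assumptions.
import math
from collections import defaultdict

def topological_order(nodes, connections):
    """Kahn's algorithm: repeatedly pop nodes with no remaining incoming edges."""
    indegree = {n: 0 for n in nodes}
    out_edges = defaultdict(list)
    for src, dst, _w in connections:
        indegree[dst] += 1
        out_edges[src].append(dst)
    ready = [n for n in nodes if indegree[n] == 0]  # nodes without incoming edges
    order = []
    while ready:
        n = ready.pop()
        order.append(n)
        for d in out_edges[n]:      # "delete" this node's outgoing connections
            indegree[d] -= 1
            if indegree[d] == 0:    # descendant has no more incoming connections
                ready.append(d)
    return order

def forward(nodes, connections, bias, kind, inputs):
    """kind[n] is 'input'/'hidden'/'output'; inputs maps input node ids to values."""
    sums = defaultdict(float)
    sums.update(inputs)
    out_edges = defaultdict(list)
    for src, dst, w in connections:
        out_edges[src].append((dst, w))
    values = {}
    for n in topological_order(nodes, connections):
        s = sums[n] + bias[n]                         # 1. add the bias
        v = s if kind[n] == "input" else math.tanh(s) # 2. identity for inputs, tanh elsewhere
        values[n] = v
        for dst, w in out_edges[n]:                   # 3. propagate weighted value
            sums[dst] += v * w
    return values
```

For the chain input → hidden → output with weights 0.5 and -1.0 and zero biases, feeding in 1.0 yields tanh(-tanh(0.5)) at the output.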
selection, and mutation. The aim of each iteration is to generate a population n+1 containing agents that perform better than generation n.

The first step is to assess the current population on the task at hand. To do this, we use a fitness function that associates a score with each agent, representing its ability to perform the task. Finding a good fitness function can be tricky, and it can have drastic consequences on the final result, as we'll see later.

The second step is to select the agents who will participate in the next iteration. To do this, they are first sorted by score, and the top 30% are added directly to the next population, unchanged. To generate the remaining 70%, we draw individuals from the population as many times as necessary, each agent having a chance of being selected that is proportional to its score.

The final step is to evolve the networks so that they produce slightly different, and ideally better, results. Each selected network has a chance to undergo a mutation, which can be a new connection, a new neuron created by splitting an existing connection in two, or a modification of an existing weight. It is also possible for the network to remain unchanged.

We've seen how the algorithm works as a whole; now let's apply it to a concrete case: balancing an inverted pendulum. For this problem, the environment is very simple, consisting of a fixed-length rail on which a cart can move, with a pendulum connected to it. Here are the network inputs and outputs we'll be using; as you can see, the only thing the network controls is the speed of the cart along the horizontal axis. The activation function used is ReLU for the hidden neurons and hyperbolic tangent for the output.

For this first test, we'll use a very simple fitness function, giving the network one point per second when it manages to keep the end of the pendulum above a threshold. This threshold is equal to the length of the pendulum minus a small margin. To train the network, I'll use 1,000 agents who will have
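One iteration of the evaluation/selection/mutation loop described above might look like the sketch below: the top 30% are kept unchanged, the remaining 70% are drawn with probability proportional to score and then mutated. The mutation operators are stubbed out, and all names are assumptions rather than the video's implementation.

```python
# Sketch of one evolutionary iteration: 30% elitism, fitness-proportional
# selection for the rest, then one of the three mutations from the video.
import copy
import random

def next_generation(population, fitness, rng=random):
    """population: list of agents; fitness: agent -> non-negative score."""
    scored = sorted(population, key=fitness, reverse=True)
    n_elite = int(0.3 * len(scored))
    new_pop = scored[:n_elite]              # top 30% carried over unchanged
    scores = [fitness(a) for a in scored]
    while len(new_pop) < len(population):   # fill the remaining 70%
        parent = rng.choices(scored, weights=scores, k=1)[0]  # proportional draw
        child = copy.deepcopy(parent)
        mutate(child, rng)
        new_pop.append(child)
    return new_pop

def mutate(agent, rng):
    """Apply one of the mutations from the video, or leave the agent unchanged."""
    roll = rng.random()
    if roll < 0.25:
        agent.add_random_connection()       # new connection
    elif roll < 0.5:
        agent.split_random_connection()     # new neuron splits a connection in two
    elif roll < 0.75:
        agent.perturb_random_weight()       # modify an existing weight
    # else: no mutation

# Tiny demo with stub agents that only carry a score (mutations are no-ops).
class _Stub:
    def __init__(self, score): self.score = score
    def add_random_connection(self): pass
    def split_random_connection(self): pass
    def perturb_random_weight(self): pass

population = [_Stub(s) for s in range(1, 11)]
offspring = next_generation(population, lambda a: a.score)
```

The mutation probabilities here are made up for the example; the video doesn't state the actual rates.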
100 seconds at each iteration to achieve the best possible score, in this case 100 points. To speed up the process, the execution of the iterations is multi-threaded, allowing the agents to experiment in parallel. I was actually surprised to find that it was quite fun to try and balance it myself, albeit rather complicated; I must admit I practiced a little so I wouldn't look too ridiculous.

Let's take a look at training with this fitness function. We can already see where our networks stand after a few seconds of training, at the 10th iteration, with all the active agents trying their best to maximize their score. Let's get back to training and follow the agents' progress, which so far has been quite rapid. With a score of over 99, let's see how the network performs. It's not too bad, but there's a strong oscillation; maybe a better solution will emerge after a few more iterations. Unfortunately, it's even worse. As I said earlier, the fitness function plays a fundamental role in the quality of the final result, so let's try to modify it a little to force the networks to find more stable solutions.

For this second trial, the fitness function will be quite similar to the first, with the score still increasing while the pendulum is kept above the threshold, but this time the score will be divided by the angular velocity of the pendulum. We can already see that progress is a little slower, even if it's still very much there. Let's move forward a few iterations. Now that we've reached 99 points, let's take a look at the result. It's already much better: the strong oscillations have disappeared and the stable state is reached very quickly. However, I'd be happier if the pendulum stabilized in the center of the rail, so let's adapt the fitness function again to push the solutions towards the middle. In addition to the previous modification, the score will be reduced in proportion to the distance to the center. Progress is very rapid at first, before slowing down and gently
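The three fitness variants just described can be sketched as per-timestep rewards. The state fields, the margin value, and in particular the `1 + |ω|` denominator (which avoids division by zero, since the video only says the score is "divided by the angular velocity") are assumptions made for this illustration.

```python
# Sketch of the three fitness variants, scored per simulation timestep dt.
# State fields, margin, and the 1+|omega| smoothing are illustrative assumptions.

def fitness_v1(state, pendulum_length, dt, margin=0.05):
    """One point per second while the pendulum tip stays above the threshold."""
    threshold = pendulum_length - margin
    return dt if state["tip_height"] > threshold else 0.0

def fitness_v2(state, pendulum_length, dt, margin=0.05):
    """Same reward, damped by the pendulum's angular velocity to punish oscillation."""
    base = fitness_v1(state, pendulum_length, dt, margin)
    return base / (1.0 + abs(state["angular_velocity"]))

def fitness_v3(state, pendulum_length, rail_half_length, dt, margin=0.05):
    """Additionally reduce the score in proportion to the distance from the center."""
    base = fitness_v2(state, pendulum_length, dt, margin)
    return base * (1.0 - abs(state["cart_x"]) / rail_half_length)
```

Summing any of these over a 100-second episode with the pendulum perfectly balanced at the center recovers the maximum of 100 points.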
increasing. Let's fast-forward and see where it takes us. It's exactly what I wanted to see: the balance is reached very quickly and then the cart remains almost stationary. This solution was found quite quickly, in less than a minute of real time. Of course, in simulated time it's a different story, since 1,000 agents spent 150 iterations of 100 seconds each trying out random things. All in all, almost 6 months of simulated training were required, which isn't that impressive, particularly when compared to the few minutes it takes a human to learn to perform the same task, even if human execution remains less optimal and precise.

Out of curiosity, let's look at another possible result by changing the training seed: another rather elegant solution, although the final network is a little more complex than last time. After playing with all these solutions for a while, I wondered how sensitive they were to external disturbances. To test this, I added random impulses that are applied to the end of the pendulum in an irregular pattern. Let's load the same neural network and see what happens with the addition of random perturbations. I was very pleasantly surprised to find that the system remained stable even with the disturbances; I was expecting more sensitivity from the solution. I then wanted to interact directly with the pendulum myself, to better test its robustness.

Overall, I'm very pleased with these results, although the method used is very rudimentary. However, the simple pendulum seems to be a bit too easy a problem, even for this algorithm. That's why, in the next video, we'll see how this same algorithm copes with a double pendulum, making the problem far more complex due to its extremely unstable and chaotic nature.
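The "almost 6 months of simulated training" figure checks out with a quick back-of-the-envelope calculation (assuming a 30-day month):

```python
# 1,000 agents x 150 iterations x 100 simulated seconds each.
total_seconds = 1_000 * 150 * 100
months = total_seconds / (60 * 60 * 24 * 30)
print(round(months, 1))  # ≈ 5.8 months of simulated time
```

Since the agents run in parallel across threads, this only took about a minute of wall-clock time.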
Info
Channel: Pezzza's Work
Views: 46,126
Id: EvV5Qtp_fYg
Length: 12min 59sec (779 seconds)
Published: Fri May 03 2024