How to train simple AIs to balance a double pendulum

Captions
In my previous video, we saw how a simple evolution algorithm could be used to solve simple tasks such as balancing a simple pendulum. It's best to watch it first if you want to know the details of the algorithm used to perform the training. However, this specific task was very simple, requiring only a few seconds for the algorithm, or even a human, to find a solution. The various solutions found, while very effective in solving the problem, were also very simple, with neural networks of just a few nodes. In this second part, we'll see if a basic algorithm such as the one described in the first video can succeed in finding a solution for a far more complex and precise exercise: balancing a double pendulum.
First of all, is it really that much more complicated to have a double pendulum instead of a single one? Wouldn't it simply be twice as difficult? To illustrate the difference in difficulty, let's start by comparing the performance of a human in both cases. First, the simple pendulum. As we can see, it's pretty easy to get the hang of it, and it only takes a few minutes of practice. Intuitively, it's fairly easy to understand what's going on: if the pendulum falls to the left, you move the cart to the left to compensate, and the same goes for the right. Now let's see what happens with the double pendulum.
[Music]
2,000 years later, we can see that there are several problems. Firstly, it's already very difficult to stabilize the pendulum even in the neutral downward position. Secondly, when by chance the pendulum is relatively upright, it's very difficult to know what to do to hold it in place, and every failed attempt reinforces its erratic movements.
Following this resounding failure, let's take a closer look at what makes the double pendulum so complicated to control. First of all, it's a highly unstable system. This means that when the pendulum is in a state of equilibrium, the slightest disturbance is enough to take it out of this state. This property makes the control window much thinner than in the case of the
simple pendulum. Secondly, in addition to being highly unstable, the double pendulum is chaotic. A system is said to be chaotic when very slight variations in initial conditions lead to very different results. In this case, three double pendulums start with angles identical to the millionth of a degree. As we can see, this tiny difference leads to a wide variety of trajectories. It's a notion that's often quite misunderstood, and it's not at all incompatible with determinism. One direct consequence is that it is difficult to predict the evolution of such a system. A typical example of a chaotic system is the Earth's atmosphere, making weather forecasting quite uncertain.
To carry out the physics simulations for this project, I used the XPBD method, which is very easy to implement thanks to the excellent research article made public by its authors. I will put a link to it in the description. As the trajectories of the pendulums are quite hypnotic, let's see how the same thing looks with 4,000 double pendulums. As in this example, only the trajectories will be drawn.
[Music]
I think I could spend hours watching all those beautiful moving geometries, but let's get back to our subject. Now that we have a better idea of how difficult the task is, let's see if the method used for the simple pendulum can work here too.
Here are the network inputs and outputs we'll be using. As with the simple pendulum, all the network will control is the speed of the cart. I'm not showing the hidden neurons and connections here because they'll be created during the training phase, so I don't know in advance what the network will look like. The scoring function used is very simple: each agent gains one point per second when the free end of the pendulum is above a threshold. This threshold is equal to the length of the pendulum minus 5%. Let's start the training using 1,000 agents, each of which will have 100 seconds to collect as many points as possible. Given the score function, the
maximum possible score is 100 points.
[Music]
One eternity later, unfortunately, even after letting the algorithm run long enough to break my interface, the score is nowhere near the theoretical max of 100 points. I'm not going to show the best solution here because it only looks like a crazy blender, and I wouldn't want to trigger an epileptic seizure.
A potential problem with this first approach is that the score function doesn't differentiate between a solution that does a lot of rotations and one that does fewer but stays above the threshold for longer. To try and correct this problem, I then tried another score function. This time, at each point in the simulation, the score is incremented by the time currently spent above the threshold to the power of 8. It's a bit of an exaggeration, but I wanted to see what effect it would have, so I went a bit wild. The delta t is simply used to normalize the score to the simulation frequency. In addition, the time is reset to zero as soon as the pendulum falls below the threshold. All this should give a big advantage to solutions that keep the pendulum above the threshold for longer than others.
Let's see how this new evaluation function works with a new training session. It's important to note that the theoretical maximum score now has nothing to do with the old 100. The score looks pretty high, but I have no idea what it represents as a solution, so let's have a look. It's a little better, but it's still not enough. There's no balance, and the pendulum goes back into blender mode very quickly. Maybe I'm just unlucky. Let's try again.
This time the training lasted much longer, so let's see the results, even if the state of the neural network leaves me a little skeptical. Once again, it's really disappointing, even if it's still better than my first attempt. I've tried lots of different configurations and score functions without success. At this point, I was really considering giving up, especially after coming across an article that concluded that they hadn't
managed to get a satisfactory solution using a serious reinforcement learning method.
After a long pause on the subject, an idea came to me: perhaps the task is simply too complicated for my rudimentary algorithm. Why not try starting with a simpler version and gradually increasing the difficulty? My idea was that it should be fairly easy to transition from one solution to another gradually, since similar problems should have similar solutions.
For this new training, I'm going to start by using a very low force of gravity and a very high air friction coefficient. Then, each time the best current solution reaches a target score, gravity will increase by 1% and friction will decrease slightly. The evaluation function will be the same as at the very beginning, as it allows me to set a threshold that I can easily link to a degree of success. I've also reduced the time of each iteration to 60 seconds to speed up training. Since the theoretical maximum score is 60, I set the threshold at 50. This means that over the minute of an iteration, agents will have to spend at least 50 seconds above the threshold. Let's see what it all adds up to.
In this diagram, the scores are shown for each iteration. Each time a score exceeds the threshold, it is marked in green and the difficulty increases.
[Music]
The score evolution shown here is not in real time, so I've added time indications to give an idea of how long this kind of training really takes. After over 100 iterations, gravity has quadrupled, so let's see what the current solution looks like.
[Music]
It's a rather abrupt solution, but it's pretty clean. Let's keep training.
[Music]
Let's look at the solution found after 14 minutes.
[Music]
I'm very happy to see that there's now the beginnings of compensation to counteract the loss of balance induced by a much stronger gravity.
[Music]
A few iterations later, gravity has reached its target value.
[Music]
The network has come a long way since last time. It has found a whiplash-like movement and is
stabilizing much closer to the center. However, I feel that only the simplest part has been done for the moment, since the high friction greatly reduces the instability and chaotic aspect of the double pendulum. This is quite obvious if you look at what happens when you deactivate the neural network control: it's like the simulation takes place underwater. Let's resume the training and see what happens when friction approaches zero.
[Music]
It's still not bad, but you can see that the network has grown considerably in the meantime. After almost 50 minutes, the friction is fairly close to zero.
[Music]
However, the solution is starting to degrade, with strong, rapid oscillations. This kind of strategy is neither natural nor pretty. It doesn't bode well for the rest of the training.
[Music]
After a few more minutes, the friction reached zero.
[Music]
Honestly, I was expecting a more jerky solution. After all the previous failures, it's very satisfying to have proof that it can be done. It's always easier to persevere when you know there's something to find in the end.
[Music]
And this time we can clearly see the characteristics of a double pendulum when the neural network is disabled. I'm very happy to have succeeded in getting a solution, but I'm not completely satisfied. I still find it a bit unrealistic and abrupt. Let's tweak the evaluation function a little and see if that improves things.
To avoid strong velocity variations, I've divided the score by the sum of the variations between successive values returned by the network. This should encourage smoother solutions. Let's start from scratch with this new function and see if it produces better results.
[Music]
Let's see what we have after 8 minutes.
[Music]
To its credit, this solution is not abrupt. Now I'm wondering how much it will be able to adapt to more complicated conditions. After 20 minutes, progress slowed considerably and the score fell. For this second session, the threshold is very different because the evaluation function has
changed. It is also more difficult to find a threshold intuitively. Although far from perfect, this solution is much more natural and smooth. I really like the movement. I still wonder whether it will be possible to maintain this elegance in the future.
[Music]
After a very long stagnation of over 1,500 iterations, the target gravity value is finally reached.
[Music]
Despite a slightly more unstable execution, the movement is still clean and smooth.
[Music]
Now it's time to reduce friction to zero. I'm going to fast forward to the end because it will take another 20,000 iterations to get there, for a total of 8 hours of training, or 46 years of simulated time.
[Music]
This new solution is much better than its predecessor, with far fewer jerks and a movement that feels quite natural. However, I think there's still room for improvement. The last thing I'd like to try is to also divide the score by the total distance covered by the cart, to favor the most stable solutions. Let's start training again with these new parameters.
A quick check after 8 minutes to get an idea of the direction taken by the algorithm: it would appear that the solution is both very smooth and stable, which is rather encouraging for the future. I'm going to move on very quickly with this iteration, as everything went pretty well right up to the end. Training was also fairly quick, lasting around 2 hours in total.
[Music]
So let's see what the solution looks like for this new training.
[Music]
In my opinion, this is a very good solution: fast, smooth, and very stable. I'm really happy with the end result. Still, there's something I'd like to check. As with the simple pendulum, I wonder if the double pendulum can withstand random disturbances. Let's check it out right away.
[Music]
I am really pleasantly surprised by the robustness of the system, even if, of course, the disturbances are less severe than with the simple pendulum.
[Music]
This solution is really good and I'm very happy with it, but to tell the truth, there's still a
little something that bothers me. All this time, the neural network was controlling the speed of the cart, which means that the acceleration was somehow infinite, which isn't very natural. This forced me to modify the evaluation function to obtain correct results. I wonder to what extent controlling the cart acceleration instead might not be a simpler and more physically correct solution.
For this latest experiment, we'll have to make a few modifications in several places. First of all, we'll need to adapt the neural network a little by adding a new input that will allow it to know the cart's current velocity. Then, to make things easier, I decided to increase the execution frequency of the neural network. Until now, this frequency was 60 Hz, but now it will be 480 Hz. Finally, because of the previous modification, the precision of single-precision floats will be a problem. I've therefore decided to use double-precision floats everywhere, as these are immensely more precise.
Let's take a quick example to illustrate the problem. This tiny code simply performs Euler integration on a float x for a simulation running at 480 Hz. When you run this code, here's what you get: you can see that, unfortunately, x hasn't moved, as the following test confirms.
Now that everything's ready, let's see if we can get something that works.
[Music]
It seems that a solution has been found for the time being. Let's see what it looks like. It's a little less precise and rapid than previous versions, but it works.
[Music]
A few minutes later, gravity reached its maximum value. Quite fast, actually.
[Music]
Again, it's a little less elegant than before, but the solution is smooth and stable. You can see that the friction doesn't vary at a constant rate, which is because I manually adapt the difficulty according to the performance of the solutions.
[Music]
I forgot to mention that the score function used is the very first one, simply proportional to the time spent above the threshold.
[Music]
And finally, after a somewhat complicated final phase, we reach
a friction of zero. Let's see how the network performs.
[Music]
I'm really pleasantly surprised by the stability of the solution. I'm happy a solution was found at all.
[Music]
All in all, I'm glad I was able to find different solutions using such a rudimentary algorithm. However, there are quite a few limitations. The training remains long, lasting around 2 hours. There were quite a few small adjustments to be made manually, and none of the solutions managed to regain equilibrium after collapsing. Also, in around two out of three cases, training is unsuccessful and stagnates. However, I think this kind of method is still an interesting way of getting started. In the future, I'd like to explore other, more sophisticated algorithms to benchmark against.
Thanks for watching. I'll leave you with a few other solutions I was able to find.
[Music]
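The score functions described in the captions can be sketched roughly as follows. This is only an illustration under assumptions, not the code used in the video: the function names, the `heights` list (height of the pendulum's free end at each simulation step), and the small `eps` guard against division by zero are all hypothetical.

```python
def score_time_above(heights, threshold, dt):
    """First score: one point per second spent above the threshold."""
    return sum(dt for h in heights if h > threshold)


def score_streak_power(heights, threshold, dt, power=8):
    """Second score: at each step, add (current uninterrupted time above
    the threshold) ** power, scaled by dt. The timer resets to zero as
    soon as the free end drops below the threshold."""
    score = 0.0
    streak = 0.0
    for h in heights:
        if h > threshold:
            streak += dt
            score += (streak ** power) * dt
        else:
            streak = 0.0
    return score


def smoothed_score(base_score, outputs, eps=1e-9):
    """Later variant: divide the score by the sum of variations between
    successive network outputs (eps is an assumed guard, not from the video)."""
    variation = sum(abs(b - a) for a, b in zip(outputs, outputs[1:]))
    return base_score / (variation + eps)


if __name__ == "__main__":
    heights = [1.0, 1.0, 0.0, 1.0]  # made-up samples
    print(score_time_above(heights, threshold=0.5, dt=0.5))
    print(score_streak_power(heights, threshold=0.5, dt=0.5, power=2))
```

With the streak-based score, two consecutive steps above the threshold are worth more than two isolated ones, which is the advantage for long holds that the captions describe. Dividing additionally by the total distance covered by the cart would follow the same pattern as `smoothed_score`.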
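The gradual-difficulty idea (start with low gravity and high friction, then make things harder each time the best agent reaches the target score) might look something like the sketch below. The 1% gravity increase and the target score of 50 come from the captions; the class name, the friction step size, and the starting values are assumptions chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Difficulty:
    gravity: float
    friction: float


def advance(d, best_score, target_score, gravity_target,
            gravity_growth=1.01, friction_step=0.01):
    """Return the next difficulty level. Nothing changes unless the best
    agent reached the target score; otherwise gravity grows by 1% (capped
    at its target value) and friction shrinks slightly (floored at zero)."""
    if best_score < target_score:
        return d
    gravity = min(d.gravity * gravity_growth, gravity_target)
    friction = max(d.friction - friction_step, 0.0)
    return Difficulty(gravity, friction)


def successes_needed(factor, growth=1.01):
    """Number of successful iterations needed to scale gravity by `factor`."""
    return math.ceil(math.log(factor) / math.log(growth))


if __name__ == "__main__":
    level = Difficulty(gravity=1.0, friction=0.5)  # hypothetical start
    level = advance(level, best_score=55.0, target_score=50.0,
                    gravity_target=9.81)
    print(level)
    print(successes_needed(4.0))
```

At 1% per success, quadrupling gravity takes about 140 successful iterations, which is in line with the "over 100 iterations" mentioned in the captions.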
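The single-precision problem mentioned in the captions (Euler integration at 480 Hz where x does not move) can be reproduced with a small sketch. The on-screen code from the video is not part of the captions, so the values below (x starting at 100, a velocity of 0.001) are assumptions picked to trigger the effect, and 32-bit rounding is emulated with Python's `struct` module.

```python
import struct


def to_float32(value):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def integrate(x0, velocity, frequency_hz, steps, single_precision):
    """Explicit Euler integration of dx/dt = velocity with a fixed step.
    When single_precision is True, every intermediate result is rounded
    to 32 bits to mimic a C-style float."""
    rnd = to_float32 if single_precision else (lambda v: v)
    dt = rnd(1.0 / frequency_hz)
    x = rnd(x0)
    v = rnd(velocity)
    for _ in range(steps):
        x = rnd(x + rnd(v * dt))
    return x


# One simulated second at 480 Hz (assumed values, see above).
x32 = integrate(100.0, 0.001, 480, 480, single_precision=True)
x64 = integrate(100.0, 0.001, 480, 480, single_precision=False)
print("float32:", x32)
print("float64:", x64)
```

Near 100, neighbouring 32-bit floats are about 7.6e-6 apart, while each step adds only about 2.1e-6, so every addition rounds back to the same value and the 32-bit x stays at 100. The 64-bit version ends up close to 100.001, as expected.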
Info
Channel: Pezzza's Work
Views: 186,739
Keywords: pezzza, physics, neural network, neat, 2D, sfml, AI, pendulum, balancing, real time
Id: 9gQQAO4I1Ck
Length: 24min 58sec (1498 seconds)
Published: Wed Jun 05 2024