The Knapsack Problem & Genetic Algorithms - Computerphile

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
i would like to talk about genetic algorithms they were particularly useful for me during my phd and i think in terms of being able to build something yourself and experiment with them i think they're quite good bang for buck and i think you can solve a lot of problems using them which would otherwise be able to do so and even if you can't solve them perfectly you get an indication of what's what and i think that's pretty neat there's a broader category called evolutionary algorithms and this is a particular type of them if you represent a problem in a specific way you can evolve solutions to that problem in a way which mimics biological evolution and it's quite visual as well where we start is we have to have a suitable problem in order in order to apply genetic algorithm um some problems are more suitable than others and we're going to do a toy problem for this called the knapsack problem you have a knapsack which is like a rucksack or a bergen and you fill it with boxes and each box has a weight and a value and the objective is to fill the bergen with the most value without going over a specified weight limit so a good way to think about the knapsack problem is an aladdin's cave full of treasure so if you're in the desert and you had one rucksack you'd have to fill it with the most valuable items and that's a very good representation of this exact problem we're going to do a simple version of that problem just because it's easy to understand so we're going to have four boxes and each one has a weight and then we'll write the values below which again will be just made up so if we do five four seven two so these are our four boxes and we've got a bergen here and the goal is to put the maximum value of boxes in this burgund without going over a weight limit so seeing if these are the weight limits we're going to impose a limit of 15 kilograms so if the limit is 15 what we put into the bergen or the knapsack has got to be less than 15. we can actually represent a solution to this problem with four bits so for either zeros or ones and each one corresponding to a box and this one's pointing to this box what this says is that the first box this one won't be included in the knapsack the second one will the third one will and the fourth one won't and what we can do is then we can add up the values and the weights to see how well this satisfies this problem so the weights are two and one so that's three and three is less than 15 kilograms so what we can do is then we can give it a score on the value so we can actually return the score which is four plus seven which is eleven so this solution here scores 11. now if we were gonna do one which is perhaps not so successful so we can do one one zero one this solution has these two boxes in and these two boxes are quite heavy so what we know with the limit being 15 if we add 9 and 7 that's 16 which is above the limit so what this solution returns is zero so this is a useless solution we don't want it and the goal is is to get over time the best combination of boxes so that we can get the most value without being overweight so now we know how to represent our solutions what we can do is we can make a population of solutions which for now will just be random so we got no idea how well they're going to score so this is our population that we've got to start with so this is eight randomly generated solutions so what we did previously when we defined the problem that's fitness function so if we were going to write that into code what would happen is we would be able to give it one of these solutions and it would return how good that solution is so what we can do is we can go through each one of these and we can get a score to say how fit this solution is we can put the scores here so let's just have a little table so basically they get the score of the value but if the weight is too high to get a zero exactly right yep that's it so we've now got our scores so we've got an idea of how good these solutions are it's similar to survival of the fittest so in general with this algorithm the better solutions tend to survive and propagate over time what we do now is once we've got our population and we've got the fitnesses of that population we do something called selection we have to select some of the population to go forward to the next step there are many ways we can do this one of the ways which is quite common is something called roulette wheel selection where each one of these solutions will get a chunk of an artificial roulette wheel and the bigger the score the bigger the chunk and we spin it generally if you have a high score it's like you've got a bigger chunk and you're more likely to be selected however that's not always the case we'll do a different one which is objectively quite good and um it's easier to implement and that's called tournament selection so what we're gonna do is we're gonna have a mini tournament and we're gonna randomly select two members of the population two solutions to have a little battle and the one with the highest score wins these are just picked randomly so we're going to pick this solution and this one for the first tournament okay and the winner of this tournament is this one so the best score here for this tournament was top one is that six and then we're going to have another tournament between say this one and this one so these two have a little battle and it's this one that wins we've now got our two parents which we will draw here so they were the lucky winners of the tournaments so we've got zero one zero one for the other parent we've got zero zero one one so then what we can do is we can do something called crossover which is when we cross over some information from parent a with some information from parent b and we're going to do this by basically splitting in half and just swapping each part over so the idea of this is that then we get two children which have a little bit of parent a and a little bit of parent b and when we're using a genetic care and we do this according to it to a rate so we can set a parameter at the start called the crossover rate a lot of playing with genetic algorithms is changing the the rates and population size so yes we're going to swap over our two parents to create two children so we're going to do the first two so the first one and the last two and then so then we will have two children now and we'll draw them down here so the first one zero one from here and then one one from this parent and then we'll have zero zero from this one and then zero one from up here so these are our two children so we've done crossover to get the children so now we've got one final operation and that's mutation so we also have a mutation rate as well as a crossover rate and a mutation rate is typically a lot lower than the um crossover rate um so typically something like 0.05 0.01 0.1 something like that and what we do is we loop over every bit of information that we've got in the two children and for each one of these we pull a random number between zero and one if it's less than the mutation rate we flip a bit okay so this mimics mutation in nature the idea that sometimes you can just get random unexpected changes which adds the variance of the population which over time is quite useful so what we're going to do is we are going to flip just this bit so now this child it's down that what we started off with we had our population we went through tournament selection we had a little mini tournament we've got two parents we then crossed the parents over in this instance that doesn't always happen but it did here and then we've gone through mutation and mutation has flipped one of the bits from a one to a zero so these are now our two completed children once we have our children here what we do is we repeat this process until we get an equivalent amount of children that were in our original population so what we do is we go back to this stage and we've already calculated the fitnesses of every single member of the population so what we do is we just generate another random tournament so it won't be this one and this one or it could be but it will likely be a different tournament so we could do this one and this one for the first tournament and then say this one and this one for the second tournament and then we would generate another two children and then we go back and do the genetic operations on those two children and then if we've done that twice we'll have four children so we'd repeat that process until we have the same amount of children as we had in the population and then once we've swapped over the population of children we've created with the old population that's one generation okay and typically what we do is you run this for hundreds 250 500 000 generations and over time what you should see is the average fitness of the population gets much better and the best fitness gets a lot better and generally what you see is you see a curve that goes up and then it tapers off at the end and that's generally a good time to stop the algorithm i'm guessing this is normally done with a lot more parameters than four right yes so you're exactly right this problem is trivial it's more interesting if you add other parameters so if instead of having weight and value you could have weight value and size or additionally you have something like weight value and robustness so if it's like an aladdin's k full of jewelry you'd want to take the ones that lowest in weight highest in value but if two were the same in that respect you'd probably elect to take a piece of gold rather than a china pot because it might break in your rucksack so when you start having more parameters it becomes less easy for to see what exactly is a good solution to this problem and indeed how good a given box is because some parameters are really high and some are really low you don't quite know where it exists and when you start having a hundred a thousand of these this is when um jayhawns can be quite useful they sometimes don't give you the optimum solution in fact they often don't but they give you a good idea of what direction it's going in so you can get a better grasp of the problem at hand this is a very vanilla genetic algorithm and there are some things that could be improved and one of them is if we look at this sheet here this is our original population where we had scores this solution here wasn't picked even though it was the highest so it was just randomly not picked in this instance but hypothetically it is possible for this solution solution not to be picked at all and not to not be passed to the next generation so there are some strategies which you can use to overcome this one of them is elitism which is where you basically take the best solution from the population and then you put it in the next generation anyway so you just remove one from the the new eight that we had the new population and then just put that one in here if we were gonna use this legitimately to solve this problem there are some parameters in here which would be better if they were changed so for example the population size 8 is good visually but generally if we're going to start out using genetic algorithms we use a population size for something like 100 250 a thousand something like that because with this it's very hard to get variation partly because we've only got eight solutions but partly because the problem's so simple so if we did have value weight and then we had size because some shapes are better than others and robustness and then we had a hundred boxes i'd probably start with a population size of 500 and i'd have a crossover rate of 0.5 so crossover happens half the time and probably a mutation rate of 0.05 and i'd use a tournament selection because very rarely you have a tournament selection size too it's not particularly useful i have a tournament selection of something like four or eight those parameters just loosely should help um preserve the heterogeneity within the population to make sure that you have a a um a rich space look through because ultimately this is a search algorithm this is just going through this search space here to try and find the best solution and if you solve other functions you can actually plot how the jet algorithm is moving through a specific function you only have to work out whether it's worth alerting the user if you find the key so you know you download the temporary exposure key you perform the encryption you generate the potential rpis and you compare them with the ones you've seen or if you want a more slightly comprehensible message it's saying maybe you haven't applied a function to enough arguments
Info
Channel: Computerphile
Views: 120,802
Rating: undefined out of 5
Keywords: computers, computerphile, computer, science, Knapsack Problem, Genetic Algorithm, Algorithm, Programming, Coding, University of Nottingham, Dr Alex Turner, Tournament Selection, Roulette Selection
Id: MacVqujSXWE
Channel Id: undefined
Length: 12min 12sec (732 seconds)
Published: Thu Oct 01 2020
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.