Quantilizers: AI That Doesn't Try Too Hard

Reddit Comments

make an algorithm that prevents the AGI from making an expected utility maximizer.

👍 1 · u/loopy_fun · Jan 07 2021
Captions
Hi. So, way back in the before time, I made a video about maximizers and satisficers. The plan was that it was going to be the first half of a two-parter. Now, I did script out that second video, and shoot it, and even start to edit it, and then certain events transpired and I never finished it. So that's what this is: part two of a video that I started ages ago, which I think most people have forgotten about. I do recommend going back and watching that video if you haven't already, or even re-watching it to remind yourself, so I'll put a link to it in the description. And with that, here's part two. Take it away, past me.

Hi. In the previous video we looked at utility maximizers, expected utility maximizers, and satisficers, using unbounded and bounded utility functions. A powerful utility maximizer with an unbounded utility function is a guaranteed apocalypse. With a bounded utility function it's better, in that it's completely indifferent between doing what we want and disaster, but we can't build that, because it needs perfect prediction of the future. So it's more realistic to consider an expected utility maximizer, which is a guaranteed apocalypse even with a bounded utility function. Now, an expected utility satisficer gets us back up to indifference between good outcomes and apocalypses, but it may want to modify itself into a maximizer, and there's nothing to stop it from doing that. The situation doesn't look great, so let's try looking at something completely different. Let's try to get away from this utility function stuff that seems so dangerous.

What if we just tried to directly imitate humans? If we can get enough data about human behavior, maybe we can train a model that, for any given situation, predicts what a human being would do in that scenario. If the model's good enough, you've basically got a human-level AGI, right? It's able to do a wide range of cognitive tasks just like a human can, because it's just exactly copying humans. That kind of system won't do a lot of the dangerous, counterproductive things that a maximizer would do, simply because a human wouldn't do them. But I wouldn't exactly call it safe, because a perfect imitation of a human isn't safer than the human it's perfectly imitating, and humans aren't really safe. In principle, a truly safe AGI could be given just about any level of power and responsibility and it would tend to produce good outcomes, but the same can't really be said for humans. And an imperfect human imitation would almost certainly be even worse; I mean, what are the chances that introducing random errors and inaccuracies to the imitation would just happen to make it more safe, rather than less? Still, it does seem like it would be safer than a utility maximizer; at least we're out of guaranteed-apocalypse territory.

But the other thing that makes this kind of approach unsatisfactory is that a human imitation can't exceed human capabilities by much, because it's just copying them, and a big part of why we want AGI in the first place is to get it to solve problems that we can't. You might be able to run the thing faster to allow it more thinking time, or something like that, but that's a pretty limited form of superintelligence, and you have to be very careful with anything along those lines, because it means putting the system in a situation that's very different from anything any human being has ever experienced. Your model might not generalize well to a situation so different from anything in its training data, which could lead to unpredictable and potentially dangerous behavior.
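To make the "directly imitate humans" idea concrete, here is a rough sketch of the simplest possible human imitation model: it just estimates, from logged human behavior, how likely a human is to take each action in a given situation. This is my own toy illustration, not anything from the video; the situations, actions, and counts are made up.

```python
# Toy tabular "human imitation" model (illustrative sketch only): estimate
# P(action | situation) by counting how often humans took each action in
# logged demonstrations. A real system would use a learned model instead.
from collections import Counter, defaultdict

def fit_human_model(demonstrations):
    """demonstrations: iterable of (situation, action) pairs logged from humans."""
    counts = defaultdict(Counter)
    for situation, action in demonstrations:
        counts[situation][action] += 1

    def human_prob(situation):
        c = counts[situation]
        total = sum(c.values())
        return {action: n / total for action, n in c.items()}

    return human_prob

# Hypothetical logged behavior for the stamp-collecting example:
demos = [
    ("collect stamps", "buy stamps online"),
    ("collect stamps", "buy stamps online"),
    ("collect stamps", "visit a stamp fair"),
    ("collect stamps", "mistype card number repeatedly"),
]
human_prob = fit_human_model(demos)
print(human_prob("collect stamps"))
# {'buy stamps online': 0.5, 'visit a stamp fair': 0.25, 'mistype card number repeatedly': 0.25}
```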
Relatively recently, a new approach was proposed, called quantilizing. The idea is that this lets you combine human imitation and expected utility maximization, to hopefully get some of the advantages of both without all of the downsides. It works like this. You have your human imitation model; given a situation, it can give you a probability distribution over actions. That is, for each of the possible actions you could take in this situation, how likely is it that a human would take that action? So in our stamp-collecting example, that would be: if a human were trying to collect a lot of stamps, how likely would they be to do this action? Then you have whatever system you'd use for a utility maximizer, which is able to figure out the expected utility of different actions according to some utility function. For any given action, it can tell you how much utility you'd expect to get if you did that; in our example, that's how many stamps you'd expect this action to result in. So for every action you have these two numbers: the human probability and the expected utility.

Quantilizing sort of mixes these together, and you get to choose how they're mixed with a variable that we'll call q. If q is zero, the system acts like an expected utility maximizer; if it's one, the system acts like a human imitation. By setting it somewhere in between, we can hopefully get a quantilizer that's more effective than the human imitation, but not as dangerous as the utility maximizer.

So what exactly is a quantilizer? Let's look at the definition in the paper: "A q-quantilizer is an agent that, when faced with a decision problem, returns a random action in the top q proportion of some base distribution over actions, sorted by the expected utility achieved if that action is executed." Let's break this down and go through how it works, step by step.

First, we pick a value for q, the variable that determines how we're going to mix imitation and utility maximization. Let's set it to 0.1 for this example: 10%. Now we take all of the available actions and sort them by expected utility. So on one end you've got the actions that kick off all of the crazy, extreme utility-maximizing strategies, you know, killing everyone and turning the whole world into stamps, all the way down through the moderate strategies like buying some stamps, and down to all of the strategies that do nothing and collect no stamps at all.

Then we look at our base distribution over actions. What is that? In the version I'm talking about, we're using the human imitation system's probability distribution over actions, so our base distribution is how likely a human is to do each action. That might look something like this: no human is ever going to try the wacky extreme maximizing strategies, so our human imitator gives them a probability of basically zero. Then there are some really good strategies that humans probably won't think of, but might if they're really smart or lucky; then a big bump of normal strategies that humans are quite likely to use, which tend to do okay; then tailing off into less and less good strategies, and eventually stupider and stupider mistakes that humans are less and less likely to make.

Then what we do is find the point in our action list such that 10% of the probability mass is on the high expected utility side. That's what q is really changing: where we make this cutoff. Note that it's not ten percent of the actions, which would be somewhere else entirely; it's ten percent of the probability mass. Then we throw away everything on the other side of the cutoff, all the stupid and useless choices; we set them to zero and keep the top ten percent. Now this is no longer a valid probability distribution, because it only sums up to 0.1, so we multiply all of these by 10 so that the whole thing sums to 1 again, and that's our final probability distribution, which we sample from to get our chosen action.
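As a concrete sketch of the procedure just described: sort the candidate actions by expected utility, keep the top q of the base distribution's probability mass, renormalize, and sample. This is my own toy implementation, not code from the video or the paper, and the actions and numbers below are made up.

```python
# Toy q-quantilizer, following the steps described above (illustrative only).
import random

def quantilize(actions, human_prob, expected_utility, q=0.1, rng=random):
    """
    actions:          list of candidate actions
    human_prob:       dict action -> probability a human would take it (base distribution)
    expected_utility: dict action -> expected utility of taking that action
    q:                fraction of base probability mass to keep, from the high-utility end
    """
    # 1. Sort actions from highest to lowest expected utility.
    ranked = sorted(actions, key=lambda a: expected_utility[a], reverse=True)

    # 2. Walk down the ranking until we've accumulated q of the base probability mass.
    kept, mass = [], 0.0
    for action in ranked:
        kept.append(action)
        mass += human_prob[action]
        if mass >= q:
            break

    # 3. Renormalize the kept slice so it sums to 1, then sample from it.
    weights = [human_prob[a] / mass for a in kept]
    return rng.choices(kept, weights=weights, k=1)[0]

# Hypothetical numbers for the stamp-collecting example:
actions = ["turn the world into stamps", "clever bulk-buying scheme",
           "order stamps online", "mistype card number repeatedly", "do nothing"]
human_prob = {"turn the world into stamps": 1e-9,   # no human would try this
              "clever bulk-buying scheme": 0.05,    # unusually good human plan
              "order stamps online": 0.60,          # the normal thing to do
              "mistype card number repeatedly": 0.15,
              "do nothing": 0.20 - 1e-9}
expected_utility = {"turn the world into stamps": 1e12,
                    "clever bulk-buying scheme": 5000,
                    "order stamps online": 100,
                    "mistype card number repeatedly": 0,
                    "do nothing": 0}

# A 10% quantilizer almost always picks a sensible, better-than-average plan:
print(quantilize(actions, human_prob, expected_utility, q=0.1))
```

One small simplification in this sketch: the action that straddles the cutoff is kept whole, rather than being split exactly at q of the probability mass.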
So let's look at some different actions and see how they do. Consider something like "misremember your credit card details, keep trying to order stamps with the wrong number, and never figure out why it's not working". A human is reasonably likely to do that; not very likely, but we've all met people who would. The point is, a pure human imitation might do that, but the expected utility maximizer can see that this results in very few stamps, so it ends up low on the list and doesn't make the 10% cutoff. So there are lots of mistakes that a human imitation might make that a quantilizer won't. And note that for our stamp-collecting utility function the worst case is zero stamps, but you could imagine, with other utility functions, a human imitator making arbitrarily bad mistakes that a quantilizer would be able to avoid.

Now, the most common, boring human strategies, which the human imitator is very likely to use, also don't make the cutoff. A 50% quantilizer would have a decent chance of going with one of them, but a 10% quantilizer aims higher than that. The bulk of the probability mass for the 10% quantilizer is in strategies that a human might try which work significantly better than average. So the quantilizer is kind of like a human on a really good day; it uses the power of the expected utility calculation to be more effective than a pure imitation of a human.

Is it safe, though? After all, many of the insane maximizing strategies are still in our distribution, with hopefully small but still non-zero probabilities, and in fact we multiplied them all by 10 when we renormalized. If there's some chance that a human would go for an extreme utility-maximizing strategy, the 10% quantilizer is 10 times more likely than that. But the probability will still be small; unless you've chosen a very small value for q, your quantilizer is much more likely to go for one of the reasonably high-performing, human-plausible strategies.
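To put a rough number on that "10 times more likely" point, here is a small worked example with made-up figures (not from the video): because the kept slice is renormalized by 1/q, the quantilizer can boost the probability of any single action by at most a factor of 1/q over the human imitator.

```python
# Worked example of the worst-case boost from renormalizing (made-up numbers).
q = 0.10                         # the 10% quantilizer from the example above
p_human = 1e-9                   # hypothetical chance a human tries an extreme maximizing plan
p_quantilizer_max = p_human / q  # worst case, if that plan survives the cutoff

print(f"human imitator:           {p_human:.0e}")             # 1e-09
print(f"10% quantilizer, at most: {p_quantilizer_max:.0e}")   # 1e-08
```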
And what about stability? Satisficers tend to want to turn themselves into maximizers; does a quantilizer have that problem? Well, the human model should give that kind of strategy a very low probability: a human is extremely unlikely to try to modify themselves into an expected utility maximizer to better pursue their goals, and humans can't really self-modify like that anyway. But a human might try to build an expected utility maximizer, rather than trying to become one. That's kind of worrying, since it's a plan that a human definitely might try, and it would result in extremely high expected utility. So although a quantilizer might seem like a relatively safe system, it still might end up building an unsafe one. So how's our safety meter looking? Well, it's progress. Let's keep working on it.

Some of you may have noticed your questions in the YouTube comments being answered by a mysterious bot named Stampy. The way that works is that Stampy cross-posts YouTube questions to the Rob Miles AI Discord, where me and a bunch of patrons discuss them and write replies. Oh yeah, there's a Discord now, for patrons. Thank you to everyone on the Discord who helps reply to comments, and thank you to all of my patrons, all of these amazing people. In this video I'm especially thanking Timothy Lillicrap. Thank you so much for your support, and thank you all for watching. I'll see you next time.
Info
Channel: Robert Miles
Views: 54,921
Id: gdKMG6kTl6Y
Length: 9min 54sec (594 seconds)
Published: Sun Dec 13 2020