BORIS DEBIC: Welcome,
everybody to one more Authors at Google talk. Today with us, Brian
Christian and Tom Griffiths. Their book, "Algorithms
to Live By," is now available in all the
fine bookstores in the area and in the US and around
the world and whatnot. Brian Christian is the author
of "The Most Human Human," a "Wall Street Journal"
bestseller, "New York Times" editor's choice,
and "New Yorker" favorite book of the year. His writing has appeared in "The
New Yorker," "The Atlantic," "Wired," the "Wall Street
Journal," "The Guardian," and "The Paris Review," as
well as in scientific journals such as "Cognitive Science." He has been translated
into 11 languages. He lives in San Francisco. Tom Griffiths is a professor
of psychology and cognitive science at UC Berkeley,
where he directs the computational
cognitive science lab. He has published more
than 150 scientific papers on topics ranging from
cognitive psychology to cultural evolution. He has received awards from the
National Science Foundation, the Sloan Foundation,
the American Psychological Association, and the Psychonomic
Society, among others. He lives in Berkeley. So I'm not going to talk
too much about the book because we have
Peter Norvig here, who will explain
to you why this may be the-- we call it the gateway
drug to computer science. Come on, Peter. PETER NORVIG: OK. So you remember Apple
had this campaign that said, think different? And I think we
realize that we all think different in some ways. Right? So we're all nerds or
whatever you want to call us. And sometimes when you go out
with your friends and family, you realize, oh, I think
differently than them. And I think that's what
this book is about. Boris talks about it
as a gateway drug, and I think that's great. That's a good way to think
about it because we're coming up now-- there's another
Hour of Code thing, and we're going to celebrate. And we're going to try to
teach everybody to code. But learning how to draw
a rectangle on the screen and figuring out that
the third argument is the color and the first two are
the x- and the y-coordinates, that's not really it. That doesn't change
the way you think. I mean, that's a
good skill, to be able to draw rectangles
and circles on the screen. But really what's important is
being able to model the world. Jeannette Wing talks about it
as computational thinking. And I think there's a mix
between different types of thinking. There's this
computational thinking. There's a mathematical thinking. There's a statistical thinking. And we all have
some combinations of all of those things. And those skills
together, I think, are more important than
the details of the API for JavaScript objects. And this book really gets at it. It doesn't teach
you how to code, but it teaches you how
to think in that way. And it gives you examples to
ponder about your everyday life and why it might be
important to think that way and what you can do with it. So tell us all about
it, Brian and Tom. [APPLAUSE] BRIAN CHRISTIAN:
Thank you so much, Boris and Peter, for
the introduction, and thanks to Google for the
invitation to come speak. The talk is "Algorithms
to Live By." I'm Brian. This is Tom, in case you were
curious which of us is which. We sometimes get that question
if we've forgotten to introduce ourselves at the top. And the book opens
with an example that I think will be acutely,
perhaps uncomfortably, familiar to many of us
here in the Bay Area, which is looking for housing. So in a typical consumer
situation, the way you make a choice is you
consider a pool of options. You think hard about which
one you like the best. And then you go with that. But in a sufficiently
crowded real estate market, in a sufficiently competitive
market-- which the Bay Area certainly
is-- you don't have the luxury of making the
decision in that way. Rather, the decision
takes the form of evaluating a sequence
of options one at a time, going to a number of
different open houses, and at each point
in time, you must make an irrevocable commitment. You either take the place that
you're looking at right there on the spot, never knowing what
else might have been out there, or you walk away,
never to return. You have almost no chance
of going back and getting the place. This is certainly a much
more fraught setting for making a decision because
here the critical question that you have to ask yourself
is not which option to pick but how many options
to even consider. Intuitively, we have
this idea that you want to look before you leap. You don't want to make
a premature choice. You want to get a sense
of what's out there. But you don't want
to hold out too long for some kind of perfect
thing that doesn't exist and let the best
options pass you by. So our intuition
tells us that we have to strike some
kind of balance between getting a feel
for what's out there and setting a standard and
knowing a good thing when we see it and being
ready to commit. And the story that we get
from mathematics and computer science is that this
notion of balance is, in fact, precisely correct. But our intuition
alone doesn't tell us what that balance should be. The answer is 37%. If you want the very
best odds of finding the very best
apartment, consider exactly 37% of the pool of
options in front of you. Or alternately, you can think
of it in terms of time-- 37% of your time
just calibrating. Leave your checkbook at home. If you've given yourself a month
for the search, in this case, that would be 11 days. After that point, be prepared
to immediately commit to the first thing you see
that's better than what you saw in that first 37%. This is not only the
intuitively-satisfying compromise between
looking and leaping, this is the provably optimal
solution to this problem. So, for example,
if you're committed to living in one of the painted
ladies here in San Francisco, and they have their open
houses on successive weekends, the algorithm tells you
to look at the first two, and no matter how tempting
they seem, hold tight. And then, starting
with the third one, immediately leap
for the first one better than what you
saw at the first two.
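To make the rule concrete, here is a minimal Python sketch of the look-then-leap strategy; the pool size, the trial count, and the function name are our own choices for illustration, not anything from the talk itself.

```python
import random

def look_then_leap(n=100, cutoff=0.37, trials=100_000):
    # How often does the 37% rule land the single best of n options?
    # Pass on the first `cutoff` fraction, then take the first option
    # that beats everything seen so far (or the last one, if forced).
    k = int(n * cutoff)
    wins = 0
    for _ in range(trials):
        ranks = random.sample(range(n), n)   # random order; n - 1 is the best
        best_seen = max(ranks[:k])
        chosen = next((r for r in ranks[k:] if r > best_seen), ranks[-1])
        wins += (chosen == n - 1)
    return wins / trials

print(look_then_leap())   # roughly 0.37 -- the success odds match the cutoff
```

A curious signature of this rule is that the success rate converges to 1/e, about 37%, the same number as the cutoff itself. Now, more broadly,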
this is known as what's called an optimal
stopping problem. And this structure
of encountering a series of options
and being forced to make a commitment one way or
another-- either you're all in or you walk away--
some people have argued that this is a structure
that describes not only things like the apartment hunt
or real estate in general. It also, many people have
argued, describes dating. You're in a relationship,
and you have this decision of when to commit. Have you met enough people
to have a sense of who your best match really is? And so you can do some
back-of-the-envelope math and say things like, OK, the
average expected lifespan of an American is 79. 37% of that gives me 29. And this roughly
divides my romantic life into dating for fun
versus dating seriously to really evaluate for a mate. Of course, as we will see, it
all depends on the assumptions that you're willing
to make about love. TOM GRIFFITHS: So
as you've seen, simple algorithms can solve
some of the problems which we actually normally think about
not as problems for computers but as problems for people. So there are a set
of problems which we have to solve just as
a consequence of the fact that our lives are lived in
finite space and finite time-- so having to do things like try
and figure out how to organize our house or our closet
or our office in a way to make it most efficient
or trying to figure out how to schedule our
time so that we can do the most possible things. And we normally
think about those as being fundamentally
human problems. But really, the argument
that we make in the book is that they're not. They have analogs
to the problems that computers have to solve. So, you know, your
computer has to figure out how to manage its space--
its space on the hard disk and its space in memory. And it also has
to figure out how to manage its time-- what
it's going to do next, what program it's going to run. And as a consequence,
computer scientists have put a lot of thought into
coming up with good algorithms for solving those problems. So what we do in
the book is explore how taking this perspective
gives us insight into the problems that
human beings have to solve, in some cases offering us
nice, elegant, simple solutions to these problems,
like the 37% rule, in other cases giving us new
ways of thinking about how those problems might be
described in mathematical terms and given, perhaps, a kind
of quantitative analysis. So we consider a range of
different problems-- so the optimal stopping problem,
which you've heard a little bit about, as well as a
variety of other contexts in which thinking
algorithmically actually gives us a new insight into
a human sort of problem. And what we're going to do
today is talk about just a few of these problems. In the book, we
also talk about some of the more general
principles which go into designing
good algorithms or thinking about the
kinds of underlying ideas that are useful
when we're trying to engage with the world
in this computational way. So the three problems that
we're going to focus on today are optimal stopping, the
explore or exploit trade-off, and caching. And optimal stopping
you've already heard a little bit about. The 37% rule is
just one instance of a strategy which is useful
for solving an optimal stopping problem. It's the solution
to a problem which is known in the mathematical
literature as the secretary problem. So the canonical set-up
for this is that you're trying to hire a secretary. You have a pool of applicants. Each applicant comes
in and is interviewed. You don't have a way of
evaluating the applicants except against one another. So you can only make a
judgement of how good they are based on the applicants
that you've seen so far. And for each person
who comes in, you have to make a
decision immediately as to whether you hire that
person or whether you dismiss them, in which case
you'll never see them again. So this secretary
problem was first presented to the
public in the 1960s in a column that was
written by Martin Gardner. But in the book, we trace the
history of this problem back, and it turns out that
romance was at its core. So what we actually found by
doing some archival research is that the person who claims
to have originated the problem is a mathematician
called Merrill Flood. And the story that
he tells about it is that his daughter,
who had just graduated from high school,
was engaging in a relationship which he and his wife
didn't really approve of with a much older man. And so he wanted to
somehow convince her that this was a bad idea. So being a
mathematician, the way that he approached
this problem took advantage of the fact that she was taking the
minutes at a mathematics conference that Flood was
scheduled to present at. And so he went up, and
he presented this problem of a woman who's entertaining
a series of suitors and faced with the
challenge of deciding which of those suitors'
proposals she should accept. And he didn't actually
know the solution to the problem at that point. But he was pretty sure that
the number was larger than one. And so he was hoping that
his daughter would sort of take the message as she
was writing it down, and sort of think about how it
applied to her own situation. So as we've seen, the
solution to this problem turns out to be 37%. But as Brian was
saying, a lot of that depends on the
assumptions that you make. And there's actually been
a history of mathematicians seeking love that provide some
cautionary tales and perhaps some insights into variants
on this problem, which are things that are worth
paying attention to. So one kind of
situation that you can encounter when trying
to pursue this approach is rejection. And we actually found a
story of Michael Trick, who's an operations researcher
at Carnegie Mellon University. And he told the story of
applying the secretary problem in very much the same way
that Brian was describing-- calculating the
period over which he thought he'd be searching,
working out what 37% was. And it happened that the age
at which he should switch from having fun to being serious
was precisely his current age. And so he went and
proposed to his girlfriend, who turned him down. So Trick ran into one
of the assumptions that are being made here,
which is that when you make an offer
to somebody, they should be willing to accept it. And only under
those circumstances is the 37% rule valid. If somebody is potentially
able to reject your offer, then it changes the strategy
that you should follow. And in particular, it means
that the period that you spend looking should be reduced. So, for example, if you have a
50% chance of being rejected, then you should look
at the first 25% of your potential
candidates, and then be willing to make an offer
to the first person who's better than anybody you
saw in that first 25%.
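A quick Monte Carlo check backs the number up; this is a rough sketch under assumed parameters (a pool of 100, offers accepted half the time), not a calculation from the talk itself.

```python
import random

def search_with_rejection(n=100, cutoff=0.25, p_accept=0.5, trials=100_000):
    # Calibrate on the first `cutoff` fraction, then propose to anyone
    # who beats the best seen so far; each proposal succeeds only with
    # probability p_accept. Count how often you land the very best.
    k = max(1, int(n * cutoff))
    wins = 0
    for _ in range(trials):
        ranks = random.sample(range(n), n)
        best_seen = max(ranks[:k])
        for r in ranks[k:]:
            if r > best_seen:
                if random.random() < p_accept:
                    wins += (r == n - 1)
                    break
                best_seen = r    # rejected -- but the bar still goes up
    return wins / trials
```

Sweeping the cutoff from 5% to 50% puts the peak success rate near 25%, just as the analysis says. Another variant on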
the secretary problem is what's called recall. And so this is having
the opportunity to go back to somebody
who you passed over. So you can imagine your
candidates come in, but rather than dismissing them meaning
that you'll never see them again, dismissing
them just decreases the chance that you'll be
able to go back to them. Or in the dating
setting, basically, breaking up with somebody
means that maybe there's a chance that they'd
take you back later on. So there's actually
a mathematician who experienced exactly
this phenomenon. This is the astronomer
Johannes Kepler, who after his first wife died went
through an extended period of courting various
women, trying to sort of evaluate and come up
with the person who he thought was going to be the very
best person for him. And so over an
extended period, he ended up interacting
with 11 women. And having sort of explored
those possibilities, he decided, getting to
the end of that process, that number five, who
he'd previously dismissed, was actually the
best one for him. So he went back, and he
made an offer to her. And it turned out she hadn't
accepted any other proposals in the meantime. And so luckily, they
were able to be married and had a happy
marriage together. So this possibility-- that
you can actually go back to somebody, and they
can still say yes-- changes where you should
set your threshold again. So in this case, it
makes it something where you should spend a
little longer looking around. So, for example, if
there's a 50% chance that somebody who
you go back to is going to be willing to
accept your offer anyway, then you should look
at the first 61%, and then use that to set the
standard which you'll then use for evaluating the remainder. And then if you get to the
very end of this period and haven't found
anybody, then you go back and make an
offer to the person who was the very best, which
you now know because you've explored all of the options.
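The same style of simulation handles recall; the pool size and the 50% return probability below are illustrative assumptions matching the example in the talk.

```python
import random

def search_with_recall(n=100, cutoff=0.61, p_return=0.5, trials=100_000):
    # Look for the first `cutoff` fraction, leap for the first option
    # that beats that calibration period, and if nobody ever does, go
    # back to the best one seen -- who says yes with probability p_return.
    k = int(n * cutoff)
    wins = 0
    for _ in range(trials):
        ranks = random.sample(range(n), n)
        best_seen = max(ranks[:k])
        chosen = next((r for r in ranks[k:] if r > best_seen), None)
        if chosen is None and random.random() < p_return:
            chosen = n - 1       # going back targets the best of the whole pool
        wins += (chosen == n - 1)
    return wins / trials
```

Here the longer 61% look phase pays off, because striking out is no longer a total loss. PETER NORVIG: Tom? So if you had prior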
knowledge of the distribution that you're drawing
from, that should also [INAUDIBLE] the period, right? TOM GRIFFITHS: That's right. So in this case, we're
assuming that the only way that you can evaluate people is
relative to one another, right? And so that's
something which then leads to this kind
of strategy, where you have to spend
some time building up an impression of what the
distribution looks like. So basically, pretty
much any variant on this problem
you can imagine has been explored by mathematicians
in the last 50 years. The case that you're talking
about, where you have what's called full information--
you actually know what the
distribution is like-- is one where the overall strategy
looks quite different. So in that case, what you
should do is set a threshold. And then as you go
through the pool and your remaining
candidates become sparser, that threshold becomes
lower and lower.
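Here is one simple version of that falling threshold, a sketch assuming offers drawn uniformly between 0 and 1 and a searcher maximizing the expected value of whatever they take (a slightly different objective than landing the single best):

```python
def falling_thresholds(n):
    # Offers are uniform on [0, 1] and you must take the last one if you
    # refuse all others. Work backwards: accept an offer only if it beats
    # the expected value of continuing the search.
    v = 0.5                       # value of being forced to take the last offer
    cuts = []
    for _ in range(n - 1):
        cuts.append(v)
        v = (1 + v * v) / 2       # E[max(uniform offer, v)]
    return list(reversed(cuts))   # thresholds for offers 1 through n - 1

print([round(t, 2) for t in falling_thresholds(6)])
# [0.78, 0.74, 0.7, 0.62, 0.5] -- the bar drops as the pool runs out
```

Some of us might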
have experienced this in the context of other
kinds of romantic interactions. But it certainly makes
dire predictions, perhaps, as you age and people
start getting married, that you should be willing
to set a
slightly lower threshold in terms of the way that
you approach the problem. And there are also
variants which are what are called partial
information versions, where you come in not knowing
what the distribution is but having some information about
what that distribution is. And those turn out to
be some of the sort of most interesting and
mathematically intricate cases. So these examples
illustrate some of the ways in
which the secretary problem can be made
more complicated and can accommodate some of
the other kinds of situations that we might encounter in
realistic romantic scenarios. But this is by no
means the limit of optimal stopping in
terms of its implications for human lives. So basically, any
situation where you have to make a decision
about whether to continue to gather information or
to continue to consider opportunities or to
act on the information that you have so
far is going to be one which is going to fit
this same kind of schema of optimal stopping. So another example is deciding
when to sell an asset. So be it your house
or your company, you have to make a
decision in a situation where you might receive
a series of offers. But in this case,
each of those offers is going to cost
you money, right? You're going to
be waiting around for somebody to come along
and make another offer. And your goal is to
maximize the amount of money that you get out
of those offers. So this can be formulated as
an optimal stopping problem. You have to decide
for each offer whether you're going
to take that offer. And there's a
simple formula that describes the strategy
that you should take for solving this problem. So basically, the strategy
takes the general form of a threshold. You should set a
particular value based on your expectations
of the distribution of offers that you're going to receive. And then the first offer that
exceeds that value should be the one that you accept. This is more closely
analogous to the situation that you were talking about. So, for example, if you have,
say, a uniform distribution between getting an
offer of $400,000 and an offer of $500,000,
then as a function of the cost per offer, we can
actually work out what this threshold looks like.
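As a sketch of that calculation: for offers uniform on a range, the expected gain from one more offer above a threshold t is (high - t)^2 / (2 * (high - low)), and setting that equal to the cost of waiting gives the threshold. The $2,000-per-offer cost below is an assumed figure for illustration.

```python
import math

def sell_threshold(low=400_000, high=500_000, cost_per_offer=2_000):
    # Wait as long as the expected gain of one more offer beats its cost;
    # the break-even point is where (high - t)**2 / (2*(high - low)) = cost.
    return high - math.sqrt(2 * cost_per_offer * (high - low))

print(sell_threshold())   # 480000.0 -> take the first offer above $480,000
```

And so it gives you a very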
simple kind of approach that you can take in the context
of trying to decide whether you should evaluate an offer. And it also says that you
should never look back, right? So if you turned down
an offer in the past, that was because the offer
was below your threshold. And as a consequence,
you shouldn't regret having given up
on that opportunity. You should just
be waiting around for the next offer that
exceeds your threshold. Another example of an
optimal stopping problem, which I think many of
us have encountered, is the problem of figuring out
how to find a parking spot. So in this case, the
kind of ideal scenario that the mathematicians
imagine-- although, again, there are lots of
variants-- is one where you're driving
towards a destination, and you kind of have a series
of possibly available parking spots that are along
the side of the road. And your goal is to minimize
the distance that you have to end up walking. So it turns out that the
solution to this problem depends on what's
called the occupancy rate, the proportion
of those parking spots which are occupied. And so for each occupancy
rate, what we can do is identify at what
point you should switch from driving to
being willing to take the next available parking spot. And so we calculate for you
a nice convenient table. You can cut this out,
stick it on your dashboard. But basically, you can get
some interesting conclusions from this, such as if
10% of the parking spots are free-- so a 90%
occupancy rate-- then you should start looking
for a parking spot seven spaces away
from your destination. Whereas if 1% of those
parking spots are free, then it's more like 70.
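One simple way to recover those numbers: if each spot is independently occupied with the same probability, the switch point is roughly the distance at which there's a 50/50 chance of at least one open spot between you and the door. This sketch is our gloss on the table, not the authors' own derivation.

```python
import math

def switch_distance(occupancy):
    # Smallest distance d at which there's a ~50% chance that at least
    # one of the d spots ahead is open, i.e. occupancy**d <= 1/2.
    return math.log(2) / math.log(1 / occupancy)

for occ in (0.90, 0.99):
    print(occ, switch_distance(occ))   # about 7 spots, and about 69-70
```

And maybe there's a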
scenario down here which fits to some
parts of San Francisco. So another famous problem which
has been analyzed in this way is one which, unfortunately,
ruins the plot of a lot of heist movies. So this is the
problem of a burglar who has to make a
decision about when to give up a life of crime. So there's a scene that happens
in a lot of heist movies where the team has to sort
of coax the old thief out of retirement. The thief is kind
of going, well, you know, I kind of
like living in my castle and trying to figure out exactly
what they're going to do. But to sort of spoil all
of those plots, in fact, there's an exact
solution, and the thief need only crunch the numbers to
work out what should be done. And if you assume that you have
a thief who for each robbery is going to get, on average,
about the same amount of money and has some probability
that they succeed in executing those robberies,
and if they get caught, they lose everything. So if the goal is to maximize
the expected amount of money that you end up with,
then the optimal stopping rule is to stop after
a number of robberies equal to the probability
of success divided by the probability of failure. So if you are a ham-fisted
thief who succeeds about half the time and fails
about half the time, you could pull off one robbery. And then you should just quit
if you got away with it and you've been lucky. But if you're a skilled thief
who succeeds 90% of the time and fails 10% of the time, you
can pull off nine robberies, and then that's the point at
which you should call it quits and go and live in
the castle and not listen to any of these
guys who come calling.
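The arithmetic is easy to check by brute force; this sketch just searches over plans of the form "attempt n robberies, then retire," with the loot per job normalized to 1.

```python
from fractions import Fraction

def expected_haul(p, n):
    # You keep n jobs' worth of loot only if all n succeed; one failure
    # forfeits everything (loot per job is normalized to 1).
    return n * p ** n

for p in (Fraction(1, 2), Fraction(9, 10)):
    best_n = max(range(1, 100), key=lambda n: expected_haul(p, n))
    print(float(p), best_n)   # 1 robbery at 50% skill, 9 at 90%: p / (1 - p)
```

So these sorts of scenarios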
might seem a little bit artificial, right? They're all cases where
you're kind of forced into a situation where you have
to make a decision about when to stop doing something. But I think there's
a deeper point here, which is that while we normally
think about decision-making as being something
where we've got all the information in front
of us, and it's just a matter of choosing what
we're going to pursue, in fact, in most
human decisions, we're in a scenario which
is much more like one of these optimal
stopping problems, where we have to be
gathering information. And one of the decisions
that we have to make is whether we've got
enough information to act. And so I think there's a
deeper point here about the way that we think about the
structure of human decisions, that often we're engaging in
exactly this kind of process even though we might
not realize it. BRIAN CHRISTIAN:
The second class of problems that we
consider in the book are ones that we describe as
explore/exploit trade-offs. And this encompasses
a wide range of problems where we get to
make a similar kind of decision over and over and over
in an iterated fashion. And typically,
that takes the form of a tension between doing our
favorite things-- the things we know and love-- or
trying new things. And this type of a
structure appears in a number of different
domains and in everyday life, restaurants being
the obvious case. Do you go to your
favorite place? Or do you try the new
place that just opened up? It happens in music,
where do you
our social lives. How much time do you spend with
your close circle of friends, your spouse, your family,
your best friends, versus going out to
lunch with that colleague that you just met and you think
you have something in common and want to kind of foster
a new acquaintanceship? This same trade-off
also occurs in a number of different places in more
societal ways-- in business and also in medicine. So in business, this
comes up in the context of ads, which I
hear is something that you guys know a little
bit about here at Google. So when you're deciding what
ad to run next to a keyword, for example, you've got
this tension between there is some ad that historically
has the best track record of getting the clicks. But there's also this
ever-changing dynamic pool of new ads that are
entering the system that might be worth trying. They might be better. They probably aren't,
but they could be. In medicine, this
structure describes the way that clinical trials work,
where for any condition, there is some known best
treatment with some known chance of success and
some known drawbacks. And then there's
a series-- there's kind of a pipeline of these
experimental treatments that, again, might be
better, might be worse. And so we have to trade off
between exploring-- that is, spending our energy
gathering new information-- and exploiting, which is
leveraging the information that we've gathered so far
to get a known good outcome. The canonical
explore/exploit problem that appears in the
computer science literature is known as the
multi-armed bandit problem. And this colorful
name-- I'm sure this is familiar to
some of you in the room. But the name comes from the
moniker of the one-armed bandit for a slot machine. So a multi-armed bandit you
can think of as just a roomful of different slot machines. So the basic setup
of the formal problem is this-- you walk
into a casino. There's all sorts of different
slot machines, each of which pays off with some probability. But every machine is different. Some of them are more
lucrative than others. But there's no way for you
to know that until you just start pulling the
levers and seeing which ones seem more promising. So again, here's
this case where you have to trade off
between gathering information and leveraging
the information that you have. And so there's this
question, which vexed an entire generation
of mathematicians, of, OK. Let's say you walk into the
casino, and you have 100 pulls. You're there for an afternoon. You have enough
time for 100 pulls. What strategy is going to give
you the highest expected payout before you leave the casino? In fact, for much
of the 20th century, this was considered unsolvable. And during World War II, Allied
mathematicians in Britain joked about dropping the
multi-armed bandit problem over Germany as the
ultimate instrument of intellectual
sabotage to just waste the brainpower of the
German scientists. And I think one of the
simplest explanations of the way in which this problem
can be very tricky to think about is the following choice. So let's say you've played
one machine 15 times. And nine times, it paid out. Six times it did not. Another machine
you've played twice. Once it paid out. Once it didn't. Now, if we just want to very
straightforwardly compute the expected value of
each of these machines, the nine-and-six machine's
got a payout rate of 60%. The one-and-one machine
has a payout rate of 50%. And so there are these two kind
of competing intuitions here. One is, well, you
should obviously just do the thing with
the better expected value. The other is, well, there's
a sense in which we just don't know enough about
the second machine to walk away from it forever. Certainly, it must be worth one
more pull or two more pulls. How do you decide what
that threshold is? And it turns out this is,
in a way, a trick question because it all depends on
something that we haven't given you in this description
of the problem yet, which is how long you
plan to be in the casino. So this is a concept
that sometimes gets referred to as the horizon,
which we refer to as the interval. And this concept has,
just speaking personally, given me a bit of
an axe to grind with one of my favorite films
from my own childhood, which is the inspirational 1980s Robin
Williams movie, "Dead Poets Society." It's one of these
really feel-good movies, and he plays this
inspiring poetry teacher who says to his students
in this rousing monologue, "Seize the day, boys. Make your lives extraordinary." And going back through the
lens of the multi-armed bandit problem, I can't help feeling
that Robin Williams is actually giving two conflicting
pieces of advice here. If we're just trying
to seize the day, we probably want to pull
that nine-six because it's got the higher expected value. But if we want to make
our life extraordinary, then we should
certainly see if there isn't some value in trying
these new things because we can always go back. In standard American English,
we have all of these idioms like "eat, drink, and be
merry, for tomorrow we die." But it feels that we're missing
the idioms on the explore side of the equation, which are
things like, life is long, so learn a new language and
reach out to that new colleague because who knows what could
blossom over many years' time. We're still honing
the messaging, but it does feel like there's
a gap in the culture that can be filled here. So when you're working
with a finite interval, the solution to the
multi-armed bandit problem comes from a method
described by Richard Bellman, dynamic programming,
where you basically work backwards and are
able to compute the expected value of every
pull given all of the possible pulls that you could make as a
result of whether that succeeds or fails. And you can actually work
out the expected value all the way back to walking in
the door of the casino-- what should you do? And this provides
an exact solution to the multi-armed
bandit problem, but there's a catch, which is
that it requires that you know exactly how long you're
going to be there and exactly how many
machines there are. And it also requires doing a
lot of computation up front.
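For the curious, here is a minimal sketch of that backward-working computation for machines that pay out 0 or 1, assuming a uniform prior over each machine's unknown payout rate; the state space balloons quickly with the horizon, which is exactly the up-front cost being described.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def best_value(records, pulls_left):
    # records: one (wins, losses) tally per machine. With a uniform prior,
    # a machine's chance of paying out is (wins + 1) / (wins + losses + 2).
    if pulls_left == 0:
        return 0.0
    best = 0.0
    for i, (w, l) in enumerate(records):
        p = (w + 1) / (w + l + 2)
        after_win = records[:i] + ((w + 1, l),) + records[i + 1:]
        after_loss = records[:i] + ((w, l + 1),) + records[i + 1:]
        value = p * (1 + best_value(after_win, pulls_left - 1)) \
              + (1 - p) * best_value(after_loss, pulls_left - 1)
        best = max(best, value)
    return best

# The talk's two machines, with 20 pulls left in the afternoon:
print(best_value(((9, 6), (1, 1)), 20))
```

But I think the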
critical thing is that we're able to look
at these solutions, these exact solutions, and get
some broader principles out of it. So, for example, the
value of exploration is greatest the minute you
walk through the door for two reasons. The first is that if you think
about it in the restaurant analogy, you just
moved to Mountain View. You go out to eat that night. The first place you
try is guaranteed to be the best
restaurant you've ever experienced in Mountain View. The next night, you
try a different place. It has a 50% chance
of being the best restaurant you've ever
seen in Mountain View, and so on and so forth. So the likelihood
that something new is better than what
we already know about goes down as we gain experience. The second reason
that exploring is more valuable at the
beginning of an interval is that when we
make that discovery, we have more chances to go back. So discovering a really
amazing restaurant on your last night in town is
actually a little bit tragic. It would have been great
to find that sooner. And so this gives us
this general intuition that as we perceive ourselves
to be on some interval of time, we should kind of
front-load our exploration and weight our
exploitation to the end, when we both have the most
experience with what to exploit and the least time remaining
to discover and enjoy something new, even if we did find it. This is significant, I
think, because it gives us a new way of thinking about
the arc of a human lifespan. And so ideas from the
explore/exploit trade-off are now influencing
psychologists and changing the way we think about
both infancy and old age. So to demonstrate
infancy, we have a picture of a baby eating a power cord. And I think this
demonstrates what a lot of us kind of
culturally intuitively think of as the
irrationality of babies. They're totally-- have the
attention span of a goldfish. They put everything
in their mouth. They're distracted
really easily. They're really bad at just
generally doing things. We give the example in a
book-- like a baby gazelle is expected to be
able to run away from a wolf within the
first day of being alive. But, you know, it
takes us 18 years before we're allowed
to get behind the wheel of a car, that kind of thing. And the psychologist
Alison Gopnik uses ideas from the explore or
exploit trade-off as a way of saying, well,
maybe this extended period of dependency is actually
optimal in some sense because if you're in the
casino, for the first 18 years that you're in the casino,
someone else is buying your food and paying your rent. And so you don't need to be
getting those early payouts to buy your lunch. And so you can really use that
period of time to explore, which is exactly what
you should be doing at the beginning of your life. There's a sense in
which just putting every item in the
house in your mouth at least just once
sort of resembles walking into the casino and
just pulling all of the levers. Similarly, the idea
of exploitation is changing the way we
think about getting older. So here we have a gentleman
who I like to think of as enjoying the same
lunchtime restaurant that he's been to
hundreds of times. And he knows exactly
what he's going to get and exactly what he
likes, and it's great. There's a lot of
psychological data that says that as
we go through life, older folks have a smaller
circle of social connections. They spend their time
with fewer people. And there's one interpretation
of this that just says, well, they're lonely, or they're
detached or disinterested, or it's just kind
of sad to get older. But thinking about it from the
perspective of exploitation gives a totally different story. And this comes up in the work of
the Stanford psychologist Laura Carstensen, who studies aging in
an attempt to sort of overturn some of the prejudices we have. And so, for example,
one of the intuitions you get from the idea of the
explore or exploit trade-off is that towards the
end of your life, you really should
be spending more of your time doing the things
you already know and love both because it's unlikely
to make a discovery that's better than the things you
already really care about and also because
there's less time to enjoy it should you do that. And so it just makes more sense
to spend more of your energy on the things that you
already know and love. And so as a result,
what I think is actually a very encouraging story here
is that as you spend more time in the casino, your average
payouts per unit of time should go up. So there's a sense in
which we should actually expect to get steadily
happier as we go through life. We are less disappointed
and less stressed out. And her research supports this,
which I think is really lovely. Now, there are many cases
where we don't necessarily know where we are on
the interval of time, or it doesn't
necessarily make sense to think of there being
some finite interval. So maybe you move
to Mountain View, and you don't know
how long you're going to live in Mountain View. Or if you're a
company, you imagine yourself being interested in
being around indefinitely. But nonetheless,
there's still a sense in which you care about the
present more than the future. This framing of the problem
led to a different series of breakthroughs, starting
with an Oxford mathematician named John Gittins,
who was hired by the pharmaceutical
company Unilever to tell them, basically,
how much of their money to invest in R&D. And so
he almost
breakthrough in the problem by thinking of it
not as there being some finite interval of
time but as there being some indefinite future that
is geometrically discounted. So I don't know
how long I'm going to be living in the Bay Area. But a really good meal
next week is maybe only 90% as valuable as a really
good meal this week. Or making X dollars
next quarter is only 90% as valuable as making
those dollars this quarter. And it turns out
that, again, you can get a very precise
solution to this problem. So he explored it in
this business context. But it's also, I think,
very interesting that it was a pharmaceutical context,
because this also gives us a way of thinking
differently about how a clinical trial should be run. The basic idea behind
Gittins's breakthrough here, which is called
the Gittins index, is he imagines that-- the
word we use is a bribe. So if you think about-- there's
a game show that is on TV called "Deal or No
Deal," where you have a briefcase that
has somewhere between one and a million dollars in it. And someone calls you
on the phone and says, I will pay you $10,000 not
to open that briefcase. What do you do? This is the basic intuition
behind the Gittins index. So Gittins says,
for every machine that we've tried and have
incomplete knowledge of-- or maybe we have no
knowledge of whatsoever-- there's some machine with
a guaranteed payout that's so good, we'll never try
that machine ever again. Maybe the nine-six versus the
one-one is more of a toss-up. But if it was nine-zero
and the one-one, then there's a sense
in which maybe it's just never worth pulling the
one-one machine ever again. And so Gittins comes
up with what he thinks is a nice approximation,
which is just always play the machine with
the highest bribe price. And to his own astonishment,
as well as the field's, this turns out to be, in fact, not
merely a good approximation but the solution. So this is another case where,
like with the parking problem, we present this
table in the book, and you can cut it out
and take it home with you. And it provides
values for situations where you're trying to
weigh the value between two different options. And so going back to our
two slot machines, the nine and six machine has a
Gittins index of 63.00%. But the one-and-one machine
has a Gittins index of 63.46%. So case closed. Pull the one-one
machine one more time. What I think is kind of
philosophically significant about this is the
zero-zero square has a value of 70.29%, which
means that you should consider something you've never tried as
being just as good as something that you know works
70% of the time, even though it only has
an expected value of 50%. And you can see if you follow
the diagonal down to the right, it goes from 70% to
63.46% to 60.10%, 58.09%. And it does indeed converge
on 0.5 as you gain experience. But there's this boost applied
for not having that experience.
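Those index values can be reproduced numerically. The sketch below assumes the setup behind the book's table -- payouts discounted by 90% per pull and a uniform prior -- and uses bisection to find the guaranteed payout rate that would make you indifferent to walking away from a machine; the truncation depth is our own approximation.

```python
GAMMA = 0.9    # each future pull worth 90% of the one before (the 90% example)
DEPTH = 300    # horizon at which the calculation is truncated (approximation)

def arm_value(w, l, bribe, memo):
    # Value of a machine with w wins and l losses when, before any pull,
    # you may instead retire on a guaranteed `bribe` per pull forever.
    if (w, l) in memo:
        return memo[(w, l)]
    p = (w + 1) / (w + l + 2)              # posterior chance of a payout
    if w + l >= DEPTH:
        result = max(bribe, p) / (1 - GAMMA)
    else:
        play = p * (1 + GAMMA * arm_value(w + 1, l, bribe, memo)) \
             + (1 - p) * GAMMA * arm_value(w, l + 1, bribe, memo)
        result = max(bribe / (1 - GAMMA), play)
    memo[(w, l)] = result
    return result

def gittins_index(w, l):
    lo, hi = 0.0, 1.0                      # bisect on the indifference bribe
    for _ in range(30):
        mid = (lo + hi) / 2
        if arm_value(w, l, mid, {}) > mid / (1 - GAMMA) + 1e-9:
            lo = mid                       # playing still beats retiring
        else:
            hi = mid
    return (lo + hi) / 2

for record in ((0, 0), (1, 1), (9, 6)):
    print(record, round(gittins_index(*record), 4))  # ~0.7029, 0.6346, 0.6300
```

So we tongue-in-cheek suggest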
that you just print this out and just use it to
decide where to eat. But there are
sometimes some problems that come up with this. One is that we don't always
geometrically discount our payoffs. The other is that
actually computing these values on the fly is kind
of computationally intensive. And so there's a third way of
thinking about the problem that has come in the wake
of Gittins's work, and that is the idea
of minimizing regret. So we give a lot of
examples that touch on Google's work, but the
best illustration of this actually comes from Amazon. So Jeff Bezos talks about being
in this really lucrative hedge fund position and deciding
whether to give up his cushy job to start
an online bookstore. And he approaches it from what
he calls a regret minimization framework. "I knew looking back
I wouldn't regret it." In the context of the
multi-armed bandit problem, you can formulate
regret as every time you tried something that wasn't
the best thing in hindsight. And so you can ask yourself,
how does my regret-- what does that look like as I proceed
through my time in the casino? And we have good
news and bad news. We'll start with the bad news. The bad news is that you will
never stop getting more regrets as you go through life. The good news is that the rate
at which you add new regrets goes down over time. Specifically, if you were
following an optimal algorithm, the rate at which
you add new regrets is logarithmic with
respect to time. And this has led to a series
of breakthroughs by computer scientists looking for simpler
solutions than the Gittins index that still have this
optimality of minimal regret. One of our favorites, which
I think is the most thematic, is called upper
confidence bound. And this says that for every
slot machine-- you know, you've got some error
bars around what you think the payoff might be. So the expected value would be
in the middle of that range. But there's some error bars
on either side of that. Upper confidence
bound says, simply, always do the thing with the
highest top of the range. Don't care about the
actual expected value, and don't care about
the worst case scenario. Just always do the thing with
the highest top of the range.
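One standard formulation of this is the UCB1 rule, where the top of the range is the observed payout rate plus an error bar that shrinks as a machine gets pulled; a minimal sketch, using the usual square-root bonus from the literature:

```python
import math

def ucb1_choice(records, total_pulls):
    # records: one (wins, pulls) pair per machine. Pick the machine with
    # the highest upper confidence bound; untried machines go first.
    def bound(wins, pulls):
        if pulls == 0:
            return float('inf')
        return wins / pulls + math.sqrt(2 * math.log(total_pulls) / pulls)
    return max(range(len(records)), key=lambda i: bound(*records[i]))

print(ucb1_choice([(9, 15), (1, 2)], 17))  # 1: the one-and-one machine again
```

And I think that's sort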
of a lovely, lyrical note that the math brings us to,
which is that optimism is the best prevention for regret. TOM GRIFFITHS: So
our third example begins in a different place,
which is in the closet. So I think all of us have
encountered a problem of an overflowing closet. Things need to be
organized, but you also need to make a
decision about what you're going to get rid of. And in order to
solve this problem, we'd like to be able
to turn to experts. Fortunately, there are experts
on exactly this kind of thing. So we could consult one of
these-- Martha Stewart, who says to ask yourself a
series of questions-- how long have I had it? Does it still function? Is it a duplicate of
something I already own? And when was the last
time I wore it or used it? And then based on the
answers to these questions, you can make a decision
about whether you should keep that thing or not,
give it away to charity, and, as a consequence, end up
with a more organized closet. So there are a couple of
interesting observations here. The first is that here there
are in fact, multiple questions that you should be answering. And the answers
to these questions could be quite different. And the other is
that, in fact, there's another group of experts who
have thought about exactly these problems and come up
with slightly different advice. In particular, they discovered
that one of these questions is, in fact, far better
than any of the others. So this other group of experts
don't think about closets. They think about the
memory of computers. This is the picture of
the Atlas computer, which was a computer which
was built at Manchester University in the 1960s. And Atlas had an
interesting structure, where it had two kinds of memory. It had a drum which could
store information in a way where it was very
slow to access. And then it also
had a set of sort of magnets which could be
used to store information in a format which was
relatively fast to access. And when they first
built the machine, the way that they were using it
was to read off the information that would be needed for a
computation from the drum, and then store it in
the magnetic memory, and then do the operations, and
then write it back to the drum, and then take the next
part of the computation and read off all of the relevant
information from the drum, and then do the operations,
then write it back to the drum. But a mathematician who
was working on Atlas named Maurice Wilkes realized
that there was a better way to solve this problem. He realized that they
could make the whole system work much faster if
they didn't always take all of the information
out of the fast memory and put it back into
the slow memory, but rather they kept around
the pieces of information which they thought
they were going to need to use again in the future. So the reason why
this speeds things up is that then you
don't have to spend the extra time reading
those things back off the slow memory. And as a consequence, the
computer runs much faster. So this is an idea which
computer scientists now recognize as caching. So it's the idea of keeping
the information which you're most likely to need in the
future in the part of memory which is most easily and
most rapidly accessed. But it brings with
it another kind of algorithmic problem, which
is the problem of figuring out exactly what those
items that you're likely to need in the future
are going to be or, to put it another way, what it is
that you should throw away. And this is a
problem that's called the problem of cache eviction. So cache eviction
is something which requires us coming up
with good algorithms for deciding what
we're not going to need again in the future. And the person who really
made the first breakthrough in thinking about this is
this man, Laszlo Belady, who was working at IBM. So before he worked at IBM,
Belady had grown up in Hungary, and then fled during
the revolution there with only a bag containing
one change of underpants and his thesis paper. And then he ended up
having to emigrate from Germany, which is
where he moved to, again with very minimal equipment. He just had $100 and his wife. So by the time he'd
reached IBM in the 1960s, he'd built up a
significant amount of experience in deciding
what it was that it made sense to leave behind. So Belady described
the optimal algorithm for solving the problem
of deciding what you should evict from your cache. And this optimal algorithm
is essentially clairvoyance. What you should do is evict
from the cache that piece of information which
you are going to need the furthest into the future. So as long as you can
see into the future as far as you need to go in
order to make that decision, then you can solve this problem. You can sort of do the
best possible solution to this problem that
you can imagine. Unfortunately,
when engineers have tried to implement
this algorithm, they've run into problems. And so for mere mortals, we need
to have some different kinds of algorithms. And Belady actually evaluated
three different kinds of algorithms-- one where
you just randomly evict items from the cache, one where the
things which were first entered into the cache are the
ones which first leave it, and one where you evict
those things which are least recently used. That is, those items which have
been used the furthest distance into the past are the ones
which leave the cache first. And doing an
empirical evaluation of these different
schemes, he discovered that there was one clear winner,
which is the least recently used algorithm. So basically, the idea
is that the information that you used least
recently is least likely to be the
information which you're going to need again in the
future as a consequence of a principle that
computer scientists call temporal locality-- basically
that if you just touched a piece of information, you're
going to be likely to need that information again
in the near future because there's a kind
of correlation over time in the pieces of information
which an algorithm might need to access. So taking this insight, you
can build caching systems which work very
efficiently in a variety of different situations.
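In practice a least-recently-used cache is only a few lines; here is a minimal sketch using Python's ordered dictionary (the class name and capacity handling are our own):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()   # ordered from least to most recent

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # touched, so now the most recent
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the least recently used
```

So nowadays, caches are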
used all over the place. So if we look on
computers, we find that you'll see multiple chips
that are dedicated to caching. There are caches that are
built into hard disks. There are other
kinds of caches which are used in servers
for delivering websites to people as quickly and
efficiently as possible. But the one place where these
caching algorithms perhaps haven't been applied and perhaps
should is back in our closet. And if we look at these
ideas that Martha provides us with how to organize
our possessions, then as we go through
these possibilities, it's clear that
one of them might be a better recommendation
than the others, which is when was the last time
I wore it or used it, which is actually an
instantiation of the least recently used principle. So next time you're
thinking about trying to organize your closet,
it might be worth keeping this in mind, that as
long as your possessions obey the same kind of principle
of temporal locality, focusing on those possessions
that were least recently used might be the most
predictive of those things that you're least likely to
need again in the future. So this kind of principle
isn't just something which is useful in
thinking about how to organize your closet. It's also something
that might be useful in thinking about
how to organize your office. And this was a discovery that
was made somewhat accidentally by a Japanese economist
called Yukio Noguchi. So Noguchi is a tax economist. He was constantly
receiving reports and papers and documents
that needed to be filed away. And he was kind of overwhelmed
with all of this information. He didn't have time
to file it properly, so he came up with
a simple solution, which was just to put all
of those papers into a box. But he didn't just
dump them into a box. What he did was
actually put them into a box in a
very orderly way. So basically, he had a box
which was sort of horizontally aligned. And he put the information into
the box at one end of the box. So as he'd get some
new papers, he'd put those papers in
at the left-hand side. And as a consequence,
you know, the papers would sort of move down the
box as new things came in. And then he did
something else important, which is that as he used
one of those papers, he'd pull it out of the box. And then when he was
finished using it, he'd put it back in again at
the left-hand side of the box. So this has a clear connection
to least recently used caching, right? Once your box fills up, you
need to get rid of something. And you can get rid
of the things that hit the right end of
the box because those are the things which you
used furthest in the past and consequently, least likely
to need in the near future. But this principle
of taking out a file and then putting it back at
the left-hand side of the box also corresponds to
another idea which has shown up in theoretical
computer science. So we can actually show
that this way of organizing his information
is something which is near optimal, or
at least as close to optimal as we're
likely to be able to get. So the actual data structure
that he had created here is something that a computer
scientist would recognize as a self-organizing list. So basically, in a
self-organizing list, you have a sequence of
pieces of information. And then as you access
those pieces of information, you have the opportunity
to change the order that those pieces of
information appear in. And so this idea of taking the
information that's accessed and then putting it at the
very front of the list, or at the left-hand
side of Noguchi's box, is actually something
which turns out to be a very
effective algorithm. So Robert [? Tagin ?]
and Daniel Slater proved in 1985 that moving
the most recent item to the front of the list
is, at worst, twice as bad as clairvoyance. So clairvoyance is the best
that you could possibly do. And it turns out, this
is the only algorithm that comes with a
theoretical guarantee of being at least
close to clairvoyance in multiplicative terms. So if you're thinking about
implementing the Noguchi filing system in your
office, it might be reassuring to realize that
perhaps you already have. So we normally think
about a messy office as being a bad thing. And in particular, a
giant pile of papers on your desk like
this is something which kind of seems like a
poor method of organizing information. But you can kind
of think about this as taking the Noguchi system,
and then literally turning it on its side. Right? So a big pile of papers is, in
fact, a self-organizing list. And if you're taking
things out of the pile and then sticking
them back on the top and putting the most
recently used items on the top of the
pile, then you're implementing exactly this
relatively optimal strategy for organizing the information
that's contained in that list. So if you're somebody who
is familiar with these kinds of messes, you'll
also be reassured by some of the message in
our sorting chapter, where we argue against sorting in
many domestic situations. Thinking about
caching isn't just useful for thinking about
how to organize information around you. It's also something which might
give you new insights into how human memory works. So I think there's an
intuition that a lot of us would have about
memory, which is kind of thinking
about it as something sort of like Noguchi's box. Right? You've got kind of a limited
amount of space in your memory. And so when you
learn something new, you have to put it in there. And when you do that, maybe
it pops something else out. And as a consequence,
you forget that thing. And so forgetting is
just a consequence of kind of hitting a
capacity limit in terms of the amount of information
that we can store. But a cognitive scientist called
John Anderson has actually proposed that there's a
different way of thinking about how memory works and
thinking that the analogy is less like a box of finite
size and more like a library of infinite size. So if you have a library which
has infinitely many books arrayed along a sort of
linear shelf like this, then the problem
that you have is one not of figuring out
where to fit information but rather one of figuring out
how to organize information. Another good analog of this
is thinking about something like web search, where you've
got a whole lot of web pages, and you want to organize
those web pages in such a way that you can find the
things that people are likely to be looking for
with high probability close to the top of the
list that you produce. So from this perspective,
forgetting something isn't that you've sort of
had that thing sort of pop out of your memory but
rather that the time it would take you to find
that thing is greater than the amount of time that
you're willing to spend, that the way that
information is organized is not sufficiently
good to allow you to identify that item quickly. And so this makes an
interesting prediction, which is that we
should be trying to organize those
items in memory such that the things that we
think we're most likely to need in the future are going
to be the things which we find easiest to recall. And Anderson actually showed
some evidence for this. So he took some famous data. This is data from
an experiment which was done in the 19th century
by an early psychologist, Ebbinghaus, who
basically taught himself some lists of
nonsense syllables, and then would look at how well
he could recall those lists some number of hours
after he'd memorized them. And so what Anderson
did was look at whether this pattern of
recall, where, basically, you get this kind of rapid
falloff followed by a slow, sort of long tail
could be predicted by looking at how
likely it is that we're going to encounter particular
pieces of information in our environment. And so what he did was go
to the "New York Times," look at headlines in
the "New York Times," and then look at how
likely a word was to appear in the headline
in the "New York Times" as a function of how
long ago it previously appeared in those headlines. So he could look at the
relationship between the number of days that had elapsed,
and then the probability of an item appearing. And he found that this showed
a very similar kind of curve. So this kind of correspondence
between the probability that something is
likely to be needed as a function of how
long ago it was used and the patterns of recall
that we see in human memory provides one suggestive
form of evidence, that one of the things
that our minds are doing is trying to organize
information in such a way that we are keeping
around those things that we are likely to
need again in the future. BRIAN CHRISTIAN:
Thinking computationally about the types of problems that
we encounter in everyday life has payoffs at a number
of different scales. The overarching
argument of the book is there are deep parallels
between the problems that we face in our lives and the
ones that are considered some of the fundamental and canonical
problems in mathematics and computer science. And this is significant
because as a result, there are these simple,
optimal strategies that are directly relevant to those
domains in our own lives. And there's something
very concrete that we can use and learn from. And even if we're not literally
going to stop at exactly, you know, 37% or so
forth, having a vocabulary and knowing what
an optimal stopping or an explore or exploit
problem looks like and having a general sense
of how the solution is structured gives us a way to
bolster our own intuitions when we find ourselves
in those situations. And I think most broadly,
a lot of the solutions that we explore
don't necessarily look like what we think of when
we think of what computers do, which gives us an opportunity
to actually rethink our notion of
rationality itself. So intuitively, we kind of have
this bias that being rational means being exhaustive, exact,
deterministic, considering everything, getting
an exact answer that's correct 100% of the time. In fact, this is
not what computers do when they're up against the
hardest classes of problems. This is kind of the
luxury of an easy problem. And up against an
intractable problem, computer scientists turn
to a totally different set of techniques. When we take into account the
cost, the labor of thought itself, the best strategies may
not be to consider everything, to think indefinitely, to
always get the right answer in each situation. We may want to, in fact,
trade off the labor of the computation versus
the quality of the result. And as the book progresses,
especially in the second half, we look at what
these strategies are for dealing with
intractable problems, which most of the ones that we
face in real life are. And that leads to
this conclusion, that what computer
scientists do up against the hardest
classes of problems is they use approximations. They trade off
the costs of error against the costs of delay. They relax some of the
constraints of the problem, and they turn to chance. These aren't the
concessions that we make when we can't be rational. These are what being
rational means. We explore this line of
thinking through a number of different domains. We've talked about three today. We also look at
sorting algorithms-- what do they tell you about
how to arrange your bookshelf and, more importantly,
whether you should? We look at scheduling theory. Every operating
system has a scheduler that tells the CPU what to
be doing when, for how long, and what to do next. So we look at what the
parallels are there for thinking about time
management in our own lives.
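(One of the simplest results from that theory, sketched here with made-up task lengths: if what you care about is the total time everything spends waiting to be finished, do the shortest job first.)

```python
# Shortest Processing Time first: a classic scheduling result that
# minimizes the sum of completion times. Task lengths are made up.
tasks = {"email": 1, "slides": 3, "report": 8, "expenses": 2}

def total_completion_time(order):
    clock = total = 0
    for name in order:
        clock += tasks[name]  # this task finishes at the current clock
        total += clock        # sum everyone's completion times
    return total

spt = sorted(tasks, key=tasks.get)  # shortest job first
fifo = list(tasks)                  # arbitrary order, for comparison
print("SPT :", spt, "->", total_completion_time(spt))
print("FIFO:", fifo, "->", total_completion_time(fifo))
```

And in the context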
of Bayes's rule, we think about problems of
predicting the future-- how long a process will go on, how
much money something will make based on what it's made so far. And, on a personal
level, you've been dating someone for a couple months. It's going pretty well so far. Is it premature to
book that weekend place in Tahoe at the
end of the summer? What should you do? And we provide some rational
answers there, as well.
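(To give the flavor of one such answer: if you show up at a uniformly random moment in something's lifetime, guessing that it will last about as long again as it already has is right about half the time, whatever the true distribution of durations. The heavy-tailed prior below is an invented stand-in, just to check that coverage empirically.)

```python
# The "Copernican" doubling rule: predict that something will last
# twice its current age. Observed at a uniformly random moment of its
# life, that guess is a median -- right about half the time.
import random

random.seed(2)
TRIALS = 100_000
covered = 0
for _ in range(TRIALS):
    total = random.paretovariate(1.0)    # hidden total duration (illustrative prior)
    so_far = random.uniform(0, total)    # the random moment at which you observe it
    covered += (2 * so_far >= total)     # did "double its current age" suffice?
print(f"'double its current age' covers the truth {covered / TRIALS:.1%} of the time")
```

And unlike the types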
of advice that you might find in, for
example, self-help books, the insights that are derived
from thinking computationally about these problems
are backed by proofs. Thank you so much. [APPLAUSE] BORIS DEBIC: Questions? BRIAN CHRISTIAN: Do you want
to get more in the light? Yeah. AUDIENCE: So how useful is it to
have a proof of an abstraction when real life doesn't
match the abstraction? TOM GRIFFITHS: Yeah. So I think the
way that we really think about this is
there are definitely simplifications that go into
formulating these problems. But what you get out
of them is if you're in exactly that
situation, you know exactly what you should do. But more generally,
you get insights about how those solutions change
as a consequence of changing the assumptions. So, for example, I talked
about some of the variants on the secretary problem. And from that, you might
not know exactly what the number is in your scenario. But you can recognize, OK. Well, if it becomes
more permissive, then I should be more
willing to spend longer looking before I start leaping. And if it becomes
less permissive, then I should make a much more rapid transition from looking to leaping.
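(Concretely, here's a sketch of one less permissive variant, with the rejection probability as our own illustrative assumption: when each offer you make can be turned down, the best cutoff slides earlier than 37%.)

```python
# Secretary variant in which offers are declined with some probability:
# the less permissive the setting, the earlier the optimal switch from
# looking to leaping. All parameters are illustrative.
import random

random.seed(3)
N, TRIALS = 100, 20_000

def wins_best(k, decline_p):
    order = random.sample(range(1000), N)
    best_seen = max(order[:k]) if k else float("-inf")
    for score in order[k:]:
        if score > best_seen:
            best_seen = score
            if random.random() >= decline_p:   # this offer is accepted
                return score == max(order)
    return False  # every offer was declined, or we never made one

for decline_p in (0.0, 0.5):
    rates = {k: sum(wins_best(k, decline_p) for _ in range(TRIALS)) / TRIALS
             for k in (15, 25, 37, 50)}
    print(f"P(offer declined) = {decline_p:.0%}:",
          {f"cutoff {k}%": f"{v:.1%}" for k, v in rates.items()})
```

And I think there's a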
related question here, which is if we think that
people should follow algorithms, why don't we just
make computers that will solve these
problems for people, and then people don't have
to make any decisions at all? And I think one of
the things that people are really good
at is figuring out how to interpolate between
these possibilities and how to kind of evaluate
some of the fuzziness around the particular
problems that we're facing. And so we're really
providing tools that can guide those
human capacities in terms of thinking about solutions
to more realistic problems. AUDIENCE: Very
interesting topic. How long did it take you
guys to write this book? BRIAN CHRISTIAN:
There's a quote that we use at the beginning
of the book, of the chapter on scheduling. We have, the
epigraph says-- it's from Eugene Lawler, who was
a researcher in scheduling theory. And he says, "'Why don't we write a book on scheduling theory?' I asked. 'It shouldn't take much time.' Book writing, like war-making, often entails grave miscalculations. Fifteen years later, scheduling
is still unfinished." So I think Tom and I had
an analogous experience, where we-- I think of the book
as really emerging in a dinner that Tom and I had in
2011, where we-- I mean, we've known each
other for 11 years. These are shared
obsessions that we've talked about for a long time. But we basically
realized that we wanted to write what amounted to the same book. And so that was when we decided
to team up and work together. And we had what in hindsight
was a sort of predictably naive sense of like,
oh, it should take about 18 months or something. It'll be out in 2013. So here we are. And you can tell how
that plan worked out. TOM GRIFFITHS: I
should also point out another terrible
thing that can happen to a book is having
a small child. So my wife and I
had our second child somewhere in that process,
which threw everything off. AUDIENCE: Are there any
rejected chapters in this book? Are there any
areas of life where you looked hard for an analog,
and it didn't work out? TOM GRIFFITHS: Yeah,
that's interesting. So our rejected
chapters are more that there are lots and
lots of algorithmic ideas that we would love
to write about, but we just ran out of
space and time to include. So we've actually
got a folder which is called Sequel, which
is the only way that we were happy cutting things,
so we could pretend. And so that contains a bunch
of chapters and proto-chapters that explored a lot of
different kinds of algorithms. And in some cases, it was
that there are algorithms that have already gotten good press. And in other cases,
it was that we really felt like there was a key
insight that those things could give you about human lives. But either you had to get
too far into the weeds in order to get out what that
insight was or it was something where-- like, basically,
the examples we give in the book have been
sort of carefully selected for exactly
the right blend, where you need to understand a
certain amount of math in order to get a
practical payoff. And some things just didn't
reach that threshold. BRIAN CHRISTIAN: We
also had a beta version of a chapter on data structures
that had a lot of great stuff that we still kind of
wish we could have saved. But it just didn't fit under
the rubric of the book. So that, I guess, lives in
the Sequel folder now, too. AUDIENCE: Do you feel
like you learned anything about the limits or lack thereof
of what machine intelligences could be capable of? It seems like a lot
of these examples right now involve pretty
specific parameters for what the problem has to be,
but also that there's just a ton of similarity between
how humans actually do things and how computers
end up doing them. BRIAN CHRISTIAN:
This is one where I think Tom can speak as a
machine learning researcher. But I would just
say, as a preamble, I think exactly how
you've formulated the question is how
I think of it, which is that a lot of the
algorithms that we discuss involve assumptions about
the distribution of values. The Gittins index, the
way that we present it, assumes a uniform distribution,
or the house-selling problem assumes a uniform distribution of offers. And often, the hardest
versions of these problems are what are called partial
information problems, where you have to
be making inferences about what you think
the distribution looks like on the fly based on the
values that you've encountered. And those often turn out to
be the intractable problems. And so there is a sense in which
human intelligence involves making all of these
inferential leaps, but also deciding
how to parameterize the problem in the first place. AUDIENCE: Yeah. I think this also relates
to Peter's question. So I think as you get closer
to the actual problems that human beings solve,
you sort of start to hit the edges of the theory. Another good example of this
is in the explore or exploit setting, where-- so if you
get people and put them in an explore or
exploit scenario, you find pretty consistently
that people deviate from the predictions
that come out of these sort of standard
multi-armed bandit models. And they deviate in a
particular direction, which is that they
tend to over-explore. So kind of at the point where
the algorithm has committed and said, hey, just
keep on pulling that one lever, people
are still like, well, I'm going to go try that one. I'm going to try that one. I'm going to try that
one, and still trying out different options. And so that looks
mysterious until you realize that it's not that people are being irrational. It's that, in fact, the
assumptions of the model are kind of wrong for
a real human situation. So the assumption is that
a slot machine pays off with the same probability. That's sort of fixed over time. Whereas in a human environment,
the payoff probabilities for different options are
things that change over time. Right? So they might be sort of
slowly drifting around. A particular restaurant is
getting better or worse, or a chef has changed, or
it's under new management, or-- all of those
things are things that over time mean that those
probabilities can be different. And so if you're in a
situation like that, then what you should
be doing is, in fact, exploring more because you
need to go back and check on the information that you had. But that problem is what's
called a restless bandit problem, and it's one
which still presents a challenge for developing
sort of good machine learning methods that can deal with it.
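(A toy version of that dynamic, with every number invented for illustration: two arms whose payoff rates drift as random walks, played by an epsilon-greedy agent with recency-weighted estimates. When the world drifts, evidence goes stale, and it pays to keep going back to check.)

```python
# Restless-bandit sketch: payoff probabilities drift over time, so an
# agent that stops exploring ends up acting on stale information.
import random

random.seed(4)

def clamp(p, lo=0.05, hi=0.95):
    return min(hi, max(lo, p))

def play(explore_rate, steps=20_000, alpha=0.1, drift=0.02):
    probs = [0.6, 0.4]   # true, hidden, drifting payoff rates
    est = [0.5, 0.5]     # recency-weighted estimate for each arm
    total = 0
    for _ in range(steps):
        if random.random() < explore_rate:
            arm = random.randrange(2)            # go re-check an arm
        else:
            arm = 0 if est[0] >= est[1] else 1   # exploit the best estimate
        reward = 1 if random.random() < probs[arm] else 0
        est[arm] += alpha * (reward - est[arm])  # let old evidence fade
        total += reward
        probs = [clamp(p + random.gauss(0, drift)) for p in probs]
    return total / steps

for rate in (0.0, 0.05, 0.25):
    print(f"exploration rate {rate:4.0%} -> average payoff {play(rate):.3f}")
```

So I think there's plenty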
of room to take inspiration from human cognition, as well as
room for making recommendations to humankind. BORIS DEBIC: And with
that, let's please thank Tom and Brian
for coming to Google. BRIAN CHRISTIAN:
Thank you so much.