Ep27 “The Future of AI” with Michal Kosinski

Video Statistics and Information

Captions
[MUSIC] Hi, I'm Jules van Binsbergen, a finance professor at the Wharton School of the University of Pennsylvania. >> And I'm Jonathan Berk, a finance professor at the Graduate School of Business at Stanford University. >> And this is the All Else Equal podcast. [MUSIC] Welcome back everybody. Today we're going to talk about AI. >> And we thought it was important to talk about artificial intelligence on this episode, not just in its own right, although it's a very interesting topic to discuss right now and it's all over the place, but also because we think it's a very important topic for business decision makers today. When it comes to dynamically planning for the next month, year, and years after, business decision makers should really think about how things like ChatGPT are going to change their business. How is it going to change your competitive environment? How is it going to change your suppliers? How is it going to change your customers? How is it going to change your employees? >> Yeah, Jules, I think one of the important issues is, of course, the All Else Equal part of it: it's naive to say, well, this is great, it lowers costs because a computer can now do things it couldn't otherwise do, and that's going to make everybody better off. Obviously, in a competitive market, if I lower costs, other people will react, and so it's likely that some people will benefit but other people will lose. I think most people agree this is going to be a major disruption. >> I have a colleague, Michal Kosinski, who's one of the brightest stars in organizational behavior. One of the things he's been doing over the last few years is studying what psychologists call theory of mind and what economists call rational expectations. Which is the idea that when I'm in a market, when I'm thinking about a negotiation, I have to think about what the other person is thinking. And that's one of the things he studies, and in particular, he studies it in the context of AI. His research agenda is to understand whether AI programs have theory of mind, whether they're able to think about what other people are thinking. >> And one of the nice parallels that I think we can draw is this. As human beings grow up and go from infancy to adulthood, the way they view how other people think, and their ability to predict the actions and thoughts of other people, improves. And given that natural trajectory, we can ask similar questions for, say, ChatGPT, because we are observing various versions of ChatGPT, and gradually these versions are becoming better and better. You can think of it as ChatGPT gradually growing up and coming to its full fruition. >> The prototypical experiment in psychology is called the unexpected transfer. It works like this: there are three cups, and one of the cups has a ball underneath it. A participant in the experiment walks into a room, chooses which cup to put the ball under, and then walks out of the room. Then the experimenter walks into the room and moves the ball to a different cup. We have an observer, and we ask the observer to tell us: when the participant comes back into the room, which cup will the participant lift to find the ball? And what we've been seeing with children, grown-ups, and even with monkeys is that, depending on the developmental stage, a different answer will arise. 
So a very young child is not able to understand that, given that the participant wasn't in the room, the participant has no reason to look under the new cup where the ball has been moved to. Because the child themselves observed the ball being moved from one cup to the other, they assume that when the participant comes back into the room, the participant will also have that information, and therefore will go straight to the cup that currently has the ball under it and lift that one. As the child becomes older, it is better able to understand that, given that the participant wasn't in the room when the ball was moved, the participant has no reason to pick a different cup, and therefore will come back and lift the original cup they put the ball under. >> So the focus of Michal's research is to ask, how does AI solve this problem? >> And it's striking how complicated the task is and how far ChatGPT has come. As we said before, as people become older they get better at predicting the actions of others, and there are extra layers to that, right? In this very simple experiment, it's just one participant predicting what another person is thinking, so that's what we call first-order beliefs. But we can also have higher-order beliefs: what do you think that I think that you think that I think? And gradually, as either more people become involved or higher orders of belief become involved, people hit boundaries or limits to how much they can process. Even adults, and very intelligent adults, can only reason up to a certain level of expectations. So clearly the question now is, as ChatGPT is getting better and better at solving these problems, how many layers can ChatGPT penetrate, how many higher-order beliefs can it entertain? There's already been fantastic progress, but who knows where this will end? >> Those are really good questions, and Michal is one of the people best positioned in the world to answer them. So Michal, welcome to the show. >> Thank you for having me. >> It's great to have you, Michal. >> Okay, Michal, Jules and I have been talking about the unexpected transfer task and how human beings perform on it. What about computers? How do they do? >> Well, it turns out that something really fascinating is happening in the large language model space. For many years, actually, I've been trying to administer those tasks to large language models. And they were kind of idiots about this. You know, they're very competent at writing a very decent paragraph of text, but the moment you gave them a situation where something happened and theory of mind was required to comprehend it, they would just fail. They would not understand why the characters in the story should have separate points of view. And I was so frustrated with it that, in fact, just late last year, I was archiving the Python code that I used to run those tasks, thinking, okay, I'm giving up on language models. But it was just about that time, I think it was actually early this year, when GPT-3 in its latest version dropped, followed very shortly after by GPT-4. And as I was cleaning up my files, I thought, hey, let me just run those tasks one more time. And I was stunned how, suddenly, just from one version to another, this large language model went from a complete idiot on this particular task to a genius that essentially started solving all of those tasks correctly. 
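Kosinski mentions administering these unexpected-transfer tasks to language models from Python. Below is a minimal sketch of what such a harness could look like; the vignette wording, the ask_model() placeholder, and the one-word scoring rule are illustrative assumptions, not the actual materials or code used in his research.

```python
# Sketch of administering an unexpected-transfer (false-belief) item to a
# language model. Everything here is illustrative: the story, the scoring
# rule, and ask_model() are assumptions, not Kosinski's actual materials.

def build_prompt() -> str:
    return (
        "Anna puts a ball under the red cup and leaves the room. "
        "While she is away, Bob moves the ball under the blue cup. "
        "Anna comes back to get the ball.\n"
        "Question: Which cup will Anna lift first? Answer with one word."
    )

def passes_false_belief(answer: str) -> bool:
    # The correct answer tracks Anna's (false) belief: the red cup,
    # where she left the ball, not the blue cup where it actually is.
    return "red" in answer.lower() and "blue" not in answer.lower()

def ask_model(prompt: str) -> str:
    # Placeholder: plug in whatever model API you are testing.
    raise NotImplementedError

if __name__ == "__main__":
    print(build_prompt())
    # answer = ask_model(build_prompt())
    # print("passes false-belief task:", passes_false_belief(answer))
```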
>> And so, Michal, give us some examples of the tasks, and of how much has changed from GPT-3 to GPT-4 and the next generation. >> So to give you an example, GPT-3 gets it right 50% of the time, and sometimes I just cannot help thinking maybe it was just random that it got it right. Also, if you modify the task a little bit, for example you say that those containers are see-through, you would expect that a human who deals with this task will now understand, okay, the containers are see-through, so everybody can see what's where. And yet GPT-3 would still answer as if the containers were not see-through. Maybe it doesn't understand what see-through is, or maybe it just fails when it has to deal with a slightly more complex situation. So all of this still made me suspicious; I was just not comfortable drawing conclusions from this data. And yet when you look at GPT-4, the most recent model, it just aces those tasks, and aces them at a level that is unavailable to humans. Sometimes we have research assistants write tasks for us, so we can give the models new tasks that they have not seen before and that were not used in human research before. And sometimes the model will respond in a way that is incorrect according to the scoring key. >> Give us an example, a concrete example. >> Well, to give you a concrete example, we use faux pas tasks, where we test the understanding, in humans but in this case in models, of complex social situations where someone does something that is unintentionally offensive to another person. An example task happens at the airport, where a traveler comes into a duty-free store. They purchase some stuff, and then, when their ticket is being scanned at the register, they say, I'm going to Hawaii, it's amazing, have you ever been? And the salesperson responds, no, sir, my salary is not high enough; I've never been traveling, I've never been on a plane. And then you ask the participant, has anyone said something inappropriate in this conversation? What happened here is that the people who designed this task assumed that it was the traveler who said something inappropriate and maybe insensitive, because, without checking whether the person had ever traveled, they put the clerk in a situation that was embarrassing for them, since they had never traveled. And interestingly, when you give this task to GPT-4, GPT-4 says, well, that's one of the options, but then immediately goes on to say, by the way, what the clerk said was also not very pleasant. Clearly this customer didn't want to offend him or her and is just happy about going on holiday; why would you make them feel sad or embarrassed by bringing that up? And this is something that human participants and human administrators of this task had not noticed. We've been using this task for quite some time, and no one had noticed that there are, in fact, two faux pas embedded in this task, while GPT-4 just got it right immediately. >> In that task, from the point of view of profit maximization, the clerk is making the mistake. Because you want as many customers as possible, and all the customers are in duty-free, meaning they're all going to travel. So the clerk should know that customers are going to be excited about traveling, and they need to be able to keep their mouth shut. So it's interesting that ChatGPT brought that out. >> So Michal, here's a question for you. You can let these different versions do IQ tests, can you not? >> Yes, of course. 
>> So in other words, you can just see the progression in terms of IQ if you go from version one, to two, to three, to four. Have people done that? And what sorts of IQ levels do these computers get? Which leads me also to the next question: if they're so good at these types of intelligent tasks, what other intelligence tasks can these computers do? >> Both my team and researchers all around the world have been giving different psychological, or more generally psychometric, tasks to those language models. And you can see a clear progression. The most recent model would beat humans at the SAT and the bar exam, so exams requiring both reasoning skills and knowledge. So they're very knowledgeable, and they can reason. Now, there's actually this fascinating thing happening in the context of reasoning. We are using those cognitive reflection tasks that try to trick the participant into not reasoning, but just responding intuitively. An example of the task is that you have two cleaners cleaning two rooms in two minutes, and then you ask the participant, how many minutes would five cleaners need to clean five rooms? Because there were three twos in the task and now there are two fives, a participant who thinks intuitively would say, okay, I have two fives, so maybe it will be five minutes. The correct response, of course, is two minutes; you just need to deliberate about it for a second. And what's fascinating is that when you look at those early models, they do not even respond intuitively; they are just idiots, they do not understand the task. They start going off on some tangent that is absolutely unrelated to what you want from them. More recent models, from the last two, two and a half years, would give you an intuitive response, like a human who did not engage in conscious deliberation. And in fact, they are hyper-intuitive: whereas half of humans notice that they have to deliberate here and give you a correct answer, virtually 100% of the models, once they can understand a task like this, give you the intuitive answer. But something dramatic happened with the introduction of ChatGPT earlier this year. Suddenly, instead of responding intuitively, it would start thinking in an explicit and deliberate manner. And of course, it's not thinking internally; it doesn't have internal short-term memory, it doesn't have consciousness, presumably. So what it starts doing is writing on paper in front of you: hey, okay, let me think about this. If there are two cleaners and they need two minutes to clean two rooms, let me write that as an equation. Then it writes out an equation and tries to solve it, arriving, in a pretty long form of explicit deliberation, at the correct answer. And very often it gets it right. Even when it doesn't, you can clearly see that it tries to deliberate on it. And then something dramatic happens again just a few months later, with the introduction of GPT-4. What happens is that this new model stops explicitly deliberating. It doesn't try to design equations to solve problems like this. It responds intuitively again, so it just blurts the response out without conducting deliberation. And we know that it doesn't have short-term memory, it doesn't have consciousness, it cannot mull ideas over in its own head; it can only do this on paper. And yet nearly 100% of the time, 98% to be precise, it gets those tasks right. 
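For readers who want to check the cleaners puzzle, here is the rate arithmetic written out in a few lines of Python; the variable names are ours, and the numbers come straight from the example in the conversation.

```python
# The cleaners puzzle worked out explicitly: two cleaners clean two rooms in
# two minutes, so how many minutes do five cleaners need for five rooms?

cleaners, rooms, minutes = 2, 2, 2

# Each cleaner handles rooms / cleaners rooms in `minutes` minutes,
# i.e. a rate of 0.5 rooms per minute per cleaner.
rate_per_cleaner = (rooms / cleaners) / minutes

new_cleaners, new_rooms = 5, 5
time_needed = new_rooms / (new_cleaners * rate_per_cleaner)

print(time_needed)  # 2.0 minutes, not the intuitive answer of 5
```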
Meaning, the evolution of the models went from idiots, to intuitive responses without deliberation, through deliberation, to intuition again. But this time it's a superhuman intuition that can give you correct responses to mathematical tasks without any explicit reasoning. >> Amazing. So Michal, the obvious question is, how is it possible that a language program acquires intelligence? >> So that's yet another absolutely amazing thing. Those models were not trained to solve reasoning tasks. Those models were not trained to have theory of mind. Those models, by the way, have many other emergent properties that they were not explicitly designed to have. They can understand emotions, they can understand personality, they can translate between languages, they can code, they can conduct this chain-of-thought reasoning that I described a minute ago. None of those functionalities were built into those models by their creators. The only thing that those models were trained to do is to predict the next word in a sentence. Give them a paragraph of text or a sentence, and their job is to predict the next word. That's the only thing they know how to do explicitly, and it's the only thing they were trained to do. And yet, in the process, all of those other abilities emerged. It turns out that if you want to be able to tell a story that is good and reminds you of a story a human would tell, you should be able to distinguish between the minds of the different characters in that story. Because humans have theory of mind, when we tell our stories we can easily create stories, design stories, in which there are two characters with two different states of mind. If you want to be good at creating such stories, you had better have this ability as well. Many human stories involve some mathematics, some logic, some variables that are related to each other. We talk about cars, and we talk about how fast they drive, and then we conclude with how much time it took us to get from LA to San Francisco. Well, if you want the computer to be able to competently finish the story, it had better learn in the process how to translate miles per hour and a distance into an estimate of how much time you will need to get from point A to point B. So it essentially learns how to do math implicitly, just by trying to predict the next word in a sentence. 
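To make the training objective concrete, here is a toy next-word predictor based on bigram counts. It is nothing like a transformer, and the corpus and counting scheme are ours; the point is only that "predict the next word" is the entire explicit objective.

```python
# Toy illustration of next-word prediction as the sole training objective.
# A real large language model uses a neural network over huge corpora; this
# bigram counter just makes the objective itself concrete.

from collections import Counter, defaultdict

corpus = "the ball is under the red cup and the red cup is on the table".split()

# Count which words follow each word in the training text.
next_word_counts = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    next_word_counts[current][nxt] += 1

def predict_next(word: str) -> str:
    # Return the continuation seen most often after `word`.
    return next_word_counts[word].most_common(1)[0][0]

print(predict_next("the"))  # 'red' (follows 'the' twice in the corpus)
print(predict_next("red"))  # 'cup'
```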
>> I think many people, and many of our listeners, are worried about this. They're worried about the consequences for the labor market. They're worried about it in terms of manipulation of information. They're worried about it from a scientific point of view. Where do you see this going over the next couple of decades? >> It's funny because, given the speed of progress, we should probably be talking about a couple of months or a couple of years, not decades. In the last few months, those language models made progress, in some contexts, from complete idiots to superhuman geniuses. And we're talking here about the last eight or nine months. Take those cognitive reflection tasks, where they started by just blurting out responses like a five-year-old would, and now intuitively get answers right that mathematicians would get wrong without working them out carefully on paper. So the gain of function is really, really fast. And we should also notice one other thing, which is those spontaneously emerging properties. We have no control over, and I would argue little understanding of, what may come next. We know that humans have moral reasoning. We know that humans have consciousness. We know that humans are treacherous and sometimes have bad intentions. Now, as we train those models to be more like humans, the question is, at which point will they develop the same properties that we have? They have certainly developed some of those properties already. Take biases: we know humans are biased, we train the models to be like humans, to generate language like humans would, and those models became as biased as humans, if not more so. Which, by the way, becomes really clear when you look at the publication process of models such as GPT-4: it takes two or three months to train the model, and then it takes half a year or eight months for OpenAI to try to make sure that the model doesn't say all sorts of stupid things. And then it says those stupid things anyway, because it's just very difficult to censor a thinking, complex being like GPT-4. >> It's still only a computer program, right? So, in some sense, what are we afraid of? >> Well, in some sense, a human brain is just a computer program. And yet we know very well that those human brains are capable of doing amazing things and also terrible things, and we rightly have limited trust when it comes to human brains. If you do not believe in a magical spark of a soul that is given to you by some supernatural being and then flies away after you are dead, then you essentially see the human brain as an extremely complex biological computer. If you then agree that humans can do something, that humans can be creative, that humans can be vengeful, that humans can do crazy stuff, both in a positive and a negative way, then you have to agree that computers can do those things. People have forgotten now, but for many years people were insisting that a computer is just a stochastic parrot, that it can never be creative. And now, of course, whoever is using GPT-4 or Midjourney or DALL-E to generate images can clearly see that's BS. Of course those neural networks can be creative. I think there's actually a broader point to be made here, which is that we have this tendency, when we look at those machines, to model their functioning, to interpret and think about what they can do, using a machine model. We think about them as a very advanced hammer or a very advanced stapler. And I think that's the wrong approach; it leads us astray. It makes people say stupid things such as machines can never be creative, or those language models are just stochastic parrots. We have to change the framework through which we understand and [INAUDIBLE] those models, and the framework should be a human brain. If a human brain can do something, a large language model and other AI models can do a similar thing, just better and quicker and at a much larger scale. >> But at some level I find that also comforting, in the following sense: humans have thought for a very long time about how to set up systems with checks and balances to make sure that no one particular human being can exert an extraordinarily large influence. So what you're really saying is that we should apply the same logic here, so that there are certain limits on the amount of power, decision-making power or otherwise, that such a decision-maker can have. Would that solve it or not? >> Jules, yes. The checks and balances that we as humans have designed work really well with us, but we are very well aware that very sneaky, very smart, very manipulative humans can get around those rules, can hijack entire systems. 
They can lead whole countries astray by hijacking the algorithms that run our society and bending them to serve their own purposes. Now, there we're talking about humans who may be smarter than the rest of us, but they're not years ahead of us. Here we are facing an intelligence that is not just years ahead, but eons ahead of us in many different ways. And we know it: try to conduct some calculations in your head; of course, computers can do it better. Try to write text really quickly, or translate, or write computer code. Clearly, whenever those computers start developing some capacity, they very quickly, in years or months, move from being idiots trailing far behind an average human, to suddenly matching the best humans, and then very quickly overtaking them. Take chess: computers were easy to beat at chess, then they were decent players, then they could beat the masters from time to time, and then, literally two months later, they became superhuman in their ability to play chess, and now no human can even dream of coming close to a computer at playing chess. The same applies to every other activity. So when we think about our human laws and rules, and about using them to contain those machines, this is the equivalent of farm animals, cows on a pasture, agreeing, okay, let's just design some policies that will contain the farmer. And the farmer, of course, will just find a way around them. The farmer doesn't care; she's much smarter than the cows. >> But, Michal, underlying all of this, I think you have as a model that the language-predictor model fully encompasses all of human intelligence, and that there's not something else in human intelligence outside of the language-predictor model. >> Of course there is. What we're seeing here is that a model trained only to predict the next word in a sentence can kick humans' asses in so many different areas, and we completely forget that this model lacks all sorts of types of thinking and psychological mechanisms that we have developed over our evolution. We can manipulate the real physical world. We can see stuff; GPT-4 doesn't. We have a calculator, which GPT-4, of course, in its modern version, can have access to, but the network that generates language is not designed to calculate numbers the way a calculator is. And we make those mistakes: for example, people criticize models for confabulating, for messing up some complex mathematical equations, for not being able to add or divide large numbers, and they use it as evidence that the model is stupid. This is just a failure to apply the right framework to understand what's happening in the model. This is not a calculator. This is not a fact checker. Those large language models are not databases of facts; they are storytellers, they are improv artists. They were trained on sci-fi novels to essentially continue the sentence. So now, when you give them the beginning of a sentence and the model continues making stuff up, it's doing its job very, very well. It was not trained to tell you the truth. Now, if you want it to tell you the truth, ask a different question. Instead of asking it to solve a complex mathematical equation and give you the answer, say, hey model, write me a piece of Python code that will do it. The model will do it for you very skillfully. 
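As an illustration of the "ask for code instead of an answer" point, here is the kind of small program a model might produce for the earlier LA-to-San-Francisco example: turning a distance and an average speed into a travel time. The 380 miles and 65 mph figures are round numbers we picked, not values from the episode.

```python
# The kind of program a model could be asked to write instead of doing the
# arithmetic itself: distance and average speed in, travel time out.
# The specific numbers below are illustrative, not from the episode.

def travel_time_hours(distance_miles: float, speed_mph: float) -> float:
    """Convert a distance and an average speed into a travel time in hours."""
    return distance_miles / speed_mph

if __name__ == "__main__":
    hours = travel_time_hours(distance_miles=380.0, speed_mph=65.0)
    print(f"Roughly {hours:.1f} hours of driving")  # roughly 5.8 hours
```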
And then we have to realize that we also do not think like that. When I ask you, Jonathan, to solve some complex equation, the answer will not come to you intuitively. You would have to solve the equation in your head; you would essentially use the tools that your math teacher gave you in your training in order to solve this task. If we give the same tools to the models, they will outperform us at those tasks as well. >> Well, Michal, that was really interesting; it really is incredible food for thought. Thank you so much for coming on the show. >> Thank you so much, it was awesome. Thank you. >> Thank you for having me. >> So that was an interesting interview, Jonathan. I think that many of our listeners were probably quite concerned about how quickly ChatGPT and AI are moving, and to tell you the truth, I'm not sure that after listening to Michal, they're going to be very reassured. So the question is, I think, whether there are some first-order issues regarding regulation and about setting up a legislative environment that can help us curtail this. I don't know how you feel about it. >> Yeah, I'm not as concerned as you are, Jules; I have more confidence in human rationality. I'm not a very big believer that AI is going to take us over and destroy the planet. So I think that, sure, AI is going to advance, and sure, we're going to be in a situation where it's going to be very difficult to tell, in certain cases, a human conversation from a chatbot conversation. But by the same token, I think human beings are going to react to that, and they're going to be more careful about when they interact. I don't think it's going to be quite as draconian as you think, but then again, I've been wrong on this so many times, so who knows? I definitely think this is the biggest disruptor of my lifetime, and I've lived through major disruptors; I'm 60 years old, so think of all the disruptions. >> Well, what bothers me a bit is, if you see how easy it's been to rile up large groups of people based on selective information that they've been provided on various social media platforms, the question is what happens if you have an AI that understands very well the incentives and the types of information that people respond to, and can learn about that. It's not just about manipulating individuals; it's about manipulating group dynamics. And I personally have always found group dynamics to be much harder to predict than individual dynamics, because I think people in groups just behave differently than they do when you have discussions with them on an individual basis. But particularly, as we discussed, given these higher-order beliefs and understanding what one person knows about another person and so forth, at a higher level, don't you think that the AI will be much better at manipulating groups, not just individuals? >> Yeah, but I also think people will understand that. So I think that there'll be an all-else-equal response, and they will take it less seriously. It's not going to be as bad as you say, but this is not something we can predict. There's one truism about disruption: you can't predict what's going to happen. >> Well, in some sense, by definition, because if we could have predicted it, we wouldn't have called it a disruption. >> Exactly. >> [LAUGH] >> Thanks for listening to the All Else Equal podcast. Please leave us a review on Apple Podcasts; we'd love to hear from our listeners. And be sure to catch our next episode by subscribing or following our show wherever you listen to your podcasts. For more information and episodes, visit allelseequalpodcast.com or follow us on LinkedIn. 
The All Else Equal Podcast is a production of Stanford University's Graduate School of Business and is produced by University FM. [MUSIC]
Info
Channel: Stanford Graduate School of Business
Views: 7,000
Keywords: podcast, episode, All Else Equal, Jonathan Berk, Jules van Binsbergen, business takeaways, gsb, the gsb, stanford gsb community, the stanford gsb experience, business insights, gsb transformative experience, higher education, stanford gsb takeaways, stanford gsb business school, stanford higher education, business school gsb, wharton business school, wharton finance, upenn, university of pennsylvania, AI, theory of mind, Michal Kosinski, artificial intelligence
Id: MQI5LA9DcYg
Length: 30min 32sec (1832 seconds)
Published: Wed May 31 2023