Michael I. Jordan: An Alternative View on AI: Collaborative Learning, Incentives, and Social Welfare

Captions
So our closing event is a special lecture given by Mike Jordan, who I think almost does not need an introduction, but I'll still give one. He is the Pehong Chen Distinguished Professor at UC Berkeley, and Mike has been a leader in the computational and mathematical study of learning for a long time. He has achieved such prominence that in 2016 he was named one of the most influential computer scientists on Earth, and he has been recognized for his many contributions with many awards: he is a member of the National Academy of Sciences, the National Academy of Engineering, and the Royal Society; he was the inaugural winner of the World Laureates Association Prize last year, in 2022; and he received the IEEE John von Neumann Medal in 2020. If I were to list all of his awards I think we would be here for a long, long time, but I would say that personally there are three things that amaze me about Mike Jordan. The first is his range of interests: they span an enormous array of fields, all the way from computer science to statistics, control theory, signal processing, mathematics, information theory, cognitive science, and now economics — and maybe we'll hear a bit about this today. The range and breadth of his interests is just very unique. Another thing that is unique about Mike is his track record of training the next generation of students. He has a very impressive CV, but the thing that impresses me the most is the names of his grad students — we have one right behind you, Lihua, and we have another one right there — they make up the who's who in machine learning today. If you open the who's who in machine learning and ask whether this is one of Mike's students, the likelihood of a yes is very high. And I think he's done this by fostering an environment of inclusivity and curiosity that has really moved the entire field, and the whole field is grateful for this. The third observation is more personal: Mike's age is public information, at least if Wikipedia is correct, so you've been at the top of your game for a long, long time, and this is really unique. The way I think about Mike is as a kind of Roger Federer of science — someone who has been dominating the circuit for a very, very long time with no signs of slowing down. Mike, welcome to Stanford.

All right, thank you — that was perhaps the most fun introduction I've ever had. When I next introduce Emmanuel, which will happen, I'm sure, we'll have to think about the right metaphor; maybe someone has an idea, help me out there. I've got to get my mind off that because it's a fun thing to think about. So, it's a pleasure to be here. I am a data scientist — and in fact that quote about being an influential computer scientist is kind of funny to me, because I was trained as a control theorist and statistician; I was never a computer scientist. I embrace it because of the entrepreneurial spirit in computer science — let's just try everything — I love that, and I found less of that in control theory and statistics, so that's great. But actually I'm a data scientist: I really want to think about how data and inference can inform real-world decision-making, and I think that's where it's at. I think it's the first time in my career
that, in some sense, all of campus agrees: being just a technologist inside of computer science or statistics wasn't enough — this is much more fun. And indeed it's economics, the connections to economics, that is most thrilling to me right now. That's really what I want to convey today: why I think that's thrilling and important, and so on.

All right, the elephant in the room is this thing called AI. I've never thought of myself as an AI researcher. I never aspired to the Frankenstein-like thing of creating some being, and I kind of want to say why — not only didn't I aspire to it, I don't think it's right. But it is what everyone's talking about, so I want to say a few things about it. First of all, the thing that triggered all of this was back-propagation: gradient descent and layered neural networks. There were all these other ideas along the way — unsupervised this and that — which didn't really pan out, but back-propagation had this huge impact. Dave Rumelhart was my advisor at UCSD, and he developed that. You can't quite say he invented it, but he invented the idea of doing it in layered neural networks and applying it to all kinds of problems, and it took him about a year; it was not trivial for him. And he was not trying to be an AI person — he was just trying to understand learning. I think he'd be somewhat shocked that suddenly that becomes "AI" in this era.

So what I think is happening, really, is not that we have a new technology — we don't have this new brilliant idea, "AI," that we then start applying everywhere. I don't think that's the right way to think about it. You need to go back in history a little bit and think about engineering fields that have emerged. In the 40s and 50s, chemical engineering became a thing. The name was actually used earlier — I looked into it a little, and already by around 1890 there was a department of chemical engineering at MIT — but that was very simple chemistry done at a bigger scale than before. Polymers and all the things that triggered the revolution we're all living in happened in the 40s and 50s. Before that there was quantum chemistry, there was fluid dynamics, there was thermodynamics — a lot of deep understanding of the phenomenon — and people were then able to start to envisage: what if I take the laboratory test-tube experiment of how you put molecules together, which I understand because of the quantum chemistry, and do it at huge scale in a plant somewhere? Will that work? And of course it didn't really work at first: often those things would explode, often they just wouldn't deliver product, and so on. But over a twenty-year period in the 40s and 50s it got worked out, and an engineering field emerged that had huge impact on all of our lives. Electrical engineering — I know less about the history, but obviously there were Maxwell's equations before there was electrical engineering, so there was a full understanding of the phenomenon at some level; but it wasn't clear how to bring electricity into homes, how to make it safe, how to do communication on top of the waves, and so on. A whole field emerged, which we now call electrical engineering, that did all that in the early part of the past century — again, it took a couple of decades.

So I think that's what's happening right now: we have a new engineering field emerging. I wouldn't call it AI, but it's a field that's based on flows of data, networks, inferential ideas, large-scale decision-making, cooperative endeavor — building transportation systems, commerce systems, healthcare systems is all part of this engineering field. That's really what's happening. And it's the first engineering field that has as its objects of study not just bits and information and atoms and laws of physics: it has humans involved critically — utilities and aspirations and so on — so economics has minimally got to be involved, and the rest of the social sciences as well. The implications are vast. That's the phenomenon, and it's going to take twenty or so years. The difference with those other fields, though, is that they rested on a deep understanding of some underlying phenomenon, and we don't have that here. We do not understand intelligence, I can assure you. So we're calling it AI as if we've got this understanding and then it leads to technology, and I think that's kind of backwards.

So why did this happen? Well, there was McCarthy and others in the 50s who invented this terminology, and for good reason: it was almost a philosophical aspiration. There had been discussions of mind and body, and now we have a computer — it has software and hardware, it looks like mind and body — and it looks like we can now make headway on that; let's talk about putting thought in a computer. That's a really interesting thing to talk about, and of course people got excited about that notion and worked on it. We don't have thought in a computer to this day, and it's not clear why we really care in some sense — maybe something will somehow emerge and we'll call it thought, but it's not clear what that means. In the meantime, that's not what happened. What computers started to do was aid humans; they became complementary. Search engines and translation systems and all that aided our own intelligence and expanded it, and networks expanded at a planetary scale. So let's not call it McCarthy's version of intelligence. That aspiration still exists — some people who study psychology and neuroscience and "core AI," whatever that is, are working on it; it'll take a hundred years in my view, but it's a worthy thing — but we shouldn't be waiting for it, because in the meantime all these systems are being built in the real world that are having this huge impact, and we should understand that phenomenon.

Now, the other part that happened wasn't really so much McCarthy but others: it had to be autonomous. Why did the AI have to be autonomous? Well, if it's not autonomous — if it's tethered to me — it doesn't seem so intelligent. And if it's developed by vast numbers of human engineers who built something, it doesn't seem so intelligent. So it had to be built by small numbers of people and do everything on its own. Now, that's an okay science-fictiony kind of aspiration, but it's a bad idea for technology — you don't develop technology that way. You don't want self-driving cars to be autonomous; they should be highly networked, so that you think about the overall traffic system and you never have an accident, just like air traffic control.
You don't want autonomous airplanes either. So, autonomy — there will be a lot of cartoons in this talk — I don't mean it's never a good idea: in a burning building I want an autonomous robot, and up on Mars I want some level of autonomy. But for most applications I don't want the intelligence to be autonomous; I want it to be federated, linked, transparent, cooperative — all those things. So I think it was a big mistake to add autonomy to the list. It became kind of about bragging rights — look at my autonomous AI, how great it is, it's better than your autonomous AI — and again, this was all fun and games for about forty years, but it's no longer fun and games; it's actually going to hurt the planet.

All right, so here's a counterpoint, which is maybe an obvious statement: if we want to talk about intelligence, there's not one kind of intelligence, and it's not just human intelligence — it's as much about the collective as it is about the individual. An economist thinks this way all the time. They recognize that a market is composed of many small decisions by entities that don't have to be intelligent themselves — they just have to know a demand curve and follow their nose — and there's no huge intelligence within any one of them, but the overall market becomes really intelligent. It can do things like bring food into cities, rain or shine, at any scale, for hundreds of years, and it can create all kinds of opportunities. And then there are ant swarms — we talk a lot about how an individual ant might not be so smart, but the swarm can do amazing things. We're all aware of that, but too dimly. I don't think we appreciate that we could be creating new kinds of collectives that are really exciting, that do new things as human beings. That's what's opening up to me in this era — not the superintelligence replacing a human, look how great that is.

So, in particular, if you're going to be a little less exuberant and ask what the goals are for this emerging engineering field, it's not "make a superintelligence in a computer and you're done." It's rather: what is the overall object — like the factory in the field? Is it a transportation system, a logistics chain, a healthcare system, a communication system? Design at that level, and then think about what the components need to be and what data is needed and all that. It sounds more boring than a typical AI person's talk — "we'll solve intelligence and then the intelligence will solve climate change" — that's a typical Silicon Valley thing to say; it sounds great, and it gets you written up in the New York Times. But really, to me, logistics chains and supply chains are much more exciting and interesting and important for human life, and healthcare too — and it's not that the AI is going to solve healthcare, it's us designing really good systems with good data-science principles and economic principles. So I think I've said all this: mimicry is just not a good way to think about the implications of collectives; autonomy is also maybe illusory; and there might be new forms of collectives.

If you want to read a little more about these philosophical ruminations, I wrote an article, "Artificial Intelligence — The Revolution Hasn't Happened Yet," three years ago, and I still very much stand behind everything in there, even though we've had this upswing and the surprising ChatGPT abilities. It was about where the data comes from, what the provenance is, what the bigger scope is, and all that. There was a bunch of commentary by some luminaries, including Emmanuel Candès — it was quite a lot of fun — and there was my response to those luminaries. And then with some colleagues, mostly social scientists, we wrote a paper about two years ago called "How AI Fails Us." It's less about the economics perspective I was pushing above and more about the implications of the technology if you've got autonomous systems being designed by small numbers of people. That framing incentivizes entities like OpenAI: they get vast amounts of money for a small number of people, they build this thing, and it's not for everybody — they control it; it was supposed to be open, and it's no longer open. This idea that "AI is the future" just has a natural tendency toward putting it in the hands of small numbers of people, and the article gets into some of the social-science reasons why that's really a bad idea. People pretend that it's not happening, that it's all open and all that, but that's a pretense; it's just not true. So if we stop thinking about AI this way, I think it will actually liberate us from that.

Again, I've already sort of said this, but just to lean in a little more: McCarthy had this imitative perspective. It was a great aspiration and it still remains one; it's just not what happened. What really happened was more like IA — that's Doug Engelbart there — who talked about technology augmenting our intelligence, and for certain it has: the search engine has augmented my intelligence more than just about any other piece of technology I can think of, in addition to everybody else's. And then this third bullet is what I think is a better description of what's really emerging. It looks kind of like the Internet of Things: you've got all these little devices around, they all send data around, and decisions are being made — it's all delocalized. But the Internet of Things was a little too computer-sciencey; it wasn't thinking about the data and the inferences and the predictions and the people, it was just about putting things on the internet. Still, that's the right spirit, and I think this is really what's happened — even the pandemic response of the planet, right, that was an engineering system that worked okay, but we could do better.

Now, if you go to an ML person or an AI person and say, okay, aren't you thinking about this, or is it all the classical AI stuff, they say: no, no, we work on all this — here, for example, is federated learning, decentralized learning. You have a server up there, and it's collecting data from a bunch of edge devices, and it's analyzing the data centrally, and we're worried about privacy and all that. So we've got the social stuff; we handled all the social stuff. Now, to be again a little bit cartoonish: that
terminology of federated learning — a number of groups were working on it, but it's Google's terminology, because Google wanted to collect a lot of data for their speech engines. Everybody has cell phones and is talking on their phones, so let's just collect a lot of data from that, let's worry about the compression, let's get the gradients back cheaply, let's also do some differential privacy — and that's the technical problem; if we solve that, wow. So — but what's missing in this picture? Well, I'm going to give some examples to make it more clear, but what's missing is that these are actual humans here, and they have their own values and goals and aspirations, and they want to join this collective for some reason. They don't want it simply to be assumed that they are in the collective because they want Google to build a better speech model. So the nodes are often people, and they value their data — and by data I don't just mean where I went today and what was around me; I mean things I created, works of art, things I wrote, songs I wrote, et cetera. That's my data. That's stuff that's on the internet now that's being exploited by other companies, and I've lost all the value — that's wrong. So we need to talk about the costs and benefits of these decentralized paradigms where learning is involved; we need learning-aware mechanisms, and mechanisms that learn. "Mechanism" is economics terminology, and I want to get into that.

I'm going to give some more industrial, real-world examples, but as an academic I needed to think a little bit about what's happening academically: are we set up for this emerging discipline, whatever you want to call it? And I'm not sure we are. So here are the three disciplines — it's not really disciplines, it's styles of thinking — that I think are most important here, and I don't want to exclude anybody: computer science, certainly — the algorithms, the networks, and so on — statistics, and economics. And there have been pairwise interactions among these fields for quite some time. Computer science meets statistics: that is machine learning. In fact, I would argue machine learning is just statistics with a computer-science way of thinking. Every time I see a new idea in machine learning, I know it already exists in statistics — I tell people that and they get mad at me, but eventually it sinks in — and there are lots of statistical ideas they don't yet know about, too; I could give lots of examples, but I won't. Statistics meets economics: that's econometrics, and I've got Guido and others in the audience who are masters of that. It's great, but it's mostly about measuring the economy — doing causal inference to measure the economy — and it's less about algorithms and mechanisms and engineering kinds of things, artifacts. So it has had its important role, but it's missing that third leg. And economics meets computer science: that's called algorithmic game theory, which emerged fifteen or twenty years ago. It's a very important field — the study of auctions and combinatorial auctions and how they behave, and incentives and all that — but what's missing there is that they have no statistics; they don't worry about gathering data, or about preferences changing and being learned as part of the auction and all that.

So all three of these pairwise combinations exist, but each is critically missing the third leg. Now, the interesting thing is that if you go into industry — I spend a day a week at Amazon — and you look at any real-world problem they're studying, like how do we provision, how do we interface with third-party sellers, and so on, there are always all three disciplines around the table. And, just to add, there are always operations-research people, who have already kind of ingested all three disciplines, and control theorists and mathematicians and so on — I don't mean to exclude anybody — but it's never one of those perspectives alone. That's what kills you: having just one of those perspectives. You need all three.

All right, here's a real-world example that I've been involved in. I'm a musician, in addition to being an academic, and I have a friend, Steve Stoute — someone introduced us at some point. Steve is a legendary producer and entrepreneur, well known in the hip-hop and Latin worlds, and he and I came together on this idea that modern data systems and platforms should not just be about moving bits around — music shouldn't just be about streaming; it should be about creating two- and three-way markets. Steve is now the CEO of a company that has taken the idea we originally sat down and talked about and made it real; it's called United Masters (unitedmasters.com). It basically provides a three-way market. If you make music, you can sign up with United Masters and they give you "a record company in your pocket": you're able to produce songs on your cell phone and upload them to United Masters, and they connect that to a market on the other side. In particular, Steve has gone to the NBA. The NBA used to stream music from the record companies, and they would pay the record companies a royalty, and the companies might give some money back to Beyoncé or whoever — but most musicians are not the big famous ones. In fact, if you look at the data — if you do some actual data science — today 95% of the songs being listened to in the United States are by people you've never heard of, who are probably between 16 and 20 years old, and the song was probably recorded in the last six months. Everybody thinks we're all listening to the Beatles and Madonna or whatever; it's just not true. So you'd think, wow, there's this wonderful market that's been created because of the ability to stream music — and you'd be wrong, because it's not a market. No one's making money: the 16-to-20-year-olds are not making any money off this; they do it for a few years and then they disappear.

So what Steve has done, by creating United Masters, is this: a musician signs up — there are now three million young musicians signed up on the platform — and if you go to the NBA website and watch a video, there's some music behind it, and that music is streamed from United Masters. Every time it's streamed, the musician gets paid. It's an actual two-way market — in fact a three-way market, because it's got the NBA, it's got the listener, which is you and me, and it's got the person who made the music. And now — I could give a longer talk about that — all kinds of other market forces are starting to come into play: people are reaching out to musicians and partnering
with them, shows are being made, people are playing at weddings — there are three million people who now have access to a steady income stream. So this is a sense in which "AI" can create jobs: three million people now have access to a possible job, and these are 16-to-20-year-olds, often in the inner city — this is quite important. And that's just in the U.S.; this can be done in every country around the world. And entrepreneurs thinking about a new company, instead of thinking about how to take some bits from somebody and sell them, should think about how to create a two-way market and just help that market get going. You could do this for art, for scholarly works, for travel information, all kinds of things — you can start to think more about markets.

Okay, so that was the first half of my talk — that was why I work on what I'm working on. Hopefully you get a little more of the picture: it really is, in some sense, economics and mechanisms and networks and all that, but — with all due respect — those fields didn't have enough of a statistics and learning perspective. They assumed a lot of things were already known, or that you get certain curves that cross, but they didn't adapt the market as you went, use large data sets to inform it, and have recommender systems. You don't see economics talking about recommender systems, and recommender systems are the way that social knowledge gets used and exploited among groups of people.

So anyway, when you start thinking about the new problems that are going to emerge if you put these three axes together, it's really quite exciting. In machine learning and statistics we're really good at talking about optima: we can find optima in hundreds of thousands of dimensions, even if there are saddle points, and we can guarantee a rate and prove theorems about it — we're really good at that. But in economics you don't often find optima; you find equilibria. Moreover, the equilibria are rarely stationary: they're moving around and you need to follow them, so you need to talk about the dynamics, and now there are topological issues and dynamical-systems issues and stochastic-process issues all merged together. And there are algorithms: gradient descent does not work for finding equilibria, but extragradient does work, and so on — a small sketch contrasting the two on a toy problem appears below. There's a whole emerging area here; it's fixed-point theory, and most of these ideas go back to the 30s and 40s but have been forgotten. Fixed-point theory in hundreds of thousands of dimensions with stochastics — that's something we can start talking about, prove rates for, and really get new algorithms out of, and there are people doing that now. Exploration, exploitation, incentives in multi-way markets — those are words that usually don't come into the market perspective: how do I exploit, how do I explore, and how do I put that together with incentives? I could talk about the rest of these, but let me just highlight that these are mostly words you will not see in a machine learning person's talk or an AI person's talk. They will maybe talk about trust, or fairness, or privacy. That's all good — those are social concepts — but they don't embed them in a fuller account of the underlying foundational principles that make something fair, or private, or valuable to people. They just want to stamp "privacy" or "fairness" on it, and that's enough.
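As an aside on the gradient-descent-versus-extragradient remark above, here is a minimal sketch — my own illustration, not a slide from the talk — on the toy saddle-point problem min_x max_y x*y, whose unique equilibrium is (0, 0). Simultaneous gradient descent-ascent spirals away from the equilibrium, while extragradient, which takes a look-ahead step before the real update, converges to it. The step size and iteration count are arbitrary choices for the demo.

```python
# Contrast gradient descent-ascent (GDA) with extragradient on f(x, y) = x*y.
import numpy as np

def grad(x, y):
    # For f(x, y) = x*y: descend in x using df/dx = y, ascend in y using df/dy = x.
    return y, x

def gda(x, y, step=0.1, iters=2000):
    for _ in range(iters):
        gx, gy = grad(x, y)
        x, y = x - step * gx, y + step * gy
    return x, y

def extragradient(x, y, step=0.1, iters=2000):
    for _ in range(iters):
        gx, gy = grad(x, y)
        x_half, y_half = x - step * gx, y + step * gy   # look-ahead step
        gx, gy = grad(x_half, y_half)
        x, y = x - step * gx, y + step * gy              # corrected step from the original point
    return x, y

print("GDA:          ", gda(1.0, 1.0))            # spirals away from (0, 0)
print("Extragradient:", extragradient(1.0, 1.0))  # converges toward (0, 0)
```

The point is only the qualitative contrast between the two update rules: the same gradient oracle, but the extra look-ahead step turns a divergent dynamic into a convergent one on this saddle-point problem.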
Okay, so let's try to think about what these underlying concepts are. And let me just say that I have loved learning all this economics. I had learned a lot of statistics, and I eventually learned some computer science, and that was fun, but learning economics has been particularly fun — maybe because I already knew the math and could go through the books really fast. This notion of incentives, and really thinking about asymmetries and decentralization — I get that out of economics in ways I never got from any other field. So I'm having a lot of fun here, and I'm realizing that if I'd gone back to the 1950s and hung out with David Blackwell and von Neumann and others, they were doing all this — that was the spirit of the era, and operations research emerged in that era, bringing it all together — and then somehow that all got forgotten. We got buried in building certain kinds of systems, or doing certain kinds of data analysis, or fitting certain kinds of linear models, and we forgot about the overall picture.

Okay, so the blue items on this slide are the ones I'm going to use as vignettes in the rest of my talk. Given that this is an evening talk, I don't want to make it a highly technical academic talk — there are arXiv papers on all of this, with theorems and so on — but I do want to give a sense of what the problem is, what the theorem is, and what the consequence is. So I picked these three to talk about, in some order; I forget which order.

Here's perhaps my favorite one, and I get to recognize two Stanford people. Stephen was a student with Emmanuel and joined my group two or three years ago — a fabulous intellect. Michael, whom I've actually only met online because of the pandemic, is a Stanford person. And Jake was a student with me and is now a postdoc with one of Emmanuel's former students, Rina Barber — so a lot of nice connectivity there. Okay, and with apologies to the economists in the room, I'm going to say a little bit about incentives. There's a kind of general theory of incentives — there are books on it — and roughly speaking it has three branches: there's auction theory, which you all know about; there are matching markets; and there's contract theory. Contract theory is maybe the least well known outside of economics, but you all know about it because you experience it daily: agents possess private information, and there's a principal who wants to incentivize them to do something with that private information. Why does this happen? Well, the boss wants to get the employees to do something, and it's not just that the boss couldn't do it themselves — the boss doesn't know how to do it, the employees have local information, they're smarter about it, and if they're incentivized they'll do even better work — so the boss has to offer them incentives so that they'll actually do the labor. This came up in economics after general equilibrium theory, which is very symmetric — Nash equilibria and everything are very symmetric — and it recognized that real life is full of asymmetries: there's someone trying to get someone else to do something, and that person has power because they know something that's not known upstairs.

You know about this because, for example, you travel, and you've probably wondered why it is so complicated — why isn't there one fare for every seat on the airplane, like there is for a movie theater? You all know the answer, kind of: because there are different willingnesses to pay in the population. A business traveler — maybe the company's paying, so they couldn't care less what the fare is, or maybe they really urgently need to get from one place to another — has a high willingness to pay, and there are a lot of other people who don't have a high willingness to pay; they could wait until tomorrow, and so on. So the airlines, really in the 80s, realized that they could start to price discriminate: figure out who had a higher willingness to pay and charge them more, and who had a lower willingness to pay and charge them less, and fill the airplane with a blend of both kinds. And you've got to be clever to do this right: if you set a single price, that's not going to work, and if you try to screen people — say you look at somebody wearing a suit and tie and say, I'm going to charge you more — well, that person is going to show up in jeans next time. People are doing this all the time; they're aware, they're gaming the system, and you've got to think it through. So you know the answer: what you do is provide a menu of options — a service and a price, another service and another price — and everybody gets the same menu. Now, what is this menu for the airline? Well, there's this class called business class — the students don't know about this yet, but eventually you'll learn about it — where you get a little glass of red wine, and you get to be first in line and be all proud of yourself, and you get a little bit bigger chair, and people will pay a thousand dollars more for that. It's amazing. It's the only class that actually makes money. But the marginal cost of putting people on airplanes is essentially zero, so you want to fill the rest of the airplane. And, amazingly, there are people in the back who don't want to spend a thousand dollars for a little glass of red wine, and they feel very good about themselves because they didn't spend all their money and they're still on the airplane. So everybody's happy — that's what's called social welfare — and the plane is full — that's what's called revenue. And you can make mathematics out of all that; you get the usual crossing curves, they're just not the same crossing curves as in general equilibrium theory. But every one of those texts says: we have missing information, we're going to assume there's some probability distribution, and we're going to call the whole thing "Bayesian." I was a statistician, so I like to say: wow, Bayesian! It's not Bayesian — it just means there's a distribution on unknown quantities, that's all. There's no updating, there's no learning, there's none of the above. All right, so: wonderful opportunity. We should work on this, and we have.

Okay, so you all know about clinical trials. It costs tens of millions of dollars to run clinical trials in any particular therapeutic area — you all know about it for vaccines — it's amazingly expensive and it's amazingly important, and
if you don't do it at the right scale you'll make big mistakes; you all know this. So you would imagine that the FDA does a great job of this, and at some level they do — they're very good statisticians — but where they're not as good is the economics. This really should be thought of as a contract-theory problem. The FDA is a principal, and they're trying to decide what drugs go to market, but they only have partial knowledge about the drug candidates. Where do the drug candidates come from? They come from the pharmaceutical companies, and a pharmaceutical company knows something internally about a candidate it's about to send up to the FDA: maybe they know they put their best engineers on it, maybe they have experience with it, maybe they did a little internal testing, and so on. The FDA is now getting all these candidates and would like to say: pharmaceutical company, that candidate you just sent me — how good is it? Well, the company does not want to reveal that, because if the FDA is told it's not a good candidate, they'll require yet more clinical-trial money be put into it to make sure there's no false positive, whereas if they think it's a really good candidate, they won't — and also the license they grant will be titrated to the risk. So the companies are incentivized not to say. And that's a problem.

All right, so now let's think about the actual paradigm: what is the FDA doing? Well, they're being statisticians — frequentist statisticians. Here's a little Neyman-Pearson-style setup. A bad drug is theta = 0; that doesn't mean it hurts people, because they definitely screen for that — it just means it has no effect, and there are tons of drugs on the market that have no effect, for better or worse. They set a Type I error of, say, 0.05 — it's actually more like 0.01 — and they set up a test that achieves that, and then for the good drugs, the ones that actually have an effect, they want high power, and 0.8 is a standard number for that. Is that a good protocol? Well, it's optimal — it's the Neyman-Pearson test — so of course it's great. But is it a good protocol? The answer is no.

Let's do a little thought experiment. Consider a situation where there's a small profit to be made: it costs 20 million dollars to run the trial, but if you're approved, suppose you would make 200 million — this would be a niche drug of some kind. The CEO can do a little calculation, as can the FDA: conditioning on theta = 0 — now, no one knows whether theta is zero, it's a counterfactual, but thinking conceptually, theta equals zero — what's my expected profit? You can put those numbers together and you get minus 10 million. So the CEO looks at that number and says: only send candidates up to the FDA if you're really pretty sure it's a good drug, that it's going to get approved because it is a good drug; don't hope for a false positive, or we'll go out of business. That's great: now the FDA is mostly getting good drugs, and they have a good screening procedure, so everything getting through looks good. If that were real life, that would be great. But here's something more like real life: you spend 20 million dollars to run the trial, and if you're approved you could make two billion — this would be ibuprofen or something, and this is more common. Now the CEO can do the same exact calculation: conditioning on theta = 0, the expected profit is plus 80 million.
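To make that arithmetic concrete, here is a minimal worked version of the CEO's calculation under the null (the drug has no effect), using the numbers from the talk — a $20M trial cost, a 0.05 Type I error rate, and either a $200M or $2B payoff on approval. The function name and the exact figures are just for illustration.

```python
# Expected profit for a sponsor whose drug is truly null (theta = 0):
# the only way it gets approved is a false positive, which happens
# with probability alpha under the FDA's Neyman-Pearson protocol.
def expected_profit_under_null(trial_cost, payoff_if_approved, alpha=0.05):
    return alpha * payoff_if_approved - trial_cost

# Niche drug: $20M trial, $200M payoff -> -$10M, so don't submit null candidates.
print(expected_profit_under_null(20e6, 200e6))   # -10,000,000.0

# Blockbuster: $20M trial, $2B payoff -> +$80M, so flooding the FDA with candidates pays off.
print(expected_profit_under_null(20e6, 2e9))     # 80,000,000.0
```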
So now the CEO is very incentivized to send as many candidates as they can to the FDA, and the FDA will get flooded — and they do get flooded — and they will run these tests, and there will be some false positives, and those things will go to market. They don't hurt anybody, but they just don't have any effect, and people make money, and then eventually that gets discovered and changes. So this is broken, and it's broken because it's not being thought of as a contract-theory problem.

All right, so we now have a paper and an idea we call statistical contract theory, and here is the protocol. There are four steps, and only step three is new; the others are standard contract theory. First, an agent comes to this contract and either opts in or just decides to walk away — so the drug company looks at it and says, no, I'm not interested, or, if they opt in, they have to pay a reservation price r, say 20 million. Second, they get to select a payout function from a menu — I'll say more about what that means in a moment, but it could be a function from the observed clinical-trial outcome to the amount they get licensed for — and we are going to design the menu; that's our goal as economically minded statisticians. Third, we run a statistical trial, which yields a random variable Z drawn from P_theta, where theta is the true state of nature; no one knows theta, but we get data from P_theta. And fourth, there's the payoff: the agent gets f(Z) — they were the one who selected the payout function, so they get paid that amount — and the principal receives a utility which is a function of f(Z), because they have to pay it, and of theta, because if the FDA makes lots of approvals of not-so-good drugs they'll eventually look bad, so their utility should reflect that. Agents in this setting want to maximize their payoff, and their best response is simply to take the argmax, over the menu, of the expected payoff given what they know about the state. So it's pretty clear what an agent should be doing in this paradigm.

All right, now if you're going to do economics together with statistics, the key thing you have to think about is incentive alignment: am I in a situation where what I want to achieve is aligned with people's interests? Here's a way to set that up. For the null agents — those who have the null candidates — the utility of the principal should be decreasing in f(Z), whereas for the non-null agents — a good drug — the utility should be increasing in f(Z). It's kind of obvious: in English, the principal wants to transact as much as possible with the good agents, the ones that have a good drug. So now the definition: a menu of these options is incentive aligned if, for all of the null drugs, the expectation under the null of the difference between the payout and the reservation price is less than or equal to zero. If that weren't true, these companies could just make money for free, so you need that to hold, and then the principal is happy. The "p less than or equal to 0.05" protocol that we're used to from statistics is not incentive aligned — that's simple to see.
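As a rough illustration of why the p < 0.05 protocol fails this definition while an e-value-style payout does not, here is a small simulation. This is not from the paper: the Gaussian model, the fixed license value, and the likelihood-ratio payout are assumptions made just for this sketch.

```python
# Compare E_null[f(Z) - r] for two payout functions when Z ~ N(theta, 1) and the null is theta = 0.
# Incentive alignment requires E_null[f(Z) - r] <= 0.
import numpy as np

rng = np.random.default_rng(0)
r = 1.0                      # reservation price (arbitrary units)
license_value = 100.0        # what approval is worth to the sponsor
z = rng.normal(0.0, 1.0, size=1_000_000)    # trial data generated under the null

# (a) "p < 0.05" payout: full license value whenever the one-sided test rejects.
pay_pvalue = license_value * (z > 1.645)
print("p-value payout:", pay_pvalue.mean() - r)   # ~ 0.05*100 - 1 = +4 > 0: not aligned

# (b) e-value payout: r times a likelihood ratio e(Z) = exp(lam*Z - lam^2/2),
# which satisfies E_null[e(Z)] = 1, hence E_null[r*e(Z) - r] = 0.
lam = 1.0
pay_evalue = r * np.exp(lam * z - lam**2 / 2)
print("e-value payout:", pay_evalue.mean() - r)   # ~ 0: incentive aligned
```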
All right, so now we have a theorem — it's right down at the bottom there — which is a characterization: a contract is incentive aligned if and only if all of the payoff functions are e-values. What's an e-value? Well, it's like a p-value, kind of — it's a statistical measure of evidence — but whereas a p-value is a tail probability under the null (the probability, under the null hypothesis, of being more extreme than the observed data), an e-value is a nonnegative statistic whose expectation under the null hypothesis is less than or equal to one. It looks a little bit like a martingale — a supermartingale — and in fact the more general story is nonnegative supermartingales. Because of that, they compose nicely, and you can stop them because of optional-stopping theorems; they're just a nicer measure of statistical evidence, whereas p-values don't compose, you can't stop them, they have all these troubles. So this is a neat result: this concept from the theory of contracts is exactly the same concept as e-values in statistics. And moreover we have a result — I don't think I have a slide on it, nope — that if you now want to actually design a menu, say a maximin menu, optimizing the worst case over theta of the risk, it turns out to be characterized by taking all possible e-values as your menu. For computational reasons, or for interpretability reasons, you might not want to do exactly that, but that's another if-and-only-if theorem.

Okay, so I'm going to move on. We're now rolling this out in various domains, actually designing menus and contracts, and we have this guide: we know how to design the contract, we know we should use e-values, and we know lots of e-values — there's a lot of literature on nonnegative supermartingales that are e-values — so we'll be doing that. And let me just say we've done this in particular in the federated-learning domain: this is again the picture of federated learning, but now with an incentive structure, so we're able to design an incentive-compatible mechanism that incentivizes agents at the edge to contribute data. In particular, this handles a problem that has been recognized in the literature, which is the free-riding problem: if I have some data to send up, but sitting next to me there is Emmanuel, and he has sent his data, and I know that his data is pretty much the same as mine, I'm going to watch him send the data and I know I don't have to. This paradigm incentivizes against free riding.

[In response to an audience question:] It's a good question — I was kind of hoping that would come up later — about whether this is all rational-economics stuff. No, and the behavioral economics here comes in through the fact that we're gathering data. All of these distributions are informed by data, and if we just write down the utility, the only assumption we have to make is that we agree you want to maximize it, which is usually not so strange, and then the data informs the rest; we don't make a distributional assumption about the data. I could get into that a little longer, but behavioral economics is very much part of this agenda. It's not just that rationality is broken and we have to think about the psychology of it — no, we collect data, and the data are coming from real people, so we already have a bit of help there. Hopefully that partially answers your question.

Anyway, if you're interested in this application, we have a paper on that, and we're continuing with it. I've got two more vignettes, I think, and I've got to go a little more quickly on these; I just want to give you a flavor. Classification is the big killer app in machine learning — classify yes or no, good or bad — but if you do this in domains where there are strategic agents, you get something called strategic classification. This is work with Tijana Zrnic, who is still a student with me and will be joining Emmanuel's group as a postdoc — he and I shuttle these superstar people back and forth — and Eric, who is now a professor at Caltech. All right, so here's a little picture to suggest this: health insurance. The health insurance company has to solve a classification problem. I fill out a form, and they have to decide whether to give me insurance or not. They're going to ask me, how much do you exercise? I'm going to say, a lot. How much wine do you drink? Very little. And so on. Now, if it's implausible they'll see that, but you make it plausible. They know that, however, so they're not going to make it so easy for you. So they ask questions like: would you be willing to have us look at your cell phone accelerometer for one day? Just opt in — you don't have to — but are you willing to do that? You say sure, and now if my cell phone moves around a lot, that shows I'm very active; if it sits in one place all day, I'm not so active. They would use that as data. So someone went out and built a device: you put your cell phone on the device and it moves around all day. This is the kind of problem that arises, and economists are very much aware of this; they call it Goodhart's law. For example, a poverty index score was set up in some year — this was in Colombia, in 1994.
It looks very good, very Gaussian and all that; by 2003, people had discovered that if they moved just a little bit to the left of the threshold they got better housing, so everyone cheated a little bit so they could move over, and the poverty index score has now been ruined. But this is real life; this is what people really will do — and they should, why not? It's not an ethical issue; "ethics" is sometimes invoked a little too easily here. The real problem is that when you do learning, you rarely just have a collected data set to analyze. In the real world you have to ask where the data comes from: people supply it, and if people are supplying the data, are they aware of what the outcome is, and do they have some vested interest in it? Probably they do — if not, why would they really be engaged in this whole exercise?

All right, so now we have a Stackelberg game — a game-theoretic setting which is sequential. I send some data up; the central decision-maker — say a bank that's trying to decide about loans — collects a lot of data and builds a model that predicts whether I should get a loan or not. That model starts to make some decisions, and people start to realize what's happening — maybe the bank has to reveal, for regulatory reasons, that it's using logistic regression or something — and they say, okay, the next time they send data, they're going to alter their data. And that goes back and forth, and you want to ask what equilibria arise here. We're not trying to optimize any likelihood; it's an equilibrium problem. So we have studied this as a Stackelberg game, which is the appropriate concept in game theory: classically, in a Stackelberg game you have a leader and a follower. Classically the decision-maker would be thought of as the leader — they run the whole show — and the agents are the followers, and you can show in that setup that the leader gets high utility and the followers get low utility; that seems reasonable. But that's an analysis you could do in a synchronous situation, where a model is built, data is gathered, a model is built, data is gathered, all synchronized. In the real world there's no synchronization — why should people wait? There's no synchronization between the central model and me sending up data. So you can start to think about analyzing different scenarios with different time scales. Here's one where the modeler goes slowly, only updating every once in a while, and the strategic agents send data much more rapidly. Does this arise in real life? Sure — this is, for example, college admissions: the college is gathering all these applicants and all this data, and they're not going to adjust their policy after every applicant; they'll do it every couple of years or so, and they'll publish it, for obvious social reasons. What about the other way around, where the central agent updates very rapidly and the agents are much slower? Well, that happens all the time too — that's like YouTube: every time someone clicks, they update a model, in principle. So these are different scenarios; a tiny sketch of this kind of repeated, mismatched-timescale interaction is below.
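Here is a minimal toy of the repeated interaction just described — my own illustration, not the model analyzed in the paper. The one-dimensional feature, the quadratic gaming cost, and the lender's naive re-thresholding rule are all assumptions made for the sketch; the point is only to show the back-and-forth between a published decision rule and strategic responses to it.

```python
# Toy strategic classification as repeated play: a lender publishes a score threshold;
# applicants with true feature x report x + d, paying cost * d^2, and game only when
# acceptance is worth the cost. The lender then re-fits its threshold on the gamed data.
import numpy as np

rng = np.random.default_rng(1)
true_x = rng.normal(0.0, 1.0, size=5000)   # applicants' true creditworthiness
benefit, cost = 1.0, 0.5                   # value of a loan; cost per squared unit of gaming

def best_response(x, threshold):
    """Each agent moves exactly to the threshold if the gaming cost is below the benefit."""
    gap = np.maximum(threshold - x, 0.0)
    worth_it = cost * gap**2 <= benefit
    return np.where(worth_it, np.maximum(x, threshold), x)

threshold = 0.0
for round_ in range(5):
    reported = best_response(true_x, threshold)        # agents respond to the published rule
    threshold = np.quantile(reported, 0.70)            # lender re-sets a 30% acceptance cutoff
    print(f"round {round_}: threshold = {threshold:.3f}, "
          f"accept rate on true x = {(true_x >= threshold).mean():.3f}")
```

Running this, the threshold ratchets upward each round as gaming inflates the reported scores — a small-scale version of the "goes back and forth" dynamic in the talk.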
So what happens here? Now you do the game theory, and we were able to prove a theorem that shows, first of all, that in either order of play you get an equilibrium — that's not so hard to see and to analyze. Much more surprisingly, in these statistical settings — where it's a data-analysis problem, not just an arbitrary Stackelberg game — it turns out that when the decision-maker is the follower and the strategic agents are the leader, the flipped-around version, the strategic agents have higher utility than before, which makes some sense, but the decision-maker also has higher utility. It's a rare example in game theory of a win-win: going in the order where the strategic agents are the ones moving fast leads to higher utility for both parties. That's not a true fact about game theory in general, but it is a true fact about this statistical game theory — these statistical modeling exercises — for generalized linear models, just to say.

I'm going to skip this little part here. I like to show pictures of my students — there are Lydia and Horia — and just say this: I'm going to go through the slides really quickly, but it's a cute little paper where you bring together bandits from machine learning and matching markets from economics. Let me just show you a picture. Here's a learner in a bandit problem: they're trying to find out which of a set of options is the best — gives the highest reward — and there are algorithms, like upper confidence bounds, that guide you toward diminishing your uncertainty while also picking the optimal arm. So we asked: what if you put this in a market setting? I don't just have one decision-maker; I've got a two-sided market, and in particular I might have two decision-makers who are selecting actions from the other side of the market, and there are preferences on both sides. So you ask questions like: what if both of the agents select the same action? We model this as congestion: one of them gets the reward and the other gets no reward at all. And who gets the reward? That depends on the preferences on the other side of the market. So both sides are learning about each other, and you can do the mathematics here, and it turns out to be pretty interesting. What you're really asking is: if there's competition in a bandit situation, does that make the regret higher or lower? What does competition do to the learning process of a person trying to learn the best action? Long story short, here's the theorem — a regret bound, and this is more for the experts: the regret, as a function of time, is logarithmic in n, and that's the optimal result from classical bandit theory. So that's still true — competition does not hurt your rate of learning. There's a denominator term, though, which is a constant: a gap between the preferences of nearby agents. If there's competition and a small gap between me and somebody else, we start to compete more, and that gives us higher regret — but it's only a constant. I put that up there just to show you that there are really fun things to do with simple learning algorithms of the explore-exploit type and simple matching-market kinds of ideas. This was motivated by a restaurant setting: say a hundred thousand of us are out looking for a restaurant in Shanghai, and there are a hundred thousand restaurants, and we're all trying things out as we go and getting rewards or not, and everyone on both sides of the market has some preferences — how does that market clear? That was our question; a toy sketch of this kind of competing-bandits setup is below.
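The following is a small, self-contained toy of bandit learning with congestion — again an illustration under my own assumptions, not the algorithm or the regret analysis from the paper: two learners run UCB over the same arms, and when they collide, the arm's fixed preference decides who is matched; the blocked learner simply records a zero reward for that round.

```python
# Toy: two UCB learners competing over the same arms in a two-sided market.
import numpy as np

rng = np.random.default_rng(2)
n_arms, horizon = 3, 5000
means = np.array([[0.9, 0.5, 0.2],      # learner 0's mean reward for each arm
                  [0.8, 0.6, 0.3]])     # learner 1's mean reward for each arm
arm_prefers = np.array([0, 1, 0])       # the learner each arm accepts in a collision

counts = np.ones((2, n_arms))           # one optimistic fake pull per arm to start UCB
totals = np.ones((2, n_arms))

for t in range(1, horizon + 1):
    ucb = totals / counts + np.sqrt(2.0 * np.log(t) / counts)
    picks = ucb.argmax(axis=1)          # each learner pulls its UCB-maximizing arm
    for learner in range(2):
        arm = picks[learner]
        blocked = (picks[0] == picks[1]) and (arm_prefers[arm] != learner)
        reward = 0.0 if blocked else float(rng.random() < means[learner, arm])
        counts[learner, arm] += 1
        totals[learner, arm] += reward

# Typically learner 0 concentrates on arm 0, while learner 1 keeps losing arm 0 to
# congestion and shifts toward arm 1, even though arm 0 has a higher raw mean for it.
print("pull counts:\n", counts)
```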
very exciting uh project here that I want to spend five minutes on a similar collection of uh students but also again Stephen who's a postdoc um Anastasio Stephen Clara and Tiana um so this is really more about the statistics there's a there's a little economics here but but less this is more about how do we do things like use neural Nets to do science just roughly speaking okay so you all know about things like Alpha fold um you know they will make huge numbers of predictions of say these uh you know tertiary structure of proteins um hundreds of millions of structures whereas the hand labeled ones there's only hundreds of thousands of such sequences all right so that was a problem that's now being revolutionized in biology so here's an example of someone in 2004 wrote a very important paper in nature uh studying the relationship between intrinsic disorder of proteins uh where things don't fold they kind of are more strand-like and that's turns out to be very important for like grid like things and phosphorylation which is an important part of it in many Pathways so they wanted to ask is there an association between those two two Notions what the parts of the protein or disorder they tend to be more phosphorylated or not um but they really couldn't test it and now you go forward to 2022 instead of you know this small amount of data we had back in 2004 uh now you have vast amounts of of uh Alpha fold labeled data it's not really data it's predictions but they're good predictions so why not throw them in as if they were data right so someone wrote a paper uh 2022 doing that and so they wanted to quantify this odds ratio probability of intrinsically disordered given phosphoration and kind of amazingly didn't even use any of the hand label the the gold standard data because they had so much of this other stuff they just threw it all in and because it's such a good predictor why not but as a statistician you know better right even if it's very very accurate making predictions that doesn't mean the inferences you make are any good all right so I think this one picture on battery show so there's the mayor this this picture is probably the the the end of my talk really and I'll just kind of scroll through a couple more um so let me take a moment on this one our statistic here is it's an odds ratio we'd like to know if there's an association or not you've all taken elementary statistics we have to find whether it's significantly different from one one would be no association bigger is an association okay we did a Monte Carlo version of this where we we um in in this set of label data we actually got the true odds ratio that's the dotted line there and then we redid the entire experiment with Alpha fold output using predictions okay so what are we doing here that gold region right there is a confidence interval and it's based on taking all of the alpha fold predictions and treating them as real okay and from those you form a confidence interval on the this this odds ratio and that's an elementary statistics exercise to do that right that confidence interval is Tiny that looks really good because you have all this data it's not real data but it looks really good you're very very confident you're just dead wrong all right the the statisticians in the room are will say why'd you do that you know just we know how to do confidence intervals just use the gold standard data don't trust these wild machine learning predictions and our I would do that too that's what my first thought would be that gives you 
So the new method gets the best of both worlds. This prediction-powered inference forms a confidence interval which is guaranteed to cover the truth with 95 percent probability, but it's also much smaller than the classical one: it uses the predictions, but it corrects them. The most fun thing to show you, I'll skip that slide, is the set of examples we've been applying this to; we have a paper that does these. Here's voting: here's a ballot, here's a messed-up ballot. This was a San Francisco election; people used computer vision to look at all the ballots and predict whether each was a yes or a no. Now you can feed that in and do a data analysis on it, and you can see the little gold region there, a small confidence interval that's just missing the truth. Again, I don't know why some things are not rendering here, but our new interval is the green one, and there's a classical one there that's much larger but again covers the truth. This one is counting spiral galaxies: there are some hand-labeled images, here's a spiral galaxy, here's not, and again you can see the small confidence interval if you use the computer vision algorithm, but it's not covering the truth, and again we cover the truth. And I think this was the last one I wanted to show: here's the California census. The estimand is a logistic regression coefficient of income when predicting whether a person has private health insurance, and a machine learning algorithm was run on that. You can see the tiny little confidence interval: it's very, very sure, and dead wrong. I hope this conveys, what you all kind of knew, that very accurate machine learning models can still lead to completely wrong inferences. I just don't think my machine learning colleagues get that, but it definitely can happen, and you can get the best of both worlds. I'm going to skip all the way to this last slide and just show you roughly how this happens. It's kind of like a bias-correction procedure, but not quite. There is a bias between the predicted parameter, using all the predicted data, and the true parameter theta-star. That bias is a population-level quantity; if you had the whole population you could write it down as a number. There are ways to estimate bias, like the bootstrap, and you can take that estimate and correct your estimate; that's done a lot. We have a different idea, which is that we take that quantity, that bias, or as we call it more generally, the rectifier, and we don't just estimate it, we put a confidence interval on it: we get all the possible values of that correction. Now we take the original prediction-based quantity, which is wrong, and we correct it with all the possible corrections. That leads to that green region there, which is a confidence interval on the corrected predictions. And then our theorem at the very bottom of the page shows that we're good statisticians: the probability that this new confidence interval covers the truth is greater than or equal to 1 minus alpha, and it's much, much smaller than if you had thrown out the predictions altogether.
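For the simplest case, estimating a mean, here is a minimal sketch of that idea under a normal approximation; the method described in the talk handles general estimands such as odds ratios and regression coefficients, and the data, predictor, and function name below are invented for illustration.

```python
# Prediction-powered idea, mean-estimation case: predictions on the large
# unlabeled set, plus a "rectifier" estimated on the small gold-standard set.
import numpy as np

def prediction_powered_mean_ci(y_labeled, f_labeled, f_unlabeled, z=1.96):
    """Approximate 95% CI for E[Y] combining predictions with a bias correction."""
    n, N = len(y_labeled), len(f_unlabeled)
    theta_f = f_unlabeled.mean()                  # estimate from predictions alone
    rectifier = (y_labeled - f_labeled).mean()    # how wrong the predictions are, on average
    se = np.sqrt(f_unlabeled.var(ddof=1) / N + (y_labeled - f_labeled).var(ddof=1) / n)
    center = theta_f + rectifier
    return center - z * se, center + z * se

rng = np.random.default_rng(2)
N, n = 100_000, 300
y_unlabeled = rng.normal(1.0, 1.0, N)                        # true mean is 1.0
f_unlabeled = y_unlabeled + 0.3 + rng.normal(0, 0.2, N)      # accurate but biased predictor
y_labeled = rng.normal(1.0, 1.0, n)
f_labeled = y_labeled + 0.3 + rng.normal(0, 0.2, n)

print(prediction_powered_mean_ci(y_labeled, f_labeled, f_unlabeled))  # narrow, covers 1.0
```

The interval is centered at the prediction-only estimate plus the rectifier estimated on the labeled data, and its width combines the small uncertainty from the many predictions with the uncertainty in the rectifier; that second term is what restores the coverage guarantee while keeping the interval narrower than the labeled-data-only one.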
So I put that up there at the end of an economics talk; it's more statistics, but I think many people in the room are already thinking about this and working on it. It is one of the critical issues: if you're going to do science with machine learning, you've got to face this. You've got to be a good statistician while exploiting the advantages of the machine learning paradigm, and I think this is a step towards doing that. All right, that's my last slide, and that's the slide I had up earlier; I just wanted to remind you of the big picture, of the more provocative issues. To me this has been kind of a no-brainer. What's happening in this era? It's just statistics and computer science and economics and so on, and we're being good engineers: we're trying to deliver artifacts that will help humans, and to do that well, like previous generations of engineers. And this Silicon Valley hype of "we've discovered this great thing called AI, and suddenly we've got to worry about all the things that are going to happen because of it" just seems to me wrong. Thank you. [Applause]

I knew I shouldn't have picked on you, or do you have a better idea? [Audience member] Well, I have a paper where there's an island of really high-quality gold-standard data and an ocean of data where you're not quite sure of the quality, and then we do a shrinkage of one onto the other, but we only got point estimates; we didn't get confidence intervals. [Jordan] Yeah, so this is a little bit like the semi-supervised paradigm: we have a small pool of labeled data and an ocean of unlabeled data, and the machine learning person says, oh yeah, that's semi-supervised. No: we're using the labeled data to find a confidence interval that corrects the predictions on the unlabeled data, and we get a confidence interval out of the whole thing. But yes, I think I was aware of that work of yours, and let me just say, none of this is ever completely new: statisticians doing small-data census work did things like this, and semi-parametric statisticians did some too, so there's always somebody who did it, probably in the 1950s, in statistics, or Art, or Brad Efron, or whoever. Any other questions? Yes, there are two over here.

[Audience] As you pointed out, there's been a lot of attention around uncertainty quantification and similar moves in that direction lately in the machine learning space, which is great. What does that mean for the future of applied Bayesian statistics, MCMC, that sort of thing, given that it's computationally less tractable than a lot of modern machine learning training techniques? Is there still a place for it? Thanks. [Jordan] Yeah, good, thank you for asking. I tend to be a Bayesian, as with most statisticians: sometimes I'm a Bayesian and sometimes I'm not. I'm a Bayesian when I'm working with a scientist over two or three years, because I'm trying to draw out the knowledge that they have and use it; that's a prior, and why would I not do that if I'm going to work with them a long time? I'm a frequentist when I'm trying to produce a piece of software that people all over the world will use, because I'm not going to work with them and elicit the prior; I'm just going to put it out there, and I want to put a stamp on it, a certificate, that 99 percent of the time it's going to work for whoever uses it. Those are the two perspectives I have. Now, a little more nuance to that: the Bayesian way to structure models is very nice. You get hierarchies, you get
sharing and shrinkage; social-network kinds of things are naturally Bayesian. I think Brad Efron, who has been the luminary in statistics here at Stanford but also worldwide, had it right: you often go into a problem thinking Bayesian, you start to structure the model, think about what you could know, what should be tied together with what, and then you become a frequentist. You say, I'm going to do empirical Bayes; I'm not going to just run the Bayesian MCMC paradigm, I'm going to say at some point, okay, there's something I can estimate, I can plug it into the Bayesian procedure, and I'll get the benefit of both worlds. I totally agree with that, and a lot of the things you saw here have a kind of empirical Bayes interpretation. But it's true that the conformal things and the uncertainty quantification you're talking about don't necessarily; maybe Emmanuel could correct me there, but those are pure, hardcore frequentist at some level. When I would use those in practice, though, I would probably not have just one conformal predictor over here; I'd have another one over here and another one over there, and I would want to shrink them towards each other, to have them related, because in real domains, if you really start to scale, the Bayesian way of thinking helps you immensely. So yeah, it's funny: Bayesian and frequentist ideas do conflict, but it's like wave-particle duality, that's my metaphor: waves and particles are both correct, and they conflict a little bit, but if you throw out one and just use the other you're going to do bad physics, and it's the same thing with statistics.
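As a rough sketch of the empirical-Bayes workflow alluded to above, structure a hierarchical model, then estimate the prior from the data and plug it in, here is a toy Gaussian example with invented data; the group structure and variances are assumptions made purely for illustration.

```python
# Toy empirical Bayes: estimate the prior's mean and variance from the group
# means themselves, then shrink each group's estimate toward the overall mean.
import numpy as np

rng = np.random.default_rng(3)
G, n_per = 50, 10
true_means = rng.normal(0.0, 0.5, G)                  # hypothetical group effects
data = true_means[:, None] + rng.normal(0, 1.0, (G, n_per))

obs = data.mean(axis=1)                               # per-group sample means
sigma2 = 1.0 / n_per                                  # sampling variance of each mean
mu_hat = obs.mean()                                   # plug-in estimate of the prior mean
tau2_hat = max(obs.var(ddof=1) - sigma2, 1e-8)        # plug-in estimate of the prior variance

shrinkage = tau2_hat / (tau2_hat + sigma2)            # posterior weight on the data
eb_means = mu_hat + shrinkage * (obs - mu_hat)        # shrunken estimates

print("raw MSE:", np.mean((obs - true_means) ** 2))   # unpooled estimates
print("EB  MSE:", np.mean((eb_means - true_means) ** 2))  # typically smaller
```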
[Audience] You were emphasizing the importance of forming collectives to solve these problems, and I'm very curious how you think those collectives can actually be formed to pursue a goal, especially because I don't know whether a principal or a platform is required to create the market where agents can actually trust it and cooperate, or whether you feel a decentralized kind of network is possible.

[Jordan] That's a fantastic question, and I'm delighted to have it. Especially for the young people in the room, I hope you see that questions like that are the thing of this era, and they're hard. I don't have an answer to your question; that's the short answer. The colleagues I had on that paper, the social science colleagues, talk a lot about new models for democracy, and they emphasize that democracies tend to arise when you have multiple layers: bring 200 people together, get some consensus; take 200 people over here, get some consensus; put the consensuses together and form cities and countries and all that. That's what humans have done throughout history. We've now run this experiment where we have this thing called Twitter, and we're assuming it's all good that all of us talk all the time, or that we all listen to one person, and those are terrible ways to do democracy. So there are experiments that have been underway for quite some time, like those famous examples in Taiwan, where they have a legislature that uses lots of data analysis together with structured assemblies to come to coherent decisions and get consensus. Ireland has used this too: they legalized abortion at some point, which is very hard to do in Ireland, and they did it partly because of these new assemblies, these new structures. So I love this, people thinking outside the box about new mechanisms that bring visibility together, but still among relatively small numbers of people; that's really critical. I think the technologists who just built the YouTubes and the Facebooks and all that were running this experiment on human beings, and it was just destined to fail. The big broadcast channels were terrible; we want communities, and we want to think about how to structure those. So I don't have much more to say about that, other than that how you form collectives and support them and make them healthy is hugely interesting and important, and there are social scientists who spend their lives on this. This is definitely not just a technology issue; we need to cooperate and listen and have a dialogue with those kinds of social scientists. There are many others we should cooperate with, but I think that's a particularly pregnant one. Economics certainly talks about collaborative things, there's cooperative game theory, how do coalitions form, but it's a little bit dry, and maybe Hido can help me a little bit here, maybe there's more to it, but it's mostly about how do I negotiate and get the most money out of the deal, aligning with incentives and so on. That just means we haven't thought about it enough, and for the young people in the room, wow, that's a great topic to think about: how do I start to structure collaborative efforts in data-oriented ways? Economists talk about communication and signaling, but they didn't really have enough data to signal interesting things and do it in adaptive, interesting ways. So let me lean into this again: there are a lot of young people in the room, and this is the most exciting era to be in. The previous eras gave us gradient descent and networks and all these tools, and then threw them out into the world, and they kind of work and they kind of don't: we got better commerce, we got better transmission, and we can fix all of those. We can also think a lot more about the new, good things that could happen if we start to think in the right way, and about what problems are needed to do that. Don't just work on self-driving cars, or whatever, or making Facebook advertisements better; work on problems that you believe in, and there are plenty of them. But bring these two fields together; don't just think of yourself as a system builder. Thank you.
Info
Channel: Stanford Data Science
Views: 4,350
Id: 3zlDHdtSXt4
Length: 66min 29sec (3989 seconds)
Published: Wed Aug 30 2023