- Really exciting to kick off Stanford's HAI Seminar for the new academic year. For those of you who are joining us for the first time, HAI is a relatively newly established institute at Stanford for Human-Centered Artificial Intelligence. My name is Fei-Fei Li. I'm a professor in the Computer Science Department and also co-director of HAI with Professor John Etchemendy. It's very exciting that joining us today for the first kickoff seminar is one of the most renowned and beloved members of the Stanford AI community, Dr. Andrew Ng. Before I give a brief introduction about Andrew, let me just say a couple of words about HAI.
HAI is an institute whose mission is to advance AI research, education, policy, and outreach to better the human condition. We are a highly interdisciplinary institute here at Stanford that works on advancing AI technology as well as many of the social and human issues related to AI, whether it's fairness and bias, ethics, the future of work, or geopolitics, and we work with important professional schools at Stanford, such as the schools of education, business, and medicine, and all across the campus. So for those of you who are part of the community, we really invite you to join us in any way, whether you're a student, researcher, faculty member, or alum. For those of you who are joining us from around the world, we welcome you to sign up for our mailing list or events list. We hold a lot of exciting events, and our weekly seminar is intended to bring you the latest thinking from the AI scholars and thought leaders who are at the forefront of making change in AI. So, like I said, what
an honor it is to introduce Andrew, who has been a long-term friend for more than a decade (that speaks to our age, Andrew). Andrew is literally the first person I met when I joined Stanford. He is the founder and CEO of Landing AI, also the founder of deeplearning.ai, and co-chairman and co-founder of Coursera, and all of these organizations are changing the world as we speak. I can't imagine one person doing all of that. He is currently also an adjunct professor at Stanford University. He was Chief Scientist at Baidu and part of the leadership team of the Google Brain project many, many years ago when it was just starting. Andrew will share with us his topic of bridging AI's proof-of-concept-to-production gap. Before we start our session today, I also want to introduce a really important colleague of mine whom you'll become very familiar with as the academic year goes on. This is our new Director of Research at HAI, Dr. Deep Ganguli, and he'll be sharing a few house rules with everybody. Thank you, and thank you, Andrew, for joining us. I'll be listening to
your talk attentively.
- Yeah, thank you, Fei-Fei.
- And thank you, Fei-Fei. Before we kick off, just a few house rules. First, thanks, everyone, for tuning in today. I'll be moderating the question-and-answer session at the end of Andrew's talk. To submit a question, please use the Slido website. There's a link in the chat box of the Zoom, you can also point your phone at that QR code, or you can go to our events website at hai.stanford.edu/events and click "join the conversation." And without further ado, Andrew, please take it away.
- Great, thank you. Thanks, Deep, and thanks, Fei-Fei. It was surprising when
Fei-Fei mentioned that HAI is a relatively new institute, because, thanks to her leadership and John Etchemendy's leadership, at Stanford and in the AI world it already feels like a major institution. So it's interesting to be reminded, despite HAI's presence and all of these wonderful events I see across the campus, the virtual campus, all the time from HAI, that it is still in its early days. It's nice to see everyone here. Last night when I was looking at the attendee list, I noticed that all of you watching are a very diverse audience. I counted 23 CEOs on the attendee list. There are a few dozen professors, a few hundred students, also a few hundred machine learning engineers and machine learning researchers, and one person on the registration form listed themselves as a poet. So whatever you are, a CEO, a professor, a student, a machine learning engineer, a researcher, or a poet, I'm really glad to see you here today, and thank you for joining us. What I'm going to do is
share with you a perspective on one of the challenges facing AI. AI has created a ton of value, but there is a challenge, almost a bottleneck or barrier, to creating even more value. And this is something I see across multiple universities, companies, and industries: bridging the proof-of-concept-to-production gap. So let me share my slides. What I hope to do today is share with you a perspective on this proof-of-concept-to-production gap, as well as what all of us, in academia or in business, or maybe even in government (I saw a few government leaders on the list as well), might do to overcome some of these challenges so that AI can become even more useful. I've been saying for about five years now that AI is the new electricity: similar to the rise of electricity about a hundred years ago, AI is poised to transform every industry. But what have we really done? I think AI has already transformed the software industry, especially the consumer internet industry. We as a community have collectively transformed web search, online advertising, machine translation, social media, a lot of great things, also some problematic things, but the software and internet tech industry has many teams that know how to use AI well. There's still a lot more work to be done, but it's clear it has created tremendous value. Once you look outside the software industry, I think AI's impact still matters and is growing. But looking into the future, I think the impact of AI outside the software industry will be even bigger than its impact on the software industry, though the way we build and deploy our systems in all of these other industries will have to change a bit in order to make them more effective. Now, given how diverse today's audience is, I want to take one slide to
just say what I mean by AI. AI means a lot of things these days, but as some of you will know, 99% of the value created by today's AI technology is through one idea, called supervised learning, in which we learn an input-to-output mapping. If you input an email and output whether it is spam or not, that's a spam filter. If you input an audio clip and output a text transcript, that's speech recognition, powering voice search and the smart speakers you may have in your homes. The most lucrative application of this is probably online advertising, not the most inspiring application, but certainly very lucrative for some large online ad platforms, in which an AI system takes as input an ad and some information about you and tries to figure out whether you will click on that ad or not, because showing slightly more relevant ads has a very direct impact on the bottom line of the large online ad platforms. Some work my team at Landing AI has been doing is visual inspection, where we take as input a picture of a manufactured object, say a picture of a phone, and try to tell whether the object that's been manufactured is scratched or dented or has some other defect. Or medical imaging, which my Stanford group does a lot of work on, where we input a chest X-ray image and output whether we think this patient has pneumonia or some other condition.
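To make the input-to-output idea concrete, here is a minimal, illustrative sketch (not from the talk) of supervised learning as a mapping from an input (email text) to an output (spam or not). The handful of training emails and labels are made up purely for illustration.

```python
# Minimal sketch of supervised learning: learn a mapping from an input
# (email text) to an output (spam or not spam). The data is made up.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

emails = [
    "win a free prize now", "limited offer, claim your reward",
    "meeting moved to 3pm", "please review the attached report",
]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(emails, labels)                         # learn the input -> output mapping
print(model.predict(["claim your free reward"]))  # expect [1], i.e. spam
```

The same pattern, with different inputs and outputs, covers the ad-click, visual-inspection, and chest X-ray examples above; only the data and model architecture change.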
So the AI world has generated a lot of amazing research progress, and a lot of amazing proofs of concept in the business world as well. For example, here is one result that my collaborators and I announced some time back; this was Pranav Rajpurkar, Jeremy Irvin, Matt Lungren, Curt Langlotz, Bhavik Patel, and many others. Chest X-rays are one of the most commonly performed medical procedures; we use them to help diagnose pneumonia, lung cancer, and also COVID, and there are about 2 billion chest X-ray procedures per year worldwide. We announced a result in which we claimed that deep learning achieved radiologist-level performance on 11 pathologies and did not achieve radiologist-level performance on three pathologies. Many groups have announced results of this flavor. Fei-Fei's group has published wonderful papers of this flavor, as have other groups at Stanford, such as Sebastian Thrun's work on diagnosing skin cancer, and many groups around the world have published results saying that AI does as well as a human doctor at diagnosing something from some type of medical imaging modality. So given all this amazing research progress and these amazing proofs of concept, why aren't these systems widely deployed in hospitals yet? If you were to get a chest X-ray today, in most countries, certainly in the United States, in fact in all countries, it is very unlikely that there's an AI system reading your chest X-ray. Why is that, if there are peer-reviewed research papers, and I will stand behind my papers, saying that these systems supposedly outperform even board-certified Stanford radiologists? What I see across the AI world is that there are many research studies and proofs of concept, things that work well on a researcher's laptop running in a Jupyter notebook, but that still need to
bridge that proof-of-concept-to-production gap. So what I hope to do today is share with you three of what I think are the top challenges in bridging the proof-of-concept-to-production gap, in the hope that wherever you are, whether in academia, business, non-profits, or government, if you have an exciting idea and you can help your team get to a proof of concept, that's wonderful and should totally be celebrated, but watching out for some of these challenges will, I hope, also help more AI projects get into practical deployment. I think a few of the top challenges in bridging the proof-of-concept-to-production gap are the challenges of small data, of generalizability and robustness, and of change management. I'll go through these three and then also talk a bit about the full cycle of machine learning projects, which I think will help all of us as a community take more AI projects to successful production deployment. So let's start with small data. A lot of AI grew up in
consumer internet companies, right? These are very large tech companies that have hundreds of millions or billions of users, and when you have that many users, you have big data. So I find that a lot of AI philosophies, tools, and approaches were tuned to big data, given the nature of the companies in which AI grew up, but a lot of industries have much smaller datasets, and for AI to be useful in those industries, we need better small-data algorithms. For example, take visual inspection of smartphones, the example I alluded to earlier. If you have a million pictures of scratched smartphones, then today there are at least dozens, maybe hundreds, of teams that can build a neural network to diagnose whether a phone is scratched. In fact, the value of building on all of that open-source work, the models people have built on top of for the very important big-data problems, can't be emphasized enough. But fortunately, no factory has manufactured a million scratched smartphones, which would then have to be thrown away. The question is: given only a hundred pictures of scratched smartphones, which may be all the data that exists, are you able to build an accurate inspection system? This is critical for breaking open these applications of machine learning in visual inspection, where only small datasets exist. To dive more deeply into small
data, here's another example. The result I mentioned just now said that deep learning achieved radiologist-level performance on 11 pathologies and did not on three pathologies. These are the 14 pathologies; you can focus on just the first and the last columns (the middle two columns show the accuracy and confidence numbers). Let's dive into a few of these rows. For a condition like effusion, we have a lot of data, about 11,000 examples, and so there the deep learning algorithm was able to diagnose at a level of accuracy that was statistically indistinguishable from radiologists. But if we look at a rare condition like hernia, where we have only about a hundred examples, radiologists still outperform the learning algorithm. It turns out that learning algorithms work well on datasets where the distribution looks like the one on the left: if you have thousands of examples of every class, then it's not easy, but it is relatively easier, to get the learning algorithm to do well on all of the classes. It doesn't do as well when your data distribution looks like the one on the right, which is what we actually face in the medical domain. And I've been in a lot of conversations, or actually, I've listened
in on many conversations between a machine learning engineer and a product leader, business leader, or hospital leader, and the conversation goes like this. The machine learning person says, "Look, I've achieved very high accuracy on the test set," and it's a fair test set, not peeked at, a fair held-out validation. And then the hospital leader, the doctor, or the business leader says, "Congratulations on your high test-set result and on your research paper publication, but your system just doesn't work." And the machine learning researcher or engineer says, "Yes, but I do really well on the test set," and then the conversation ends there, unfortunately. I think our job as machine learning researchers, engineers, and developers is not just to do well on the test set; it is to solve the problem that actually matters, the problem the use case needs to address, and I find that common metrics such as average accuracy do not reflect these small-data, rare-class problems. For example, if your data distribution is the one on the right, it is completely fine, from an average-accuracy standpoint, to ignore the hernia condition: just never predict hernia, and your accuracy is still fine, because hernias are so rare. But for practical applications, it is probably medically unacceptable to deploy a system that misses completely obvious cases of hernia. So even though hernia is very rare, and from an average-accuracy standpoint is less important, for the practical hospital needs (my team works with a few hospitals, so we're on the ground doing this work) it is important to handle those rare cases as well.
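As a minimal, illustrative sketch of that failure mode (not from the talk, with made-up numbers): a model that simply never predicts the rare class can still report very high average accuracy while having zero recall on exactly the cases that matter.

```python
# Sketch: high overall accuracy can hide total failure on a rare class.
# Labels are made up: 990 negatives and only 10 rare "hernia" positives.
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

y_true = np.array([0] * 990 + [1] * 10)   # 1 = rare condition present
y_pred = np.zeros_like(y_true)            # a model that never predicts the rare class

print("overall accuracy:", accuracy_score(y_true, y_pred))    # 0.99
print("recall on rare class:", recall_score(y_true, y_pred))  # 0.0
```

Reporting per-class recall (or similar per-slice metrics) alongside average accuracy is one simple way to keep this kind of failure visible.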
Fortunately, I think both the research community and the business community are making progress on better algorithms for handling small data. For example, I'm excited about synthetic data generation using GANs, which were created by my former student Ian Goodfellow, who was a Stanford student way back. With GANs, there is actually an example in visual inspection of generating scratches on cars, so you don't need a million scratched cars to learn to detect scratches. You can synthesize scratches so realistic that, honestly, I can't tell a synthetic scratch from a real one.
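A full GAN is beyond a short sketch, but a much simpler, hypothetical stand-in for the same idea is to programmatically draw scratch-like marks onto defect-free product images to augment a tiny defect dataset. The sketch below is purely illustrative and is not the GAN-based method mentioned above.

```python
# Sketch: cheap synthetic-defect augmentation as a simple stand-in for
# GAN-based synthesis. Draws a random scratch-like polyline on an image.
import random
from PIL import Image, ImageDraw

def add_synthetic_scratch(img, seed=0):
    rng = random.Random(seed)
    out = img.copy()
    draw = ImageDraw.Draw(out)
    w, h = out.size
    x, y = rng.uniform(0, w), rng.uniform(0, h)
    points = [(x, y)]
    for _ in range(rng.randint(3, 6)):          # a jagged, scratch-like path
        x += rng.uniform(-w / 8, w / 8)
        y += rng.uniform(-h / 8, h / 8)
        points.append((x, y))
    draw.line(points, fill=(220, 220, 220), width=2)
    return out

clean = Image.new("RGB", (256, 256), (40, 40, 40))  # placeholder defect-free image
add_synthetic_scratch(clean, seed=42).save("synthetic_scratch.png")
```

GAN-based synthesis learns far more realistic defects from data; this kind of hand-crafted augmentation is just a cheap baseline when only a handful of real defect images exist.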
There is also exciting research on one-shot learning and few-shot learning, where algorithms are able to learn from very few training examples, and I think GPT-3, released just a couple of months ago, was a very exciting step for one-shot and few-shot learning in language. I'm also excited about self-supervised learning and self-taught learning, where we learn from large amounts of unlabeled data before transferring to a labeled task, as well as transfer learning and anomaly detection. All of these are technologies that I think are exciting for helping us overcome the small-data challenges that are much more pervasive once you go outside consumer internet software. Other than small data, a second challenge in bridging
the proof-of-concept-to-production gap is generalizability and robustness. Going back to the deep learning for chest X-ray diagnosis example, those of us who work a lot in both research and production settings know this: a model that works well in a published paper often doesn't work in a production setting. For example, we collected data from Stanford Hospital. Stanford has relatively modern X-ray machines and very well-trained technicians. When we train and test on images collected from Stanford Hospital, we can publish peer-reviewed papers, and I will stand behind them, showing that we can outperform human radiologists when testing on data from the same hospital. But it turns out that if you take this model and walk down the street, maybe to an older hospital using older X-ray machines, where the X-ray technician uses a slightly different imaging protocol, maybe the patients are tilted at a slight angle, then the performance degrades. This is in contrast to any human radiologist, who could walk down the street from Stanford Hospital to this other hospital and do just fine. So there is a huge gap between what works in a research lab and what works in production, and this is true not just for healthcare but for many other industries as well. I think one thing we should work on, both on the research side and on the practical engineering side, is better tools and processes to make sure our models generalize to datasets different from the ones they were trained on.
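One simple process-level check, sketched below with synthetic data (not from the talk), is to always evaluate on data from a site the model never trained on, not just on a held-out split of the training site. The "hospitals," features, and simulated protocol difference are all made up for illustration.

```python
# Sketch: evaluate on data from a *different* site, not just a held-out split
# of the training site, to surface generalization gaps. Data is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def make_site(n, protocol_noise=0.0):
    X = rng.normal(size=(n, 5))
    y = (X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=n) > 0).astype(int)
    # Corrupt the most informative feature after labels are set, as a crude
    # stand-in for a different machine / imaging protocol at the second site.
    X[:, 0] = X[:, 0] + protocol_noise * rng.normal(size=n)
    return X, y

X_a, y_a = make_site(2000)                     # "Hospital A": training site
X_b, y_b = make_site(500, protocol_noise=3.0)  # "Hospital B": different protocol

model = LogisticRegression().fit(X_a[:1500], y_a[:1500])
print("AUC, held-out same site:", roc_auc_score(y_a[1500:], model.predict_proba(X_a[1500:])[:, 1]))
print("AUC, different site:    ", roc_auc_score(y_b, model.predict_proba(X_b)[:, 1]))
```

The second number is typically much lower, which is exactly the gap a single-site test set never reveals.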
I'll share a few more thoughts on this when I talk about the full cycle of machine learning projects. Finally, change management. AI technology can take a workflow and automate part of it, and that can transform the work of a lot of people around it, and I think we need to get better at managing that overall change. Here's an example. This is some work I did with Anand Avati, Nigam Shah, Ken Jung, and others on palliative care. Palliative care, which is roughly end-of-life care, helps patients with terminal illness enjoy a high quality of life. We know that here in the United States, doctors in general make fewer palliative care referrals than we would like. Doctors are good people; many doctors want to keep fighting for the patient because they care so deeply about the patient. My father is a doctor, and I know it's genuinely hard for a doctor to give up; you just keep fighting for the patient, which is a great attitude we want doctors to have. But we know that, across the country, doctors make fewer palliative care referrals than one might wish. Now, many hospitals, including Stanford Hospital, have a specialized palliative care unit. Those palliative care doctors could proactively reach out, but given the volume of patients, manual chart review, reviewing patient records one by one, is infeasible. So what we did was build a learning algorithm to predict the chance of mortality for a patient over the next three to twelve months, and this recommends patients for consideration for palliative care.
So this is the workflow we actually built with the palliative care staff at the hospital; the data shown here is made up to protect patient privacy, but it is pretty much what Dr. Harman sees every morning. Dr. Harman would wake up in the morning and pull up a table from the database that looks pretty much like this, where she would see patient IDs, the age of each patient, and the learning algorithm's estimated probability of mortality. This allows her to decide which patients' charts to review in greater detail and which doctors to call to recommend their patients be considered for palliative care, or to make sure that an advance care directive is taken care of, for example.
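The workflow described here is essentially a screening worklist: score every patient, sort by predicted risk, and surface the top of the list for the palliative care team to review. Below is a minimal sketch of that table; the risk model, features, and patients are entirely made up for illustration and are not the model from the study.

```python
# Sketch of a screening worklist: score patients with a (stand-in) trained
# risk model, sort by predicted 3-12 month mortality risk, review the top.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 8))                    # stand-in EHR features
y_train = (X_train[:, 0] + rng.normal(size=500) > 1).astype(int)
model = LogisticRegression().fit(X_train, y_train)     # stand-in risk model

admitted = rng.normal(size=(20, 8))                    # today's admitted patients
worklist = pd.DataFrame({
    "patient_id": [f"P{i:03d}" for i in range(len(admitted))],
    "age": rng.integers(35, 95, size=len(admitted)),
    "predicted_mortality_risk": model.predict_proba(admitted)[:, 1],
})
# The palliative care team reviews the highest-risk charts first.
print(worklist.sort_values("predicted_mortality_risk", ascending=False).head(5))
```

The point of the sketch is the shape of the workflow, a ranked review queue feeding a human decision, not the model itself.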
So what do you think happened when we first rolled out the system? When a doctor calls up another doctor and says, "Hey, I think your patient is at high risk of mortality," what do you think happens? Well, maybe not surprisingly, the doctor on the receiving end of the phone call goes, "Who are you? And who are you to tell me that my patient is at high risk of mortality?" What we realized was that we had to carry out the change management process better, because a system like this palliative care one affects a lot of stakeholders. It affects doctors, it affects nurses, it affects hospital administration, insurance, outpatient services, and of course, most importantly of all, it affects the patient. So on a lot of projects I work on, I've learned over and over to go through the appropriate change management process, because when we take a hospital's workflow and automate just a piece of it, whether reading X-rays, making palliative care predictions, or something else, it disrupts or transforms the work of so many people around it. Budgeting time to identify stakeholders, to provide reassurance, and to right-size the first project: all of these things are important for us as technologists, business leaders, and academic researchers if we want to play a role in making sure our amazing technologies get out there and have an impact. Two key technical tools for managing change are explainable AI and auditing, and I should give a shout-out here: I think Fei-Fei and HAI have been real thought leaders in that conversation on both of these important topics. I know that some AI leaders think explainable AI is not important ("you train a black-box deep learning model, why do you care to explain it?"), and I personally just don't agree with that. And explainable AI is
actually complicated. So, a quick story of where I got it wrong. When we built the first palliative care system and started showing it to some doctors, the doctors' feedback was, "How can your learning algorithm possibly tell me that this patient has a 78% chance of dying in the next three to twelve months? How could I possibly trust your AI system?" So Anand Avati actually built a system, using an explanation algorithm, to generate an explanation for the doctor, which says: we think this patient is at high risk of mortality because, looking at the health record, the EHR, they received this diagnosis and they had this test, and this is why we think this patient is at high risk of mortality.
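The specific explanation technique isn't spelled out here, so what follows is a generic, minimal sketch of the general idea for a linear (logistic-regression) risk model: each feature's contribution to one patient's score is just coefficient times feature value, and the top few contributors become the "because" clause shown to the doctor. The features and data are made up.

```python
# Sketch: a simple per-patient explanation for a linear risk model by ranking
# coefficient * feature-value contributions. Features and data are made up.
import numpy as np
from sklearn.linear_model import LogisticRegression

feature_names = ["age", "admissions_past_year", "metastatic_cancer_code",
                 "heart_failure_code", "abnormal_lab_count"]
rng = np.random.default_rng(1)
X = rng.normal(size=(300, len(feature_names)))
y = (X[:, 2] + 0.7 * X[:, 4] + rng.normal(size=300) > 1).astype(int)
model = LogisticRegression().fit(X, y)

def explain(patient_row, top_k=3):
    contributions = model.coef_[0] * patient_row   # contribution of each feature to the logit
    top = np.argsort(-np.abs(contributions))[:top_k]
    return [(feature_names[i], round(float(contributions[i]), 2)) for i in top]

patient = X[0]
print("predicted risk:", round(float(model.predict_proba([patient])[0, 1]), 2))
print("top contributing features:", explain(patient))
```

For non-linear models, attribution methods such as LIME or SHAP play the same role; as the rest of the story shows, the hard part is matching the explanation to what the audience actually needs.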
And guess what happened? The doctors looked at a small handful of patients, looked at the explanations the system generated, said, "Oh, got it," and then they never looked at the explanations again. The lesson learned was that the doctors didn't actually need us to explain to them why a patient was at high risk of mortality; they are completely qualified to look at an EHR and judge for themselves whether a patient is at high or low risk of mortality. What they actually wanted was some reassurance that our machine learning algorithm, our AI system, was generating reasonable conclusions. So what they wanted was just enough of an explanation to feel that our system was being reasonable, and once they had that level of comfort, they didn't care about the explanations anymore; they just didn't want to look at them. They would look at our recommendations, use our system for screening, but then review the patient charts and records themselves in order to decide what to do. So I think one of the reasons explainable AI is so complicated is that we keep confusing who it is for. Are you trying to generate an explanation for the doctor, for the patient, for a regulator, or for someone else? And what is the action you want them to take? Do you need them to do something on a patient-by-patient basis, do you want them to just be generally comfortable, or do you want a regulator to help you spot a major flaw? So I think explainable AI is important, and I think auditing is as well. Face recognition today is a technology that seems highly problematic, and given where we are, I think society has a hard time trusting many face recognition systems, certainly here in the United States, unless we have some fair third-party audit to reassure us that they're doing the right things. So these are important technical things for us to work on and then bring in as part of the overall change management. Now, I've talked about the major issues, and what I think the machine learning community needs to do next is get better
at thinking systematically about the full cycle of machine learning projects. Here's what I mean. We've been celebrating, a lot, the development of better machine learning algorithms, and when a team produces a successful research paper or proof of concept, that's wonderful; celebrate it, it's phenomenal progress. But the work needed to actually take a system to production is much bigger than that; there's all this other stuff that needs to be done. There's actually a very influential paper out of Google from several years ago, "Machine Learning: The High-Interest Credit Card of Technical Debt," that talks about this. In addition to building the machine learning model, all this other stuff is something I hope we can become more systematic at. Now, when I talk about these things, some people ask me: is this engineering, or is it research? And I think it can be either. I remember, a decade ago, leading researchers telling me that they thought neural networks were unscientific; these were leading researchers in computer vision, not Fei-Fei, other leading researchers in computer vision. They'd tell me neural networks weren't scientific; they'd argue, get real, why are you just messing around with these networks of neurons? It didn't feel scientific to them at the time. And some of what I'll talk about today may feel like engineering, but I think both the engineering community and the research community can do a lot to make AI engineering much more repeatable and systematic. So here is how I think of the major phases of an AI project: we have to scope the project and decide what problem to solve, acquire data, carry out the modeling, that is, build the model, and then take it to deployment. What I want to do is go backwards through these four major phases and share very quick lessons
learned from each of these. So let's start with deployment, then go back through modeling, data, and scoping. For deployment, beyond training the machine learning model, we still need to build a cloud or edge implementation and build monitoring tools, and I think the business world is getting better at how to deploy these systems to production. For example, one design pattern that's often used is a so-called shadow deployment, where we may, for example, deploy an X-ray diagnosis system but not use it to make any decisions; it just shadows a doctor. This is safe, because it isn't doing anything, it's just shadowing the doctor, and it gives us time to monitor the system's performance and verify that it is making reasonable predictions, reasonable diagnoses, before we allow it to play any role in making recommendations. Canary deployments are another common design pattern, where we roll out to a small subset of users and monitor it to make sure the data distribution hasn't changed, and only after doing this do we ramp up the deployment.
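Shadow mode and canary rollouts are design patterns rather than library calls; here is a minimal, hypothetical sketch of both in plain Python (the function names and toy models are made up). In shadow mode the model's output is logged next to the human's decision but never acted on; a canary routes only a small fraction of traffic to the new model.

```python
# Sketch of two deployment patterns: shadow mode and a canary rollout.
# All names and models here are hypothetical stand-ins.
import logging, random

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("deployment")

def shadow_mode(case, model_predict, human_decision):
    """Run the model alongside the human, log both, act only on the human."""
    model_out = model_predict(case)
    log.info("case=%s human=%s model=%s", case["id"], human_decision, model_out)
    return human_decision                      # the model never drives the decision

def canary(case, model_predict, legacy_predict, canary_fraction=0.05):
    """Route a small fraction of traffic to the new model, the rest to the old path."""
    if random.random() < canary_fraction:
        return {"source": "new_model", "decision": model_predict(case)}
    return {"source": "legacy", "decision": legacy_predict(case)}

toy_model = lambda case: "abnormal" if case["score"] > 0.5 else "normal"
legacy = lambda case: "normal"
print(shadow_mode({"id": 1, "score": 0.9}, toy_model, human_decision="abnormal"))
print(canary({"id": 2, "score": 0.2}, toy_model, legacy))
```

Comparing the logged shadow predictions against the human decisions (or the canary slice against the legacy path) is what tells you whether it is safe to ramp up.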
done by engineering teams, but more systematic tools,
as well as research to hope, make this whole process more systematic, I think will make the deployment
process more repeatable and reliable. And of course we also need
some longterm monitoring and maintenance. So one thing I'm trying to do
into your educational context, as well as I think
we've done a lot, right? Stanford's deeplearning.ai
really around the world. Many institutions, many universities, many causes to teach
people how to build models. I think we still should teach
more people how to go through this deployment process. So that we can have more highly qualified machine learning engineers. Going backwards let's talk
about the modeling process. It turns out that building machine learning models is a highly iterative process; it feels to me much more like debugging software than developing software. It's an interesting cycle: you come up with an idea for an AI architecture, you code it up and train the model, and it never works the first time, so you analyze the results. I remember, some time back, I trained a softmax regression model on my laptop. It was a small experiment, small enough that I didn't need a GPU or anything. I booted up a Jupyter notebook on my Mac laptop, actually the same laptop I'm using to speak with you via Zoom right now, coded up a simple softmax regression, did a little data cleaning, all in the Jupyter notebook, trained the softmax model, and it worked the first time. I still remember to this day my personal sense of surprise, because I just couldn't believe it: wow, I trained a model and it worked the first time; that never happens. I actually spent several hours debugging it because I just didn't believe it. It turned out it actually was working, but in machine learning it almost never works the first time. So a lot of the loop is carrying out analysis to figure out what's wrong with the model, so that you can change the algorithm, the architecture, or whatever else, and you go around this loop. I find that the execution of machine learning projects differs from the way we usually carry out sprint planning in an agile development process; a lot of it is iterative and feels more like debugging. Hopefully that observation is helpful to some of the managers of AI projects on this call. And if you look at the way we
develop learning algorithms today, a machine learning model has three major inputs: the training data, your choice of algorithm (the neural network architecture or whatever piece of code), and your choice of hyperparameters. When producing research results, we often download a standardized training set, dev set, and test set benchmark and feed all of these in to train a machine learning model. When doing research, we tend to keep the training data fixed and vary the neural network architecture and the hyperparameters, and we do that so that different algorithms can be compared to each other on a one-to-one basis. But in a production setting, I often find myself holding the algorithm fixed, often holding the hyperparameters fixed, and just varying the training data. As one example, I've actually given direction to my teams where I tell them: everyone, the algorithm is good enough; please just use RetinaNet and please don't change the algorithm; let's just keep changing the training data in order to make the algorithm work well. Another example: when I was working on speech recognition, we did a lot of work on the speech model, but eventually I thought, all right, this algorithm is good enough; the algorithm and the code work, and the hyperparameters maybe need a little more tuning. Our day-to-day workflow was to look at the speech recognition system's output and do error analysis. We'd figure out, say, that our speech recognition system has a really hard time with people who have a certain accent. I was born in the UK, so just as a hypothetical example, let's say it has a really hard time with British accents (that wasn't what we actually found, but let's say British accents, since I was born in the UK), and we would say, great, let's go get more British-accented training data. That was the iteration: we kept shifting the training data to improve the machine learning model's performance, and keeping the algorithm fixed while varying the training data was, in my view, really the most efficient way to build the system.
Now, I know a lot of professors and researchers are on this call, and some of you are asking, "Hey, Andrew, is this research or is this engineering?" And I'll say, I actually don't know; I think it's both. But I think the research community can do a lot to help make this process much more systematic as well. All right, working backwards: I've covered deployment and modeling, so now data. How do you acquire data for the model? In a corporate setting, I sometimes talk to a CEO (I think there are 23 CEOs signed up for this) and they'll say, "Hey, Andrew, give me two or three years to get my IT into shape; then we'll have this wonderful data, and then we'll do AI on top of that." And I think that's almost always a terrible idea. Most companies have enough data to start getting going, and it's only by starting to build a system that you can figure out how to build out your IT infrastructure, because there's so much data you could collect. Do you want more user data, more clickstream data? What data do you want? It's often by starting to build an AI system that you can work backwards to help decide what additional data to collect. And one aspect that I think is under-appreciated in thinking about the full cycle of machine learning is deciding on clear data definitions.
Here's an example I got from my friend Kian Katanforoosh, who teaches the deep learning class CS230 with me on campus at Stanford and is also CEO of Workera. Kian really likes iguanas, so he came up with this example. Let's say you give labelers instructions to draw bounding boxes around iguanas. Well, one labeler will draw a bounding box like this, a different labeler will draw bounding boxes like this, two bounding boxes for the two iguanas, and another labeler will draw the bounding boxes yet another way. I find that there's a lot of inconsistency in how labelers label things unless you arrive at very clear data definitions. Labeling iguanas may be a playful example, but I see this all the time in manufacturing, I see it all the time in healthcare, where even two doctors often don't agree on the right label, and I see it in speech recognition, agriculture, and other domains as well.
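One concrete way to catch this kind of inconsistency, sketched below with made-up boxes (not from the talk), is to have two labelers annotate the same images and measure how well their boxes overlap using intersection over union (IoU); consistently low agreement is a signal that the labeling instructions need tightening.

```python
# Sketch: measure agreement between two labelers' bounding boxes with IoU.
# Boxes are (x_min, y_min, x_max, y_max); the values below are made up.
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    return inter / (area_a + area_b - inter)

# Labeler 1 drew one box around both iguanas; labeler 2 boxed each separately.
labeler_1 = [(10, 10, 200, 120)]
labeler_2 = [(10, 10, 90, 110), (100, 15, 200, 120)]
pairs = [(a, b, round(iou(a, b), 2)) for a in labeler_1 for b in labeler_2]
print(pairs)   # low IoU per pair signals inconsistent labeling conventions
```

The fix is usually not a better model but a clearer labeling instruction, agreed on before large-scale labeling starts.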
I also feel we've leaned heavily on the concept of human-level performance to prove out AI. What I see is that many teams measure human-level performance and then go and say, "Hey, my AI system outperforms human-level performance; therefore I have proven that my AI system is better than humans, and thus you must use it." And I find that, whereas human-level performance is a very useful development tool, a very useful benchmark, and great for publishing papers, in a practical deployment context the exercise of proving we're superior to humans is often not the right approach, because ultimately what we want in, say, a healthcare setting is not just superiority to humans; we want to solve a problem, to diagnose accurately whether a patient has a certain condition or not. So I think it's time to rethink how we benchmark and how we use human-level performance in building AI systems; maybe more on that another time. So, just a couple more slides and then I'll wrap up. One of the things I'm still trying to get
better at is scoping: picking useful problems to work on and to solve. I find that AI is very interdisciplinary; AI by itself is almost useless. What is AI for? It has to be applied to some important or useful application for it to create value. So when I meet with my healthcare friends, or with business leaders in manufacturing, in telco, in agriculture, I usually tell them: don't tell me about your AI problems; I don't want to hear about your AI problems. Tell me about your healthcare problems, your business problems, your telco problems, or even your fashion problems, whatever they are, and it is my job to work with you to see if there's an AI solution. So my common workflow is to learn about my collaborators' business problems and then to work together to brainstorm AI solutions. There's a certain set of things that AI can do, and a certain set of things that are valuable for the business (and when I use the term business, I mean it in a very generic way, including a collaborating research lab, a non-profit, or a government entity). We want to select projects at the intersection of these two sets. Only AI experts today have a really good sense of what's in the set on the left, and only domain experts have a really good sense of what's in the set on the right. So I tend to go to partners who are domain experts, ask them to tell me their problems, and then brainstorm solutions together, and then we go through a process of diligence on value, feasibility, resourcing, and milestones. These are important parts
of the scoping process. So, to wrap up: I drew this picture as a highly iterative process, where sometimes you go from the later stages back to the earlier stages. Building models is the focus of a lot of AI research, which is great, because we've made a lot of progress there, but I feel that for AI to reach its full potential, especially outside the consumer internet industry, which is maybe the one industry that's gotten really good at this, there's a lot we need to do to get better at the cross-functional teamwork needed to pick projects, at data acquisition, and at the deployment technologies and processes. I wanted to end with just two more slides. McKinsey has a study estimating $13 trillion of value creation through AI, which sounds like a lot, and it is a lot. But the most interesting thing to me about their study was that it shows the untapped opportunity mostly lies outside the consumer internet industry: the amount we could do to help people around the world in all of these other industries, from retail to travel to transportation, to various forms of manufacturing, to healthcare, may be even bigger than what we've seen in the consumer internet tech industry so far. But to realize that value, we need better research
and better engineering in order to make that happen. To summarize: much work in industries outside consumer internet is still needed to bridge the proof-of-concept-to-production gap, and the key challenges are small data, generalizability and robustness, and change management. And I think we should think more systematically about the full cycle of machine learning projects. Today, in our intro programming courses, CS 101 or CS 106, Stanford has wonderful lecturers, like Mehran Sahami and others, who teach undergrads how to debug software, and we as a software industry have increasingly turned software engineering into a systematic engineering discipline, where we can now relatively accurately predict what a software engineering team can and cannot do. I think machine learning is still too much of a black art, where people who are experienced can somehow get it to work, and I think that we, academia and industry together, should work to turn machine learning from this black-art, intuition-based discipline into a systematic engineering discipline. And it's only if we do that, if we develop these processes and then also teach people those processes, that we will take a big step toward breaking AI open into many other industries. So with that, let me say thank you very much. I'm looking forward to taking some of the questions on Slido as well, so thank you.
- Thank you so much, Andrew. Thank you for the wonderful talk. There are some really
interesting questions in Slido, but first I have a burning question for you. When you were working on the CheXNet problem of chest X-ray diagnosis, you can write down an objective function, it's a supervised classification problem, you can build an algorithm, and you can go off and do it. But at the end of the day, you have a radiologist trying to make a diagnosis and an algorithm trying to make a diagnosis. On the one hand, you could make the decision based purely on the algorithm; on the other hand, you could just have the physician make the diagnosis; and there's a whole spectrum in between. How might you systematically study what the right human interaction is with that predictive model? And how do you handle accountability if the decision goes awry between a person and a machine?
- Yeah, great question. I think the short and simple
answer is that this is complicated. There are different groups, including, at Stanford, Pranav Rajpurkar, Jeremy Irvin, Matt Lungren, and Curt Langlotz, and many teams elsewhere, developing user interfaces to support this human-machine interaction. What we've found, for example, is that if the AI is poorly designed, we can unfortunately influence doctors to just go with the AI's decision. So we're actually designing UIs to try to let the AI convey to the physician that we don't really know, we think there's maybe a 70% chance, but it's actually not very certain, so please take a careful look and figure it out yourself. That UI design is complicated and still evolving. And one thing I'd love to see rise in AI is auditing. No one wants someone to audit their code ("just trust me, it works"), but I think that's actually the wrong attitude, because for these systems to be deployed safely, we do need to build trust. Sometimes I look at the systems my team has built and I ask, gee, do I want to trust this myself? And I would appreciate a third party auditing my work, not in a negative way, but to help me spot problems so that I don't deploy a system and then find out much later that it has some really undesirable bias or other problem. So I think the AI community should welcome auditors, third parties who help us find problems in our own work proactively, so we don't deploy a system, and we've all seen this, that turns out to be really biased against some ethnicity or some gender. So I think that would be an
important step as well. - Yeah, I completely agree. And sort of a related question
here is something you said earlier in your talk where the doctors, at some point they didn't need
an AI explainability tool. What they just wanted was the ability to build trust in that system. So is there a way to sort of
study that systematically, like you mentioned the UI, but are there kind of best
practices for allowing a human that's interacting with an AI
system to sort of build trust in a collaborative relationship? - Yeah. So I think one of the reasons
the explainable AI field seems so complicated is that, when we talk about explainable AI, sometimes all the different use cases get mushed together. And I don't think it's possible, or at least I don't know how, to build one technology, one visualization tool or whatever, that simultaneously serves the purpose of explaining to a machine learning engineer what's wrong so they can iterate and improve the algorithm; explains to an end user why the AI system generated a conclusion so that, we hope, they're comfortable with it; and also shows a doctor, a subject-matter expert, why an AI system generated a conclusion so they understand what we did but can also intervene and actually think about it. Those are really different purposes, and then there are regulators, yet another stakeholder. The audiences are so different, the stakeholders are so different, and the actions we hope those stakeholders will take are so different, ranging from simply being comfortable with a decision, for example if a loan-approval algorithm denies someone a loan, maybe there's an appeals process, but a lot of the time we just want them to understand the decision and maybe appeal it, that I don't think one technology can cover it all. If we could clearly sort out the stakeholders and the purpose of explainable AI, then we could build more distinct tools for these different groups.
- Yeah, I got it. These are all tough questions at the heart of human-centered AI, which is what HAI is all about. So if I may...
- Just one thing: I really welcome
the input of sociologists here, because a lot of the problems we face aren't purely technical. I love getting economists, sociologists, and other stakeholders to help me think through how we deal with these things that aren't pure technology problems, but where the algorithms we develop play a big role as well.
- So, here's a question from an economist, actually, and I'll just read it verbatim because it's really well written. What do you see as the best way to address the challenges of growing inequities, especially the economic inequality that AI may bring? Not an easy question, but I think a good one to be concerned about.
- Yeah, cool. Oh, actually, I see a question
from Erik Brynjolfsson. Hey, Erik, great to see you here. I've really enjoyed my interactions with Erik and reading his many books over the years, so for those of you looking for good books on AI and economics, check out Erik Brynjolfsson. I think one of the things about AI, which I think Erik is alluding to, and I hate to say it, is that AI has a risk of accelerating economic inequality. The very high-level pattern is this: once upon a time, here in the United States, you could be a small-scale chicken farmer and have a pretty nice life farming and selling chickens. But now, with first the internet, a large player, say Tyson (I have no relationship with them), can use IoT to get sensor data from around the country on what's going on, centralize the data over the internet at headquarters or in one data center, use AI to process the data in a centralized way, and then push conclusions through IoT technology back out to all of these farms. So what we're already seeing in the software and internet world is winner-take-most or winner-take-all dynamics; that's why there's a relatively small handful of leading search engines and a relatively small handful of leading social media companies. And because tech is now entering every industry, fortunately or unfortunately we are infusing almost every industry, from agriculture to manufacturing to retail to logistics, with more and more of these winner-take-all dynamics. And this is contributing to
inequality, unfortunately. I wish I knew how to fully address this. I think government needs to play a huge role in ensuring that the tremendous wealth we're going to create, and clearly have already created as a community and will keep creating, is shared. I'd love to see governments act here, and Erik and I have actually chatted a lot about ideas like unconditional basic income, or even conditional basic income, to give people a safety net. And then I think education, while not a panacea, is very powerful in making sure that people whose livelihoods have been disrupted have a chance to learn new skills, contribute to the economy, and earn a livelihood for themselves and their families. As an AI technologist, I have seen AI create tremendous value, but my hope for all of us on this call is that when you're in the hot seat and making a decision, you try to bias things toward making sure that the wealth we create is fairly shared.
- Yeah, I completely agree. And I think that's a nice
segue to another question that bubbled up to the top, about the role of industry versus academia, and the haves and have-nots, right? In industry, if you're at a consumer internet company, you have access to more data and more compute, so you can do things like OpenAI can with Microsoft, building things like GPT-3, that academics and other people without those resources cannot. And this is, of course, the cutting-edge algorithm, so for now there's an inequity here. The question is: what effect do you think this will have on society, and will we see an even playing field in the future? How should we all think about that?
- I feel like the future
is not yet determined, and it is up to all of us. What we saw in the semiconductor industry, making microprocessors, is that the center of gravity shifted significantly from academia to corporations, because most universities just don't have the resources to design a new semiconductor chip and take it to fab. So a lot of the influence is now concentrated in a few great companies like Nvidia, Intel, AMD, and a few others. I think AI has made some of that shift, where today there is some work that is much easier to do in a corporate setting than in an academic setting. On the flip side, there's still plenty to do in academia. I look at all of the amazing research that goes on across HAI, across Stanford, and across academia, and if you look at the papers at top conferences like ICML, NeurIPS, ICLR, and so on, it is true that large corporations have a growing share, and that's great; I'm really glad that the large corporations are spending resources doing research and sharing the results with us. But universities still produce plenty of great research as well, and so do many smaller companies. One thing I love about the AI community is that it feels to me it grew up with a genuine spirit of sharing, and we as a community, all of us, any of you listening who work for a large corporation, you have a voice and your work matters: go ahead and try to influence your large corporation to stay true to that spirit of sharing ideas, because it's only by doing that that we can create more value for everyone. So I think the future is not set, and I think the values of all of us as a community will have a big influence on whether there continues to be very diversified, very widespread research with ideas freely shared, or whether it ends up being more concentrated. I am an optimist; I think we're actually on a good path, but it is up to us to keep pushing in that direction. Oh, and of course, we all know that yesterday's supercomputer is today's smartphone.
- I think my Zoom crashed for a moment there; okay, we're back. So yeah, I tend to agree
with you that the future is not set. But as of now, at least for the best-performing large language model, at least on few-shot tasks, everyone that's not OpenAI is sort of stuck licensing the technology. I can see a way in which we move toward a more equitable distribution of that technology, but let's say we were stuck there. Would that be a good thing, a bad thing? How should we think about it in the present?
- Yeah, I think it would be a terrible thing if everyone had to license GPT-3 technology from a single provider. Lots
of credit to the OpenAI team for building it. Fortunately, I think there are multiple companies on the planet with the capability to replicate it, so I would certainly welcome more large companies building these things, to make sure there's healthy competition. It is an unfortunate dynamic of AI that it creates a lot of value but also tends toward these winner-take-all dynamics I alluded to, so it's possible that eventually we'll need good regulatory frameworks to ensure we get the societal effects that we want. Although, for now, if you look at cloud computing, there is a relatively small number of cloud providers, and I'm not saying that's not a problem, but fortunately it has remained relatively competitive. This is something where I think government should play a role in making sure that we as a society get the outcomes that we want.
- Okay, well, there's one minute left, so I'll try to sneak this one in quickly. What do you think are the challenges for privacy-preserving machine learning in the healthcare industry, and what do you think needs to be pushed forward the most in that area?
- I feel like the one
challenge we have with privacy, and also with bias and other issues, is that we've not yet come to agreement on what standards we want to hold ourselves to. This has two bad effects. An AI team will build something, and as far as they know they're doing okay, and they deploy it, and then two years later some new standard arises. I think we all know there are certain protected characteristics you should not base decisions on, but here's a concrete example. I went to a major image search engine and deliberately searched for image queries that would show gender bias. It took me about 10 minutes to find one. I think the query was "elected official": that search, on one of the major search engines, showed all men on the first page. Now, we could get sensationalist and upset about this, because it's biased, this is horrible: if my 19-month-old daughter Nova sees it, maybe she'll think, wow, elected officials should all be men; maybe she should never aspire to be an elected official. So we could get alarmist about this. The other side of the story is that it took me 10 minutes of deliberate searching to find a highly biased query, so perhaps we shouldn't over-generalize from that one example. One of the difficulties is that a lot of these issues of bias and privacy are statistical concepts, and we as a society need to get better at not latching onto anecdotal evidence and at measuring these things in a statistical way; I'm not going to name the search engine, because I think on average the team actually did a great job and on average their queries are relatively fair. And I think part of what
we need to do, as businesses, corporations, and regulators, is establish fair standards that clearly lay out what we want and don't want, and then rigorously audit systems against those established criteria. This helps in two ways. One, it diminishes, on the research and engineering side, the fear that I'm going to roll out something and then two years later there will be some new criterion I had just never thought of. Who would have thought, say, that we'd discover we were discriminating against people who live in my hometown of Los Altos? Is that okay or not? It sounds like it's not okay, but with established criteria and an audit, life gets easier for the engineers and researchers. And two, when we roll these things out, it gives us a clear sense of exactly which privacy standards and fairness standards we want the systems to adhere to. Until we get there, I think we end up with more confusion and with people getting surprised by criticism, and the goal is to make the systems fair, not to randomly make people feel bad about them. So I hope we can get there.
- I tend to agree with you. Well, we're a little over time, so with that I just really
want to thank you for coming and kicking off our inaugural HAI Research Seminar Series. And thank you to the
community for tuning in. The recorded seminar
will be posted on YouTube by the end of this week. And next week, same time, same place. We'll have Percy Liang
discussing semantic parsing for natural language interfaces. So you can check that on our website. And thank you so much again, Andrew. - Yeah thank you.
- Thank you, Andrew. Fantastic. Thank you. - [Andrew] Thanks everyone.