- Really exciting to kick off Stanford's HAI Seminar for the new academic year. For those of you who are joining us for the first time, HAI is a relatively newly established institute at Stanford for Human-Centered Artificial Intelligence. My name is Fei-Fei Li. I'm a professor in the Computer Science Department and also co-director of HAI with Professor John Etchemendy. It's very exciting that joining us today for the first kickoff seminar is one of the most renowned and beloved members of the Stanford AI community, Dr. Andrew Ng. Before I give a brief introduction about Andrew, let me just say a couple of words about HAI.
HAI is an institute whose mission is to advance AI research, education, policy, and outreach to better the human condition. We are a highly interdisciplinary institute here at Stanford that works on advancing AI technology as well as many of the social and human issues related to AI, whether it's fairness and bias, ethics, the future of work, or geopolitics, and we work with important professional schools at Stanford, such as the schools of education, business, and medicine, and all across the campus. So for those of you who are part of the community, we really invite you to join us in any way, whether you're a student, researcher, faculty member, or alum. For those of you who are joining us from around the world, we welcome you to sign up for our mailing list or events list. We hold a lot of exciting events, and our weekly seminar is intended to bring you the latest thinking from the AI scholars and thought leaders who are at the forefront of making change in AI. So, like I said, what
an honor it is to introduce Andrew, who has been a long-term friend for more than a decade (that speaks to our age, Andrew). Andrew is literally the first person I met when I joined Stanford. He is the founder and CEO of Landing AI, also the founder of deeplearning.ai, and co-chairman and co-founder of Coursera, and all of these organizations are changing the world as we speak. I can't imagine one person doing all of that. He is currently also an adjunct professor at Stanford University. He was Chief Scientist at Baidu and part of the leadership team of the Google Brain project many, many years ago when it was just starting. Andrew will share with us his topic of bridging AI's proof-of-concept-to-production gap. Before we start our session today, I also want to introduce a really important colleague of mine whom you'll become very familiar with as the academic year goes on. This is our new Director of Research at HAI, Dr. Deep Ganguli, and he'll be sharing a few house rules with everybody. Thank you, and thank you, Andrew, for joining us. I'll be listening to
your talk attentively.
- Yeah, thank you, Fei-Fei.
- And thank you, Fei-Fei. Before we kick off, just a few house rules. First, thanks, everyone, for tuning in today. I'll be moderating the question-and-answer session at the end of Andrew's talk. To submit a question, please use the Slido website. There's a link in the chat box of the Zoom, you can also point your phone at that QR code, or you can go to our events website at hai.stanford.edu/events and click "join the conversation." And without further ado, Andrew, please take it away.
- Great, thank you. Thanks, Deep, and thanks, Fei-Fei. It was surprising when
Fei-Fei mentioned that HAI is a relatively new institute, because, thanks to her leadership and John Etchemendy's leadership, at Stanford and in the AI world it already feels like a major institution. So it's interesting to be reminded, despite HAI's presence and all of these wonderful events I see across the campus, the virtual campus, all the time from HAI, that it is still in its early days. It's nice to see everyone here. Last night when I was looking at the attendee list, I noticed that all of you watching are a very diverse audience. I counted 23 CEOs on the attendee list. There are a few dozen professors, a few hundred students, also a few hundred machine learning engineers and machine learning researchers, and one person on the registration form listed themselves as a poet. So whatever you are, a CEO, a professor, a student, a machine learning engineer, a researcher, or a poet, I'm really glad to see you here today, and thank you for joining us. What I'm going to do is
share with you a perspective on one of the challenges facing AI. AI has created a ton of value, but there is a challenge, almost a bottleneck or barrier, to creating even more value. And this is something I see across multiple universities, companies, and industries: bridging the proof-of-concept-to-production gap. So let me share my slides. What I hope to do today is share with you a perspective on this proof-of-concept-to-production gap, as well as what all of us, in academia or in business, or maybe even in government (I saw a few government leaders on the list as well), might do to overcome some of these challenges so that AI can become even more useful. I've been saying for about five years now that AI is the new electricity: similar to the rise of electricity about a hundred years ago, AI is poised to transform every industry. But what have we really done? I think AI has already transformed the software industry, especially the consumer internet industry. We as a community have collectively transformed web search, online advertising, machine translation, social media, a lot of great things, also some problematic things, but the software and internet tech industry has many teams that know how to use AI well. There's still a lot more work to be done, but it's clear it has created tremendous value. Once you look outside the software industry, I think AI's impact still matters and is growing. But looking into the future, I think the impact of AI outside the software industry will be even bigger than its impact on the software industry, though the way we build and deploy our systems in all of these other industries will have to change a bit in order to make them more effective. Now, given how diverse today's audience is, I want to take one slide to
just say what I mean by AI. AI means a lot of things these days, but as some of you will know, 99% of the value created by today's AI technology is through one idea, called supervised learning, in which we learn an input-to-output mapping. If you input an email and output whether it is spam or not, that's a spam filter. If you input an audio clip and output a text transcript, that's speech recognition, powering voice search and the smart speakers you may have in your homes. The most lucrative application of this is probably online advertising, not the most inspiring application, but certainly very lucrative for some large online ad platforms, in which an AI system takes as input an ad and some information about you and tries to figure out whether you will click on that ad or not, because showing slightly more relevant ads has a very direct impact on the bottom line of the large online ad platforms. Some work my team at Landing AI has been doing is visual inspection, where we take as input a picture of a manufactured object, say a picture of a phone, and try to tell whether the object that's been manufactured is scratched or dented or has some other defect. Or medical imaging, which my Stanford group does a lot of work on, where we input a chest X-ray image and output whether we think this patient has pneumonia or some other condition.
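To make the input-to-output idea concrete, here is a minimal, illustrative sketch (not from the talk) of supervised learning as a mapping from an input (email text) to an output (spam or not). The handful of training emails and labels are made up purely for illustration.

```python
# Minimal sketch of supervised learning: learn a mapping from an input
# (email text) to an output (spam or not spam). The data is made up.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

emails = [
    "win a free prize now", "limited offer, claim your reward",
    "meeting moved to 3pm", "please review the attached report",
]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(emails, labels)                         # learn the input -> output mapping
print(model.predict(["claim your free reward"]))  # expect [1], i.e. spam
```

The same pattern, with different inputs and outputs, covers the ad-click, visual-inspection, and chest X-ray examples above; only the data and model architecture change.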
So the AI world has generated a lot of amazing research progress, and a lot of amazing proofs of concept in the business world as well. For example, here is one result that my collaborators and I announced some time back; this was Pranav Rajpurkar, Jeremy Irvin, Matt Lungren, Curt Langlotz, Bhavik Patel, and many others. Chest X-rays are one of the most commonly performed medical procedures; we use them to help diagnose pneumonia, lung cancer, and also COVID, and there are about 2 billion chest X-ray procedures per year worldwide. We announced a result in which we claimed that deep learning achieved radiologist-level performance on 11 pathologies and did not achieve radiologist-level performance on three pathologies. Many groups have announced results of this flavor. Fei-Fei's group has published wonderful papers of this flavor, as have other groups at Stanford, such as Sebastian Thrun's work on diagnosing skin cancer, and many groups around the world have published results saying that AI does as well as a human doctor at diagnosing something from some type of medical imaging modality. So given all this amazing research progress and these amazing proofs of concept, why aren't these systems widely deployed in hospitals yet? If you were to get a chest X-ray today, in most countries, certainly in the United States, in fact in all countries, it is very unlikely that there's an AI system reading your chest X-ray. Why is that, if there are peer-reviewed research papers, and I will stand behind my papers, saying that these systems supposedly outperform even board-certified Stanford radiologists? What I see across the AI world is that there are many research studies and proofs of concept, things that work well on a researcher's laptop running in a Jupyter notebook, but that still need to
bridge that proof-of-concept-to-production gap. So what I hope to do today is share with you three of what I think are the top challenges in bridging the proof-of-concept-to-production gap, in the hope that wherever you are, whether in academia, business, non-profits, or government, if you have an exciting idea and you can help your team get to a proof of concept, that's wonderful and should totally be celebrated, but watching out for some of these challenges will, I hope, also help more AI projects get into practical deployment. I think a few of the top challenges in bridging the proof-of-concept-to-production gap are the challenges of small data, of generalizability and robustness, and of change management. I'll go through these three and then also talk a bit about the full cycle of machine learning projects, which I think will help all of us as a community take more AI projects to successful production deployment. So let's start with small data. A lot of AI grew up in
consumer internet companies, right? These are very large tech companies that have hundreds of millions or billions of users, and when you have that many users, you have big data. So I find that a lot of AI philosophies, tools, and approaches were tuned to big data, given the nature of the companies in which AI grew up, but a lot of industries have much smaller datasets, and for AI to be useful in those industries, we need better small-data algorithms. For example, take visual inspection of smartphones, the example I alluded to earlier. If you have a million pictures of scratched smartphones, then today there are at least dozens, maybe hundreds, of teams that can build a neural network to diagnose whether a phone is scratched. In fact, the value of building on all of that open-source work, the models people have built on top of for the very important big-data problems, can't be emphasized enough. But fortunately, no factory has manufactured a million scratched smartphones, which would then have to be thrown away. The question is: given only a hundred pictures of scratched smartphones, which may be all the data that exists, are you able to build an accurate inspection system? This is critical for breaking open these applications of machine learning in visual inspection, where only small datasets exist. To dive more deeply into small
data, here's another example. The result I mentioned just now said that deep learning achieved radiologist-level performance on 11 pathologies and did not on three pathologies. These are the 14 pathologies; you can focus on just the first and the last columns (the middle two columns show the accuracy and confidence numbers). Let's dive into a few of these rows. For a condition like effusion, we have a lot of data, about 11,000 examples, and so there the deep learning algorithm was able to diagnose at a level of accuracy that was statistically indistinguishable from radiologists. But if we look at a rare condition like hernia, where we have only about a hundred examples, radiologists still outperform the learning algorithm. It turns out that learning algorithms work well on datasets where the distribution looks like the one on the left: if you have thousands of examples of every class, then it's not easy, but it is relatively easier, to get the learning algorithm to do well on all of the classes. It doesn't do as well when your data distribution looks like the one on the right, which is what we actually face in the medical domain. And I've been in a lot of conversations, or actually, I've listened
in on many conversations between a machine learning engineer and a product leader, business leader, or hospital leader, and the conversation goes like this. The machine learning person says, "Look, I've achieved very high accuracy on the test set," and it's a fair test set, not peeked at, a fair held-out validation. And then the hospital leader, the doctor, or the business leader says, "Congratulations on your high test-set result and on your research paper publication, but your system just doesn't work." And the machine learning researcher or engineer says, "Yes, but I do really well on the test set," and then the conversation ends there, unfortunately. I think our job as machine learning researchers, engineers, and developers is not just to do well on the test set; it is to solve the problem that actually matters, the problem the use case needs to address, and I find that common metrics such as average accuracy do not reflect these small-data, rare-class problems. For example, if your data distribution is the one on the right, it is completely fine, from an average-accuracy standpoint, to ignore the hernia condition: just never predict hernia, and your accuracy is still fine, because hernias are so rare. But for practical applications, it is probably medically unacceptable to deploy a system that misses completely obvious cases of hernia. So even though hernia is very rare, and from an average-accuracy standpoint is less important, for the practical hospital needs (my team works with a few hospitals, so we're on the ground doing this work) it is important to handle those rare cases as well.
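As a minimal, illustrative sketch of that failure mode (not from the talk, with made-up numbers): a model that simply never predicts the rare class can still report very high average accuracy while having zero recall on exactly the cases that matter.

```python
# Sketch: high overall accuracy can hide total failure on a rare class.
# Labels are made up: 990 negatives and only 10 rare "hernia" positives.
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

y_true = np.array([0] * 990 + [1] * 10)   # 1 = rare condition present
y_pred = np.zeros_like(y_true)            # a model that never predicts the rare class

print("overall accuracy:", accuracy_score(y_true, y_pred))    # 0.99
print("recall on rare class:", recall_score(y_true, y_pred))  # 0.0
```

Reporting per-class recall (or similar per-slice metrics) alongside average accuracy is one simple way to keep this kind of failure visible.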
Fortunately, I think both the research community and the business community are making progress on better algorithms for handling small data. For example, I'm excited about synthetic data generation using GANs, which were created by my former student Ian Goodfellow, who was a Stanford student way back. With GANs, there is actually an example in visual inspection of generating scratches on cars, so you don't need a million scratched cars to learn to detect scratches. You can synthesize scratches so realistic that, honestly, I can't tell a synthetic scratch from a real one.
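A full GAN is beyond a short sketch, but a much simpler, hypothetical stand-in for the same idea is to programmatically draw scratch-like marks onto defect-free product images to augment a tiny defect dataset. The sketch below is purely illustrative and is not the GAN-based method mentioned above.

```python
# Sketch: cheap synthetic-defect augmentation as a simple stand-in for
# GAN-based synthesis. Draws a random scratch-like polyline on an image.
import random
from PIL import Image, ImageDraw

def add_synthetic_scratch(img, seed=0):
    rng = random.Random(seed)
    out = img.copy()
    draw = ImageDraw.Draw(out)
    w, h = out.size
    x, y = rng.uniform(0, w), rng.uniform(0, h)
    points = [(x, y)]
    for _ in range(rng.randint(3, 6)):          # a jagged, scratch-like path
        x += rng.uniform(-w / 8, w / 8)
        y += rng.uniform(-h / 8, h / 8)
        points.append((x, y))
    draw.line(points, fill=(220, 220, 220), width=2)
    return out

clean = Image.new("RGB", (256, 256), (40, 40, 40))  # placeholder defect-free image
add_synthetic_scratch(clean, seed=42).save("synthetic_scratch.png")
```

GAN-based synthesis learns far more realistic defects from data; this kind of hand-crafted augmentation is just a cheap baseline when only a handful of real defect images exist.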
There is also exciting research on one-shot learning and few-shot learning, where algorithms are able to learn from very few training examples, and I think GPT-3, released just a couple of months ago, was a very exciting step for one-shot and few-shot learning in language. I'm also excited about self-supervised learning and self-taught learning, where we learn from large amounts of unlabeled data before transferring to a labeled task, as well as transfer learning and anomaly detection. All of these are technologies that I think are exciting for helping us overcome the small-data challenges that are much more pervasive once you go outside consumer internet software. Other than small data, a second challenge in bridging
the proof-of-concept-to-production gap is generalizability and robustness. Going back to the deep learning for chest X-ray diagnosis example, those of us who work a lot in both research and production settings know this: a model that works well in a published paper often doesn't work in a production setting. For example, we collected data from Stanford Hospital. Stanford has relatively modern X-ray machines and very well-trained technicians. When we train and test on images collected from Stanford Hospital, we can publish peer-reviewed papers, and I will stand behind them, showing that we can outperform human radiologists when testing on data from the same hospital. But it turns out that if you take this model and walk down the street, maybe to an older hospital using older X-ray machines, where the X-ray technician uses a slightly different imaging protocol, maybe the patients are tilted at a slight angle, then the performance degrades. This is in contrast to any human radiologist, who could walk down the street from Stanford Hospital to this other hospital and do just fine. So there is a huge gap between what works in a research lab and what works in production, and this is true not just for healthcare but for many other industries as well. I think one thing we should work on, both on the research side and on the practical engineering side, is better tools and processes to make sure our models generalize to datasets different from the ones they were trained on.
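One simple process-level check, sketched below with synthetic data (not from the talk), is to always evaluate on data from a site the model never trained on, not just on a held-out split of the training site. The "hospitals," features, and simulated protocol difference are all made up for illustration.

```python
# Sketch: evaluate on data from a *different* site, not just a held-out split
# of the training site, to surface generalization gaps. Data is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def make_site(n, protocol_noise=0.0):
    X = rng.normal(size=(n, 5))
    y = (X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=n) > 0).astype(int)
    # Corrupt the most informative feature after labels are set, as a crude
    # stand-in for a different machine / imaging protocol at the second site.
    X[:, 0] = X[:, 0] + protocol_noise * rng.normal(size=n)
    return X, y

X_a, y_a = make_site(2000)                     # "Hospital A": training site
X_b, y_b = make_site(500, protocol_noise=3.0)  # "Hospital B": different protocol

model = LogisticRegression().fit(X_a[:1500], y_a[:1500])
print("AUC, held-out same site:", roc_auc_score(y_a[1500:], model.predict_proba(X_a[1500:])[:, 1]))
print("AUC, different site:    ", roc_auc_score(y_b, model.predict_proba(X_b)[:, 1]))
```

The second number is typically much lower, which is exactly the gap a single-site test set never reveals.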
I'll share a few more thoughts on this when I talk about the full cycle of machine learning projects. Finally, change management. AI technology can take a workflow and automate part of it, and that can transform the work of a lot of people around it, and I think we need to get better at managing that overall change. Here's an example. This is some work I did with Anand Avati, Nigam Shah, Ken Jung, and others on palliative care. Palliative care, which is roughly end-of-life care, helps patients with terminal illness enjoy a high quality of life. We know that here in the United States, doctors in general make fewer palliative care referrals than we would like. Doctors are good people; many doctors want to keep fighting for the patient because they care so deeply about the patient. My father is a doctor, and I know it's genuinely hard for a doctor to give up; you just keep fighting for the patient, which is a great attitude we want doctors to have. But we know that, across the country, doctors make fewer palliative care referrals than one might wish. Now, many hospitals, including Stanford Hospital, have a specialized palliative care unit. Those palliative care doctors could proactively reach out, but given the volume of patients, manual chart review, reviewing patient records one by one, is infeasible. So what we did was build a learning algorithm to predict the chance of mortality for a patient over the next three to twelve months, and this recommends patients for consideration for palliative care.
So this is the workflow we actually built with the palliative care staff at the hospital; the data shown here is made up to protect patient privacy, but it is pretty much what Dr. Harman sees every morning. Dr. Harman would wake up in the morning and pull up a table from the database that looks pretty much like this, where she would see patient IDs, the age of each patient, and the learning algorithm's estimated probability of mortality. This allows her to decide which patients' charts to review in greater detail and which doctors to call to recommend their patients be considered for palliative care, or to make sure that an advance care directive is taken care of, for example.
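The workflow described here is essentially a screening worklist: score every patient, sort by predicted risk, and surface the top of the list for the palliative care team to review. Below is a minimal sketch of that table; the risk model, features, and patients are entirely made up for illustration and are not the model from the study.

```python
# Sketch of a screening worklist: score patients with a (stand-in) trained
# risk model, sort by predicted 3-12 month mortality risk, review the top.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 8))                    # stand-in EHR features
y_train = (X_train[:, 0] + rng.normal(size=500) > 1).astype(int)
model = LogisticRegression().fit(X_train, y_train)     # stand-in risk model

admitted = rng.normal(size=(20, 8))                    # today's admitted patients
worklist = pd.DataFrame({
    "patient_id": [f"P{i:03d}" for i in range(len(admitted))],
    "age": rng.integers(35, 95, size=len(admitted)),
    "predicted_mortality_risk": model.predict_proba(admitted)[:, 1],
})
# The palliative care team reviews the highest-risk charts first.
print(worklist.sort_values("predicted_mortality_risk", ascending=False).head(5))
```

The point of the sketch is the shape of the workflow, a ranked review queue feeding a human decision, not the model itself.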
So what do you think happened when we first rolled out the system? When a doctor calls up another doctor and says, "Hey, I think your patient is at high risk of mortality," what do you think happens? Well, maybe not surprisingly, the doctor on the receiving end of the phone call goes, "Who are you? And who are you to tell me that my patient is at high risk of mortality?" What we realized was that we had to carry out the change management process better, because a system like this palliative care one affects a lot of stakeholders. It affects doctors, it affects nurses, it affects hospital administration, insurance, outpatient services, and of course, most importantly of all, it affects the patient. So on a lot of projects I work on, I've learned over and over to go through the appropriate change management process, because when we take a hospital's workflow and automate just a piece of it, whether reading X-rays, making palliative care predictions, or something else, it disrupts or transforms the work of so many people around it. Budgeting time to identify stakeholders, to provide reassurance, and to right-size the first project: all of these things are important for us as technologists, business leaders, and academic researchers if we want to play a role in making sure our amazing technologies get out there and have an impact. Two key technical tools for managing change are explainable AI and auditing, and I should give a shout-out here: I think Fei-Fei and HAI have been real thought leaders in that conversation on both of these important topics. I know that some AI leaders think explainable AI is not important ("you train a black-box deep learning model, why do you care to explain it?"), and I personally just don't agree with that. And explainable AI is
actually complicated. So, a quick story of where I got it wrong. When we built the first palliative care system and started showing it to some doctors, the doctors' feedback was, "How can your learning algorithm possibly tell me that this patient has a 78% chance of dying in the next three to twelve months? How could I possibly trust your AI system?" So Anand Avati actually built a system, using an explanation algorithm, to generate an explanation for the doctor, which says: we think this patient is at high risk of mortality because, looking at the health record, the EHR, they received this diagnosis and they had this test, and this is why we think this patient is at high risk of mortality.
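The specific explanation technique isn't spelled out here, so what follows is a generic, minimal sketch of the general idea for a linear (logistic-regression) risk model: each feature's contribution to one patient's score is just coefficient times feature value, and the top few contributors become the "because" clause shown to the doctor. The features and data are made up.

```python
# Sketch: a simple per-patient explanation for a linear risk model by ranking
# coefficient * feature-value contributions. Features and data are made up.
import numpy as np
from sklearn.linear_model import LogisticRegression

feature_names = ["age", "admissions_past_year", "metastatic_cancer_code",
                 "heart_failure_code", "abnormal_lab_count"]
rng = np.random.default_rng(1)
X = rng.normal(size=(300, len(feature_names)))
y = (X[:, 2] + 0.7 * X[:, 4] + rng.normal(size=300) > 1).astype(int)
model = LogisticRegression().fit(X, y)

def explain(patient_row, top_k=3):
    contributions = model.coef_[0] * patient_row   # contribution of each feature to the logit
    top = np.argsort(-np.abs(contributions))[:top_k]
    return [(feature_names[i], round(float(contributions[i]), 2)) for i in top]

patient = X[0]
print("predicted risk:", round(float(model.predict_proba([patient])[0, 1]), 2))
print("top contributing features:", explain(patient))
```

For non-linear models, attribution methods such as LIME or SHAP play the same role; as the rest of the story shows, the hard part is matching the explanation to what the audience actually needs.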
And guess what happened? The doctors looked at a small handful of patients, looked at the explanations the system generated, said, "Oh, got it," and then they never looked at the explanations again. The lesson learned was that the doctors didn't actually need us to explain to them why a patient was at high risk of mortality; they are completely qualified to look at an EHR and judge for themselves whether a patient is at high or low risk of mortality. What they actually wanted was some reassurance that our machine learning algorithm, our AI system, was generating reasonable conclusions. So what they wanted was just enough of an explanation to feel that our system was being reasonable, and once they had that level of comfort, they didn't care about the explanations anymore; they just didn't want to look at them. They would look at our recommendations, use our system for screening, but then review the patient charts and records themselves in order to decide what to do. So I think one of the reasons explainable AI is so complicated is that we keep confusing who it is for. Are you trying to generate an explanation for the doctor, for the patient, for a regulator, or for someone else? And what is the action you want them to take? Do you need them to do something on a patient-by-patient basis, do you want them to just be generally comfortable, or do you want a regulator to help you spot a major flaw? So I think explainable AI is important, and I think auditing is as well. Face recognition today is a technology that seems highly problematic, and given where we are, I think society has a hard time trusting many face recognition systems, certainly here in the United States, unless we have some fair third-party audit to reassure us that they're doing the right things. So these are important technical things for us to work on and then bring in as part of the overall change management. Now, I've talked about the major issues, and what I think the machine learning community needs to do next is get better
at thinking systematically about the full cycle of machine learning projects. Here's what I mean. We've been celebrating, a lot, the development of better machine learning algorithms, and when a team produces a successful research paper or proof of concept, that's wonderful; celebrate it, it's phenomenal progress. But the work needed to actually take a system to production is much bigger than that; there's all this other stuff that needs to be done. There's actually a very influential paper out of Google from several years ago, "Machine Learning: The High-Interest Credit Card of Technical Debt," that talks about this. In addition to building the machine learning model, all this other stuff is something I hope we can become more systematic at. Now, when I talk about these things, some people ask me: is this engineering, or is it research? And I think it can be either. I remember, a decade ago, leading researchers telling me that they thought neural networks were unscientific; these were leading researchers in computer vision, not Fei-Fei, other leading researchers in computer vision. They'd tell me neural networks weren't scientific; they'd argue, get real, why are you just messing around with these networks of neurons? It didn't feel scientific to them at the time. And some of what I'll talk about today may feel like engineering, but I think both the engineering community and the research community can do a lot to make AI engineering much more repeatable and systematic. So here is how I think of the major phases of an AI project: we have to scope the project and decide what problem to solve, acquire data, carry out the modeling, that is, build the model, and then take it to deployment. What I want to do is go backwards through these four major phases and share very quick lessons
learned from each of these. So let's start with deployment, then go back through modeling, data, and scoping. For deployment, beyond training the machine learning model, we still need to build a cloud or edge implementation and build monitoring tools, and I think the business world is getting better at how to deploy these systems to production. For example, one design pattern that's often used is a so-called shadow deployment, where we may, for example, deploy an X-ray diagnosis system but not use it to make any decisions; it just shadows a doctor. This is safe, because it isn't doing anything, it's just shadowing the doctor, and it gives us time to monitor the system's performance and verify that it is making reasonable predictions, reasonable diagnoses, before we allow it to play any role in making recommendations. Canary deployments are another common design pattern, where we roll out to a small subset of users and monitor it to make sure the data distribution hasn't changed, and only after doing this do we ramp up the deployment.
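Shadow mode and canary rollouts are design patterns rather than library calls; here is a minimal, hypothetical sketch of both in plain Python (the function names and toy models are made up). In shadow mode the model's output is logged next to the human's decision but never acted on; a canary routes only a small fraction of traffic to the new model.

```python
# Sketch of two deployment patterns: shadow mode and a canary rollout.
# All names and models here are hypothetical stand-ins.
import logging, random

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("deployment")

def shadow_mode(case, model_predict, human_decision):
    """Run the model alongside the human, log both, act only on the human."""
    model_out = model_predict(case)
    log.info("case=%s human=%s model=%s", case["id"], human_decision, model_out)
    return human_decision                      # the model never drives the decision

def canary(case, model_predict, legacy_predict, canary_fraction=0.05):
    """Route a small fraction of traffic to the new model, the rest to the old path."""
    if random.random() < canary_fraction:
        return {"source": "new_model", "decision": model_predict(case)}
    return {"source": "legacy", "decision": legacy_predict(case)}

toy_model = lambda case: "abnormal" if case["score"] > 0.5 else "normal"
legacy = lambda case: "normal"
print(shadow_mode({"id": 1, "score": 0.9}, toy_model, human_decision="abnormal"))
print(canary({"id": 2, "score": 0.2}, toy_model, legacy))
```

Comparing the logged shadow predictions against the human decisions (or the canary slice against the legacy path) is what tells you whether it is safe to ramp up.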
done by engineering teams, but more systematic tools,
as well as research to hope, make this whole process more systematic, I think will make the deployment
process more repeatable and reliable. And of course we also need
some longterm monitoring and maintenance. So one thing I'm trying to do
into your educational context, as well as I think
we've done a lot, right? Stanford's deeplearning.ai
really around the world. Many institutions, many universities, many causes to teach
people how to build models. I think we still should teach
more people how to go through this deployment process. So that we can have more highly qualified machine learning engineers. Going backwards let's talk
about the modeling process. It turns out that building machine learning models is a highly iterative process; it feels to me much more like debugging software than developing software. It's an interesting cycle: you come up with an idea for an AI architecture, you code it up and train the model, and it never works the first time, so you analyze the results. I remember, some time back, I trained a softmax regression model on my laptop. It was a small experiment, small enough that I didn't need a GPU or anything. I booted up a Jupyter notebook on my Mac laptop, actually the same laptop I'm using to speak with you via Zoom right now, coded up a simple softmax regression, did a little data cleaning, all in the Jupyter notebook, trained the softmax model, and it worked the first time. I still remember to this day my personal sense of surprise, because I just couldn't believe it: wow, I trained a model and it worked the first time; that never happens. I actually spent several hours debugging it because I just didn't believe it. It turned out it actually was working, but in machine learning it almost never works the first time. So a lot of the loop is carrying out analysis to figure out what's wrong with the model, so that you can change the algorithm, the architecture, or whatever else, and you go around this loop. I find that the execution of machine learning projects differs from the way we usually carry out sprint planning in an agile development process; a lot of it is iterative and feels more like debugging. Hopefully that observation is helpful to some of the managers of AI projects on this call. And if you look at the way we
develop learning algorithms today, a machine learning model has three major inputs: the training data, your choice of algorithm (the neural network architecture or whatever piece of code), and your choice of hyperparameters. When producing research results, we often download a standardized training set, dev set, and test set benchmark and feed all of these in to train a machine learning model. When doing research, we tend to keep the training data fixed and vary the neural network architecture and the hyperparameters, and we do that so that different algorithms can be compared to each other on a one-to-one basis. But in a production setting, I often find myself holding the algorithm fixed, often holding the hyperparameters fixed, and just varying the training data. As one example, I've actually given direction to my teams where I tell them: everyone, the algorithm is good enough; please just use RetinaNet and please don't change the algorithm; let's just keep changing the training data in order to make the algorithm work well. Another example: when I was working on speech recognition, we did a lot of work on the speech model, but eventually I thought, all right, this algorithm is good enough; the algorithm and the code work, and the hyperparameters maybe need a little more tuning. Our day-to-day workflow was to look at the speech recognition system's output and do error analysis. We'd figure out, say, that our speech recognition system has a really hard time with people who have a certain accent. I was born in the UK, so just as a hypothetical example, let's say it has a really hard time with British accents (that wasn't what we actually found, but let's say British accents, since I was born in the UK), and we would say, great, let's go get more British-accented training data. That was the iteration: we kept shifting the training data to improve the machine learning model's performance, and keeping the algorithm fixed while varying the training data was, in my view, really the most efficient way to build the system.
Now, I know a lot of professors and researchers are on this call, and some of you are asking, "Hey, Andrew, is this research or is this engineering?" And I'll say, I actually don't know; I think it's both. But I think the research community can do a lot to help make this process much more systematic as well. All right, working backwards: I've covered deployment and modeling, so now data. How do you acquire data for the model? In a corporate setting, I sometimes talk to a CEO (I think there are 23 CEOs signed up for this) and they'll say, "Hey, Andrew, give me two or three years to get my IT into shape; then we'll have this wonderful data, and then we'll do AI on top of that." And I think that's almost always a terrible idea. Most companies have enough data to start getting going, and it's only by starting to build a system that you can figure out how to build out your IT infrastructure, because there's so much data you could collect. Do you want more user data, more clickstream data? What data do you want? It's often by starting to build an AI system that you can work backwards to help decide what additional data to collect. And one aspect that I think is under-appreciated in thinking about the full cycle of machine learning is deciding on clear data definitions.
Here's an example I got from my friend Kian Katanforoosh, who teaches the deep learning class CS230 with me on campus at Stanford and is also CEO of Workera. Kian really likes iguanas, so he came up with this example. Let's say you give labelers instructions to draw bounding boxes around iguanas. Well, one labeler will draw a bounding box like this, a different labeler will draw bounding boxes like this, two bounding boxes for the two iguanas, and another labeler will draw the bounding boxes yet another way. I find that there's a lot of inconsistency in how labelers label things unless you arrive at very clear data definitions. Labeling iguanas may be a playful example, but I see this all the time in manufacturing, I see it all the time in healthcare, where even two doctors often don't agree on the right label, and I see it in speech recognition, agriculture, and other domains as well.
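One concrete way to catch this kind of inconsistency, sketched below with made-up boxes (not from the talk), is to have two labelers annotate the same images and measure how well their boxes overlap using intersection over union (IoU); consistently low agreement is a signal that the labeling instructions need tightening.

```python
# Sketch: measure agreement between two labelers' bounding boxes with IoU.
# Boxes are (x_min, y_min, x_max, y_max); the values below are made up.
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    return inter / (area_a + area_b - inter)

# Labeler 1 drew one box around both iguanas; labeler 2 boxed each separately.
labeler_1 = [(10, 10, 200, 120)]
labeler_2 = [(10, 10, 90, 110), (100, 15, 200, 120)]
pairs = [(a, b, round(iou(a, b), 2)) for a in labeler_1 for b in labeler_2]
print(pairs)   # low IoU per pair signals inconsistent labeling conventions
```

The fix is usually not a better model but a clearer labeling instruction, agreed on before large-scale labeling starts.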
I also feel we've leaned heavily on the concept of human-level performance to prove out AI. What I see is that many teams measure human-level performance and then go and say, "Hey, my AI system outperforms human-level performance; therefore I have proven that my AI system is better than humans, and thus you must use it." And I find that, whereas human-level performance is a very useful development tool, a very useful benchmark, and great for publishing papers, in a practical deployment context the exercise of proving we're superior to humans is often not the right approach, because ultimately what we want in, say, a healthcare setting is not just superiority to humans; we want to solve a problem, to diagnose accurately whether a patient has a certain condition or not. So I think it's time to rethink how we benchmark and how we use human-level performance in building AI systems; maybe more on that another time. So, just a couple more slides and then I'll wrap up. One of the things I'm still trying to get
better at is scoping: picking useful problems to work on and to solve. I find that AI is very interdisciplinary; AI by itself is almost useless. What is AI for? It has to be applied to some important or useful application for it to create value. So when I meet with my healthcare friends, or with business leaders in manufacturing, in telco, in agriculture, I usually tell them: don't tell me about your AI problems; I don't want to hear about your AI problems. Tell me about your healthcare problems, your business problems, your telco problems, or even your fashion problems, whatever they are, and it is my job to work with you to see if there's an AI solution. So my common workflow is to learn about my collaborators' business problems and then to work together to brainstorm AI solutions. There's a certain set of things that AI can do, and a certain set of things that are valuable for the business (and when I use the term business, I mean it in a very generic way, including a collaborating research lab, a non-profit, or a government entity). We want to select projects at the intersection of these two sets. Only AI experts today have a really good sense of what's in the set on the left, and only domain experts have a really good sense of what's in the set on the right. So I tend to go to partners who are domain experts, ask them to tell me their problems, and then brainstorm solutions together, and then we go through a process of diligence on value, feasibility, resourcing, and milestones. These are important parts
of the scoping process. So, to wrap up: I drew this picture as a highly iterative process, where sometimes you go from the later stages back to the earlier stages. Building models is the focus of a lot of AI research, which is great, because we've made a lot of progress there, but I feel that for AI to reach its full potential, especially outside the consumer internet industry, which is maybe the one industry that's gotten really good at this, there's a lot we need to do to get better at the cross-functional teamwork needed to pick projects, at data acquisition, and at the deployment technologies and processes. I wanted to end with just two more slides. McKinsey has a study estimating $13 trillion of value creation through AI, which sounds like a lot, and it is a lot. But the most interesting thing to me about their study was that it shows the untapped opportunity mostly lies outside the consumer internet industry: the amount we could do to help people around the world in all of these other industries, from retail to travel to transportation, to various forms of manufacturing, to healthcare, may be even bigger than what we've seen in the consumer internet tech industry so far. But to realize that value, we need better research
and better engineering in order to make that happen. To summarize: much work in industries outside consumer internet is still needed to bridge the proof-of-concept-to-production gap, and the key challenges are small data, generalizability and robustness, and change management. And I think we should think more systematically about the full cycle of machine learning projects. Today, in our intro programming courses, CS 101 or CS 106, Stanford has wonderful lecturers, like Mehran Sahami and others, who teach undergrads how to debug software, and we as a software industry have increasingly turned software engineering into a systematic engineering discipline, where we can now relatively accurately predict what a software engineering team can and cannot do. I think machine learning is still too much of a black art, where people who are experienced can somehow get it to work, and I think that we, academia and industry together, should work to turn machine learning from this black-art, intuition-based discipline into a systematic engineering discipline. And it's only if we do that, if we develop these processes and then also teach people those processes, that we will take a big step toward breaking AI open into many other industries. So with that, let me say thank you very much. I'm looking forward to taking some of the questions on Slido as well, so thank you.
- Thank you so much, Andrew. Thank you for the wonderful talk. There are some really
interesting questions in Slido, but first I have a burning question for you. When you were working on the CheXNet problem of chest X-ray diagnosis, you can write down an objective function, it's a supervised classification problem, you can build an algorithm, and you can go off and do it. But at the end of the day, you have a radiologist trying to make a diagnosis and an algorithm trying to make a diagnosis. On the one hand, you could make the decision based purely on the algorithm; on the other hand, you could just have the physician make the diagnosis; and there's a whole spectrum in between. How might you systematically study what the right human interaction is with that predictive model? And how do you handle accountability if the decision goes awry between a person and a machine?
- Yeah, great question. I think the short and simple
answer is that this is complicated. There are different groups, including, at Stanford, Pranav Rajpurkar, Jeremy Irvin, Matt Lungren, and Curt Langlotz, and many teams elsewhere, developing user interfaces to support this human-machine interaction. What we've found, for example, is that if the AI is poorly designed, we can unfortunately influence doctors to just go with the AI's decision. So we're actually designing UIs to try to let the AI convey to the physician that we don't really know, we think there's maybe a 70% chance, but it's actually not very certain, so please take a careful look and figure it out yourself. That UI design is complicated and still evolving. And one thing I'd love to see rise in AI is auditing. No one wants someone to audit their code ("just trust me, it works"), but I think that's actually the wrong attitude, because for these systems to be deployed safely, we do need to build trust. Sometimes I look at the systems my team has built and I ask, gee, do I want to trust this myself? And I would appreciate a third party auditing my work, not in a negative way, but to help me spot problems so that I don't deploy a system and then find out much later that it has some really undesirable bias or other problem. So I think the AI community should welcome auditors, third parties who help us find problems in our own work proactively, so we don't deploy a system, and we've all seen this, that turns out to be really biased against some ethnicity or some gender. So I think that would be an
important step as well. - Yeah, I completely agree. And sort of a related question
here is something you said earlier in your talk where the doctors, at some point they didn't need
an AI explainability tool. What they just wanted was the ability to build trust in that system. So is there a way to sort of
study that systematically, like you mentioned the UI, but are there kind of best
practices for allowing a human that's interacting with an AI
system to sort of build trust in a collaborative relationship? - Yeah. So I think one of the reasons
the explainable AI field seems so complicated is that, when we talk about explainable AI, sometimes all the different use cases get mushed together. And I don't think it's possible, or at least I don't know how, to build one technology, one visualization tool or whatever, that simultaneously serves the purpose of explaining to a machine learning engineer what's wrong so they can iterate and improve the algorithm; explains to an end user why the AI system generated a conclusion so that, we hope, they're comfortable with it; and also shows a doctor, a subject-matter expert, why an AI system generated a conclusion so they understand what we did but can also intervene and actually think about it. Those are really different purposes, and then there are regulators, yet another stakeholder. The audiences are so different, the stakeholders are so different, and the actions we hope those stakeholders will take are so different, ranging from simply being comfortable with a decision, for example if a loan-approval algorithm denies someone a loan, maybe there's an appeals process, but a lot of the time we just want them to understand the decision and maybe appeal it, that I don't think one technology can cover it all. If we could clearly sort out the stakeholders and the purpose of explainable AI, then we could build more distinct tools for these different groups.
- Yeah, I got it. These are all tough questions at the heart of human-centered AI, which is what HAI is all about. So if I may...
- Just one thing: I really welcome
the input of sociologists here, because a lot of the problems we face aren't purely technical. I love getting economists, sociologists, and other stakeholders to help me think through how we deal with these things that aren't pure technology problems, but where the algorithms we develop play a big role as well.
- So, here's a question from an economist, actually, and I'll just read it verbatim because it's really well written. What do you see as the best way to address the challenges of growing inequities, especially the economic inequality that AI may bring? Not an easy question, but I think a good one to be concerned about.
- Yeah, cool. Oh, actually, I see a question
from Erik Brynjolfsson. Hey, Erik, great to see you here. I've really enjoyed my interactions with Erik and reading his many books over the years, so for those of you looking for good books on AI and economics, check out Erik Brynjolfsson. I think one of the things about AI, which I think Erik is alluding to, and I hate to say it, is that AI has a risk of accelerating economic inequality. The very high-level pattern is this: once upon a time, here in the United States, you could be a small-scale chicken farmer and have a pretty nice life farming and selling chickens. But now, with first the internet, a large player, say Tyson (I have no relationship with them), can use IoT to get sensor data from around the country on what's going on, centralize the data over the internet at headquarters or in one data center, use AI to process the data in a centralized way, and then push conclusions through IoT technology back out to all of these farms. So what we're already seeing in the software and internet world is winner-take-most or winner-take-all dynamics; that's why there's a relatively small handful of leading search engines and a relatively small handful of leading social media companies. And because tech is now entering every industry, fortunately or unfortunately we are infusing almost every industry, from agriculture to manufacturing to retail to logistics, with more and more of these winner-take-all dynamics. And this is contributing to
inequality, unfortunately. I wish I knew how to fully address this. I think government needs to play a huge role in ensuring that the tremendous wealth we're going to create, and clearly have already created as a community and will keep creating, is shared. I'd love to see governments act here, and Erik and I have actually chatted a lot about ideas like unconditional basic income, or even conditional basic income, to give people a safety net. And then I think education, while not a panacea, is very powerful in making sure that people whose livelihoods have been disrupted have a chance to learn new skills, contribute to the economy, and earn a livelihood for themselves and their families. As an AI technologist, I have seen AI create tremendous value, but my hope for all of us on this call is that when you're in the hot seat and making a decision, you try to bias things toward making sure that the wealth we create is fairly shared.
- Yeah, I completely agree. And I think that's a nice
segue to another question that bubbled up to the top, about the role of industry versus academia, and the haves and have-nots, right? In industry, if you're at a consumer internet company, you have access to more data and more compute, so you can do things like OpenAI can with Microsoft, building things like GPT-3, that academics and other people without those resources cannot. And this is, of course, the cutting-edge algorithm, so for now there's an inequity here. The question is: what effect do you think this will have on society, and will we see an even playing field in the future? How should we all think about that?
- I feel like the future
is not yet determined, and it is up to all of us. What we saw in the semiconductor industry, making microprocessors, is that the center of gravity shifted significantly from academia to corporations, because most universities just don't have the resources to design a new semiconductor chip and take it to fab. So a lot of the influence is now concentrated in a few great companies like Nvidia, Intel, AMD, and a few others. I think AI has made some of that shift, where today there is some work that is much easier to do in a corporate setting than in an academic setting. On the flip side, there's still plenty to do in academia. I look at all of the amazing research that goes on across HAI, across Stanford, and across academia, and if you look at the papers at top conferences like ICML, NeurIPS, ICLR, and so on, it is true that large corporations have a growing share, and that's great; I'm really glad that the large corporations are spending resources doing research and sharing the results with us. But universities still produce plenty of great research as well, and so do many smaller companies. One thing I love about the AI community is that it feels to me it grew up with a genuine spirit of sharing, and we as a community, all of us, any of you listening who work for a large corporation, you have a voice and your work matters: go ahead and try to influence your large corporation to stay true to that spirit of sharing ideas, because it's only by doing that that we can create more value for everyone. So I think the future is not set, and I think the values of all of us as a community will have a big influence on whether there continues to be very diversified, very widespread research with ideas freely shared, or whether it ends up being more concentrated. I am an optimist; I think we're actually on a good path, but it is up to us to keep pushing in that direction. Oh, and of course, we all know that yesterday's supercomputer is today's smartphone.
- I think my Zoom crashed for a moment there; okay, we're back. So yeah, I tend to agree
with you that the future is not set. But as of now, at least for the best-performing large language model, at least on few-shot tasks, everyone that's not OpenAI is sort of stuck licensing the technology. I can see a way in which we move toward a more equitable distribution of that technology, but let's say we were stuck there. Would that be a good thing, a bad thing? How should we think about it in the present?
- Yeah, I think it would be a terrible thing if everyone had to license GPT-3 technology from a single provider. Lots
of credit to the OpenAI team for building it. Fortunately, I think there are multiple companies on the planet with the capability to replicate it, so I would certainly welcome more large companies building these things, to make sure there's healthy competition. It is an unfortunate dynamic of AI that it creates a lot of value but also tends toward these winner-take-all dynamics I alluded to, so it's possible that eventually we'll need good regulatory frameworks to ensure we get the societal effects that we want. Although, for now, if you look at cloud computing, there is a relatively small number of cloud providers, and I'm not saying that's not a problem, but fortunately it has remained relatively competitive. This is something where I think government should play a role in making sure that we as a society get the outcomes that we want.
- Okay, well, there's one minute left, so I'll try to sneak this one in quickly. What do you think are the challenges for privacy-preserving machine learning in the healthcare industry, and what do you think needs to be pushed forward the most in that area?
- I feel like the one
challenge we have with privacy, and also with bias and other issues, is that we've not yet come to agreement on what standards we want to hold ourselves to. This has two bad effects. An AI team will build something, and as far as they know they're doing okay, and they deploy it, and then two years later some new standard arises. I think we all know there are certain protected characteristics you should not base decisions on, but here's a concrete example. I went to a major image search engine and deliberately searched for image queries that would show gender bias. It took me about 10 minutes to find one. I think the query was "elected official": that search, on one of the major search engines, showed all men on the first page. Now, we could get sensationalist and upset about this, because it's biased, this is horrible: if my 19-month-old daughter Nova sees it, maybe she'll think, wow, elected officials should all be men; maybe she should never aspire to be an elected official. So we could get alarmist about this. The other side of the story is that it took me 10 minutes of deliberate searching to find a highly biased query, so perhaps we shouldn't over-generalize from that one example. One of the difficulties is that a lot of these issues of bias and privacy are statistical concepts, and we as a society need to get better at not latching onto anecdotal evidence and at measuring these things in a statistical way; I'm not going to name the search engine, because I think on average the team actually did a great job and on average their queries are relatively fair. And I think part of what
we need to do, as businesses, corporations, and regulators, is establish fair standards that clearly lay out what we want and don't want, and then rigorously audit systems against those established criteria. This helps in two ways. One, it diminishes, on the research and engineering side, the fear that I'm going to roll out something and then two years later there will be some new criterion I had just never thought of. Who would have thought, say, that we'd discover we were discriminating against people who live in my hometown of Los Altos? Is that okay or not? It sounds like it's not okay, but with established criteria and an audit, life gets easier for the engineers and researchers. And two, when we roll these things out, it gives us a clear sense of exactly which privacy standards and fairness standards we want the systems to adhere to. Until we get there, I think we end up with more confusion and with people getting surprised by criticism, and the goal is to make the systems fair, not to randomly make people feel bad about them. So I hope we can get there.
- I tend to agree with you. Well, we're a little over time, so with that I just really
want to thank you for coming and kicking off our inaugural HAI Research Seminar Series. And thank you to the
community for tuning in. The recorded seminar
will be posted on YouTube by the end of this week. And next week, same time, same place. We'll have Percy Liang
discussing semantic parsing for natural language interfaces. So you can check that on our website. And thank you so much again, Andrew. - Yeah thank you.
- Thank you, Andrew. Fantastic. Thank you. - [Andrew] Thanks everyone.