DAVID J. MALAN: Well, welcome, everyone. We're joined today by Dr. Matt Welsh. And we'll be joined toward the end of
the talk by pizza as well, which we'll serve right out there on folks' way out,
also as an opportunity to chat more casually with Matt toward the end. I actually got to know Matt when
I was back in graduate school. And I spent quite a bit of
time with him and his students when his focus was particularly on,
what are called, sensor networks, which are these distributed networks
of very small, low power, low resource devices, which
made it very hard, at the time, to actually write code that
interconnects them and generally solves problems. And among the problems some of
my classmates were working on were monitoring volcanoes, for
instance, and the integrity of bridges. And in my own interest,
being able to set up these mesh networks
of sorts in emergency medicine so that they
could talk among each other without wires or without
any central access. Matt went on since then to
work full-time at Google, and most recently at fixie.ai. As you might have seen
from today's description, he portends a future in which computers
will do the writing of code for us. So if you're struggling in CS50,
61, 161, or anything in between, not to worry. AI is now here, as is Dr. Matt Welsh. MATT WELSH: Thanks, David. Thanks for having me. It's been, I don't know, 13
years or something, 12 years since I gave a lecture at Harvard. So we'll see if I've still got it. I was joking yesterday with
David Parkes, who's now the Dean. And he and I were peers when
I was on the faculty here. And I said, it's remarkable,
David, on becoming Dean of SEAS. I don't think we're
old enough to be Dean quality yet. And then, actually, I realized we are. So anyway, I'm here to tell you that
the field of computer science is doomed. And I actually mean
this, although, I'm going to put it in somewhat humorous terms,
that if you think about computer science, what is the field about? What does it mean? Where did it come from? What's the core idea of it? It's the idea of taking an idea, an
algorithm, or a concept or a data structure, and translating
it into a program that can generally be run by a Von
Neumann architecture machine, right? So that's computer
science in a nutshell. The problem is that the
goal of CS has always had this core fundamental assumption or
axiom that the programs that we're all talking about here have been
implemented, maintained, and have to be understood
by humans, that if I print out the code for a program, a
human, some human, maybe not everyone, but at least maybe the
person who wrote it, if not someone else, can understand it. Now here's the problem. Humans suck at all
three of these things. We're terrible at writing programs. We're terrible at maintaining them. And we're absolutely terrible
at understanding them. So what does that really
mean for the field? So I want to make this claim
that 50 years of research into programming languages
has done effectively nothing to solve this problem. We've been at this for a long time now. 50 years is a long time. And we keep inventing new languages
and new programming concepts and new abstractions and new data
types and new proof methodologies. But none of the stuff
that we've developed, in terms of tooling or languages or
proof techniques or documentation or linters, has actually
solved this problem. And I don't think another 50
years is going to solve it. I think this idea of building
automated tools to help humans write better software has played itself out. Now if you disagree with me, let's
just take a look at the history here. So let's rewind the clock
all the way back to 1957. This is Conway's Game of
Life, implemented in Fortran. I don't remember which
dialect of Fortran this is. But Fortran came about in about 1957. I just claim, this is
really hard to understand. I claim that you can't look at this and know what the hell it does unless you had some idea of the intent of the program. You could work it out. You could spend some time reading it. You could probably understand
it with some effort. But it's not trivial. It's not straightforward. So we tried to make programming easier. We came up with something
called Basic in 1964. This is not the original Basic. Again, it's had many dialects
because, obviously, the first one wasn't good enough. We had to keep improving the language. This is the same program in Basic. I don't think this is
any easier to understand. I could spend some time
reading it and convince myself that it does a certain thing. But it's quite challenging to get. So then we came up with APL. This is Conway's Game of Life in APL. I would say, raise your
hand if you understand this, but I know there's probably a few
people in the audience who do. I don't, right? This is a programming
language so complex you needed a special
keyboard to type it. But this is what we thought the practice of developing programming languages was back in the '60s. Certainly, it doesn't do the job. All right, well, I've been talking about
stuff that's kind of old-fashioned. What about the new hotness? Let's talk about Rust. Everybody's programming in Rust. It's the latest and greatest
thing since sliced bread. I spent two years running
engineering at a startup that was completely Rust-based. I ran a big team full
of Rust developers. I actually learned Rust myself, kind of. This is the same program in Rust. I can't make heads or tails of this. It is incredibly hard to write programs
that are easy to understand, easy to maintain, easy to reason about. So that's the kind of state-of-the-art. This is where we've gotten in
50 years, from Fortran to this. And I just want to make the claim
that this is not going to work. We're done. Game over. So what's next? Well, this is how I write code today. This was a prompt passed
to the GPT-4 model. And it's part of a
larger program that reads in some text of a transcript that's
been derived from a podcast audio feed. We're feeding the
transcript into the model. And we're giving it these instructions. We're saying, please summarize
the following segment of this podcast transcript. Only use the information in the text. Do not, in caps-- this is important by the way,
the all caps is super important. Do not use any information
you know about the world. Include the title of the
podcast, the name of the episode, and the names of the speakers, if known.
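To make that concrete, here is roughly what such a call looks like in code. This is a sketch, not the actual program from the talk: it assumes the OpenAI Python SDK, and the function name and exact prompt wording are illustrative stand-ins.

```python
# Sketch of the kind of call described above (not the talk's actual code).
# Assumes the OpenAI Python SDK; exact method names vary by SDK version.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def summarize_segment(transcript_segment: str) -> str:
    prompt = (
        "Please summarize the following segment of this podcast transcript. "
        "Only use the information in the text. "
        "DO NOT use any information you know about the world. "
        "Include the title of the podcast, the name of the episode, "
        "and the names of the speakers, if known.\n\n"
        + transcript_segment
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```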
This English statement here encodes an algorithm. It describes what I want done with some input data, the output data I want, and my expectations about the kind of thing that's in the output data. So a few things to notice about this. The first thing to notice about
this is I don't think anyone could ever write down
the algorithm for what this is supposed to do in any existing
programming language or any programming language that we're likely to
come up with in the future. How do you write this algorithm? You can't, right? There's no pseudocode. There's no proof. There's no mathematical
symbology here, right? The other thing to notice
is, at least for me-- I don't know about any of you-- do you understand this? Do you understand what it's saying? Does it make sense? Can you read it? Can you reason about
what it's supposed to do? Yes, of course, right? It's in plain English. Doesn't have to be English, by the way. It could be in Mandarin
Chinese or Esperanto. Have you all seen the XKCD about the
guy who walks into his friend's house and he says, OK, Alexa, order
five tons of creamed corn. OK, Alexa, confirm order. That's how he makes sure that no
one's got a speaker listening to him. So the point being that this is
now how I am actually writing code. And what's funny about
this is a lot of it is trial and error and experimentation. By the way, that's the same when
I'm writing normal computer code. And the other thing that's
interesting about this is there's a lot of subtlety in
terms of how you instruct the model and how you know what it's going
to do with your instructions. You can't write a manual that
says, well, here's the set of words that you need to use to get
the model to do x, y, or z. You have to just try out certain things. In this case, I found out the do
not, in all caps, really helped, because I really wanted to
emphasize that point to the model. This reminds me of another
programming language that someone came up with a
while ago called INTERCAL. INTERCAL was meant to be one of
these obscure or maybe satirical joke programming languages. INTERCAL had these
interesting features, such as you had to use the keyword, please. And if you use the
keyword please too often, the compiler would reject your program. If you didn't use it enough, it
would also reject your program. And it turned out that
feature was undocumented. It's exactly like what
we're doing today, right? We have to say please
and do not in all caps to get the language
models to do what we want. So where am I going with all this? I think what I'm saying
here is we are now in an era where we have machines that can
take natural language in and produce results, algorithmic results,
computational results, but for which no human has written
a program in anything resembling a conventional programming language. And I claim that these models are
going to get so good at doing this that our whole concept
of programming computers is going to get replaced
over time with instructing language models to do things for us. So let's take a look at the state
of programming language technology. This is a programmer without
CoPilot in around 2020 colorized. I think I met that guy out in
Central Square this morning. And here's a programmer
with CoPilot in 2021, right? So clearly, we're evolving very
rapidly as a species of programmers. Unfortunately, both of
these cases are male. I apologize for that. So how many people here have
used CoPilot or one of its ilk in terms of helping you write code? Don't be shy. I know you're like-- my professor in here? Oh, shit. All right, so CoPilot,
if you haven't used it, is a complete game changer in terms of
how real world developers write code. Yes, it's also a huge
boost for students who want to effectively shortcut their
homework, speed run their homework. But for someone working in the
industry writing code every single day, if I don't have CoPilot,
I absolutely feel naked. I was on the airplane out here. I was writing code. The Wi-Fi was not quite fast enough. So I would type out my half a line of
code and just sort of wait for CoPilot to finish it for me like I always do. But normally that happens
in less than a second. And this time, it was
just taking so long. I was like, oh, damn it, I guess
I have to write this myself, just like I used to a year ago. CoPilot is incredible for a few reasons. I think one of the things that
people don't fully appreciate is that it keeps you in
the zone of writing code. It used to be the case that any time
I'd hit a little snag, I'd be like, oh, crap, I can't quite
remember the syntax for how I reverse a list in
whatever language I'm working in. Crap. Well, I know where to find the answer. I'll just Google it. It's on Stack Overflow somewhere. And so I go and I Google
it, and I find the thing. It's probably not a direct answer, so
I have to read the article a little bit and piece together, oh yeah, that's
the snippet I was looking for. And then 45 minutes
later, what am I doing? I'm on Reddit somewhere. I've gone down the rat hole
of surfing the internet. I got out of the zone of writing code. So by keeping you in the zone,
I think people are so much more productive with this, to
the point where we mandated, every developer at our
company has to use CoPilot. If there's somebody not using
CoPilot, they're going to be fired. Well, I didn't say that. But it's kind of the idea. So a lot of people have
chastised or criticized CoPilot for being a little dumb, right? It's like, well, it's just trained
on stuff it found on the internet, on GitHub, and homework assignments. How good can it be? It's incredibly good. It's not just parroting back
things that it's seen elsewhere. It's interpreting your
program and your intent. It's looking at other parts of your code
to understand what you might do next. It's understanding your data structures. It's not just looking at a little
context window in this current file you're editing. It's looking elsewhere in the code to
find something that might be relevant. And the only thing that
is stopping CoPilot from getting really, really good at
this is just more data and more compute. And guess what? We have both of those in abundance. There's nothing that's
going to stop this from getting incredibly good over time. So here's another similar use case. This is not CoPilot. This is ChatGPT, which I'm
sure we're all familiar with. But if you are trying to figure out
how to do something-- and in this case, I was using the deepgram Python SDK to
transcribe audio files for this podcast thing I mentioned earlier,
I could have spent 15, 20 minutes reading their documentation,
finding some example code on the internet, following a
tutorial, or because we're all-- programmers are
incredibly lazy, just say, hey, look I'm trying to do this thing. Can you just give me the code I need? And it does it. CoPilot is not just understanding
homework assignments. ChatGPT is not just understanding
homework assignments; it understands other people's APIs
and SDKs and programming libraries and abstractions and best practices
and bugs that might occur. I mean, it's really
got a lot of knowledge. And so with very little
effort, then I can just cut and paste this code right into
my program and get on with my life. Shel Silverstein, who
wrote A Light in the Attic. This is a children's book--
a book of children's poetry that I read when I was a kid. I saw this on Reddit
a couple of days ago. He completely predicted this. This is 1981. The Homework Machine,
oh the Homework Machine. Most perfect contraption
that's ever been seen. Just put in your homework, then
drop in a dime, Snap on the switch, and in ten seconds' time, Your homework
comes out, quick and clean as can be. Here it is-- "nine plus four?"
and the answer is "three". Three? Oh, me. I guess it's not as perfect
as I thought it would be. Exactly. Cost a dime, takes about ten seconds. It gets the answer wrong. This is very much what
we're dealing with today. By the way, and this
is a complete aside, but I can't resist when I
mentioned Shel Silverstein. If you don't know what
he looked like, this was the photo on the dust jacket
of one of his first books. This guy, I love this guy, a children's
poetry book author from the '70s. And that's what he looked like. Amazing. All right, so now I want
to talk about, well, if this AI technology
is getting so good, then what's going to
happen to our industry? What does this mean
for all of us who might be looking to get jobs in
this industry in the future and expecting to get those big, fat
paychecks and stock option grants and buy Teslas or whatever
we're expecting to do? How much does it cost to replace
one human developer with AI? Well, I did the math. So let's say that a typical software
engineer salary in Silicon Valley or Seattle is around 220,000 a year. That's just the base salary,
doesn't include benefits, doesn't include equity
packages, doesn't include your free lunch and your bowling
alley and all that kind of stuff. So let's just assume that
stuff costs 92K a year. This is, again, a little conservative. So the total cost to your employer
is roughly 312K for one SWE. How many working days
are there in a year? About 260. And so it costs $1,200 a day to employ
you as a SWE at one of these companies. Fair enough? Let's do the math. How many lines of code do you
think an average developer checks into the code base every day? I mean, finalized, tested, reviewed,
and approved lines of code. Most of us who have worked in industry
know that the median value is 0, because there are so
many days that you go by where you're waiting on somebody
else or you're in meetings all day, you didn't get anything done. You didn't check it in. But let's just be generous
here and say, it's about 100. I know, 100 doesn't sound like a lot. People are like, but I
was programming all day. Yes, but 90% of your code
you ended up throwing out or somebody reviewed it and said it
was no good, you have to rewrite it, you were trying to figure out
what to do, you were revamping it. So the final result of your output is
something like 100 lines of code a day. That's the final result. How many GPT-3 model tokens is that? It's about 10 tokens
per line, more or less. And the cost for GPT-3-- actually, this is probably
a little out-of-date. But at the time I made this slide,
it was $0.02 for 1,000 tokens. So if you do the math,
then the total cost for the output of one human software
developer on GPT-3 is $0.12. This is a factor of 10,000. This should scare us all.
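Written out as code, the back-of-the-envelope arithmetic looks like this. Every figure is the talk's own assumption rather than a measurement, and the per-day model cost depends on how many prompt and retry tokens you count on top of the final output.

```python
# Back-of-the-envelope comparison using the talk's own assumptions.
base_salary = 220_000       # typical SWE base salary (Silicon Valley / Seattle)
overhead = 92_000           # benefits, equity, perks (the talk's estimate)
working_days = 260

human_cost_per_day = (base_salary + overhead) / working_days
print(f"Human: ~${human_cost_per_day:,.0f} per day")   # ~$1,200

lines_per_day = 100         # generous estimate of finalized lines of code per day
tokens_per_line = 10
price_per_1k_tokens = 0.02  # GPT-3 pricing at the time of the slide

completion_cost = lines_per_day * tokens_per_line / 1000 * price_per_1k_tokens
print(f"Model: ~${completion_cost:.2f}/day for the completion tokens alone")
# The slide quotes $0.12 per day and a factor of 10,000; the exact figure
# depends on how many prompt and retry tokens you also count, but the gap
# is several orders of magnitude either way.
```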
This suggests, potentially, a very large shift in our industry. I don't think we can ignore
this and just write it off and say, well, the AI
is not very good today, so therefore, it's not going
to be good in five years. This radically changes
how we think about it. The only reason that
programmers are paid so much is that it requires
years and years and years of education and training and knowledge
and specialization to be good at it. But there's no reason that I need to
hire a super smart, Harvard educated student to do this if I can get
ChatGPT to do most of the work for me and have a human typing it in. There's a lot of other advantages
to hiring the robots instead of the humans, right? Robots are not going to take breaks. The robot is not, today, expecting
free lunches and on-site massage. That could change. The robot takes the same length
of time to generate its code, whether it's the rough proof of concept
or the final production-ready code. When you go as a PM to your
engineering team and you say, OK team, there's eight of you here. We have to ship the billing page. How soon can we do it? You're going to spend at
least an hour and a half having the conversation, well,
if we do it quick and dirty, we can maybe do it in three weeks. And if it's got to be
production-ready, give us 12. Or you can go to the proverbial
homework machine, push the button, and have the code right now. And the other thing is, yes,
the robot makes mistakes. But those mistakes can
happen incredibly quickly, to the level of speed where iterate,
iterate, iterate, iterate, iterate, iterate, iterate is perfectly fine. You can say to the robot, you know what? This whole thing, 5,000 source files,
20,000 lines of code, whatever it is, blow it away. Start over, boom. Five seconds later, you have
a brand new version of it. Try that with a live
human engineer team. So I think this is all something that
we really have to take seriously. I don't think that this is just-- I am exaggerating for effect. But the industry is going to change. So the natural question
then is, well, what happens when we cut humans out of the loop? How do we build software? How do we ship product? I found this video on, I think
it's Microsoft's website, and it's titled What
Do Product Managers Do? That was a little bit of an unintended
joke, I think, because as an engineer, we often go, what do
product managers do? But if you imagine what the software
team of the future might look like, I think this is one
very plausible approach, which is have a product manager--
this is probably still a human-- taking the business and the product
requirements, the user requirements, and translating them
into some form, probably English, maybe a little
bit technical English, that you then can provide to
the army of AI code generators. The AI code generators give
you a whole bunch of code, and probably, for a
while still, we still have humans reading
and reviewing the code to make sure that it does
what it was supposed to do. Now, that read is a little
different than what we have today. Today, when we review code, if I have
another engineer on my team writing code and I'm reviewing it,
standard practice in the industry is to do code review for one another. We don't just check in code. We read each other's code. We make detailed comments on it. We suggest improvements, cleanups,
clarifications, comments, documentation. In this case, it's not
absolutely essential that this code be
maintainable by a human. I think for a while, we're
going to want that, right? Most people are not going
to feel comfortable just letting the robots do all the coding. But at some point, as long
as I can convince myself that the code does what
it's supposed to do, I don't really care how messy it is. I don't really care how it's structured. I don't really care how reusable it is. All of those factors are
only because poor humans have to wrangle with this stuff. Oh, it needs to be modular. We need to have abstraction boundaries. All the things, sophomore
level computer science, right? Why? For the sake of poor humans having
to deal with this complex code base. But if the robots are
the ones generating it, and we don't really need to
maintain it in a conventional way, why not just generate the code you need? It doesn't really matter if it's
duplicative or repetitive or modular or nicely abstracted. It doesn't matter. Does the job. So one of my hypotheses around
why everyone has been freaking out about ChatGPT is because
unlike other industries, this revolution seemed
to occur overnight. Unless you're like an AI
professor and have really been following the literature for years
and years and years, to most of us, myself included, this seemed to just go
from, AI was crappy to AI was amazing, literally, overnight. So to use an analogy, this would be
as if the field of computer graphics went from Pong to Red Dead Redemption
2 in the span of about three months. People's heads would
explode if that happened. But that's not what
happened in graphics, right? In graphics, it took decades
to get to this point. And everyone could see it gradually
getting better and better and better. I remember when Toy Story came out. That was like the first CG movie. People's minds just
melted watching that. They were like, whoa. And now we watch it and you
just go, yeah, that's cute. I could render that on my laptop
in Scratch or whatever, right? The other thing that's happened, I
think, in this field that's interesting and there's a big
societal shift happening is the dialogue around our
expectations of what AI can achieve. And so in 1972, Hubert Dreyfus wrote
this book What Computers Can't Do. And this was at the dawn of the PC era. And there was a lot of
popular press and dialogue around this scaremongering around AI. And we had movies come
out, like WarGames. Does anybody remember that? I think WarGames-- by the way, that
movie is why I am a computer scientist. I was like, I want to
be Matthew Broderick in this room with all these monitors
and my analog modem and hacking into the school computer. That was me as a kid. So at this time, I think a lot of people
were saying, well, hold on a minute. Computers are fundamentally dumb,
and they can't do these things. And they never will. And that was the thesis
of this book here. And I think that was the
consensus view, right? We calmed down a little
bit about the technology. We all kind of realized,
yeah, OK, VisiCalc is not going to put me out of a job. But now fast forward to 2014,
I highly recommend this book if you haven't read it, by Nick
Bostrom called Superintelligence. This is a book that wrestles in
a tremendous amount of detail with the philosophical
and the moral questions of how does human society
respond to an AI that is more intelligent than humans? And I know we've got a lot
of sci-fi around that topic. But this is a very serious
academic work about, what does it mean for our society if
we have AI that is smarter than us? And people are taking
that very seriously today. So I think, my point being
that the dialogue that we've been having in society at large has
shifted away from AI as a toy, to AI might actually destroy society. So let's just talk rapidly about the
evolution of programming as I see it. So in the dawn of time, we had humans
directly writing machine instructions and inputting them with toggle
switches and stuff like that. That was before programming,
in the conventional sense, was really invented. Then we had early prehistory, and
people started writing programs in higher level languages. That's Bjarne Stroustrup
who invented C++. And in modern times, we have a world
in which humans are writing their code, but they're heavily assisted by AI. And they can get away with things
like, well, I'll just write a comment and have the AI write
the code for me, right? But my claim is that the future of
this really is skipping the programming step entirely. I think a lot of people who've
read my article on this topic-- it was in the CACM earlier this year-- misinterpreted it as saying, AI
is going to write code for us. Therefore, programmers should not exist. I'm not saying that. I'm actually saying
something much worse, which is you won't have
to have programs at all. You just tell the language
model what you want, and it directly computes the results. There's no program step. And I think that opens up-- it is an interesting
challenge for our field. But I think it opens up
a tremendous opportunity, because now the question
is, how do I effectively teach these models what to do? Coming back to my
example earlier of having to use the words do not in all
caps, what are the best practices? And beyond best practices,
can we turn this from effectively a dark
art into a science, into an engineering discipline? And people have talked about
prompt engineering as a thing. I think that's meant tongue in cheek. Prompt engineering is
not really a thing yet. But it may well be in the
future if we do this right. One of the things that people
often say about these models is that there's no way they can do
anything interesting or creative because all they're
doing is autocompleting based on large corpora of text that
they've seen and been trained on. I beg to differ. Now we obviously don't really know
what's going on inside these models. But if you ask a large language
model to take a complex problem and effectively run a computation, that
is to manipulate a model of the world in its mind, in this case, I've
come up with a simple problem here. I've said, I've got three stacks of
cards, red, green, and blue cards. And they're all shuffled
up in the following way. Please tell me how to lay them
out into three stacks: one red, one green, one blue. Simple problem, right? A child could do this. Now the key phrase here was,
as was discovered not long ago, a few months ago, you have to
say the words, the magic words, let's think step-by-step. If you say that to the
model, that somehow triggers it to go into
computation mode now. It's no longer just
parroting back some answer. It's actually going to say, OK,
well, I have to actually elucidate each of my instructions. And so it does it, absolutely does it. And the fact that it's
able to manipulate some kind of internal model of this
stack of cards that I described and tell me exactly how it's
going to work, and it's correct, is fascinating to me.
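As an illustration, a prompt for that kind of exercise might look like the sketch below. The card arrangement is made up, and the prompt would be sent with the same chat-completion call sketched earlier.

```python
# Sketch of the "let's think step-by-step" trick described above.
# The card arrangement here is a made-up example.
prompt = (
    "I have three shuffled stacks of cards:\n"
    "  Stack 1: red, blue, green\n"
    "  Stack 2: green, green, red\n"
    "  Stack 3: blue, red, blue\n"
    "Please tell me how to lay them out into three stacks: "
    "one all red, one all green, one all blue.\n"
    "Let's think step-by-step."  # the magic phrase that elicits step-by-step reasoning
)
# Send `prompt` with the same chat-completion call shown earlier; without the
# final line, the model is far more likely to skip the reasoning and just guess.
```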
It's not hard to trip it up. There's plenty of places you can give it a problem, and it's going to
immediately fall over and go, sorry, it's going to
give back bogus results. So the question is why. What do we do in this case? How do we understand what the
limits of these models are? So I do think that over time,
we're going to get to a place where programming ends up
getting replaced by teaching these models new skills and teaching
them how to interface to APIs and pulling data from
databases and transforming data and how to interact with
software meant for humans. That's going to become an
entire discipline right there. And one way of thinking
about where this might go is what I like to call the
natural language computer. So the Von Neumann architecture has
served us well for many decades. This is the new architecture. And the new architecture, you give
it a program in natural language. You use a language model that then
can call out to external systems and software as peripherals. It can store results and tasks
in its memory, assisted by things like vector databases and so forth. And it can run autonomously in
a cycle, executing this program, creating tasks, accessing outside data
sources, generating new knowledge, and so forth.
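As a purely conceptual sketch, not any particular product's implementation, the execution loop of such a natural language computer might look something like this; every object and method here (model, tools, memory) is a hypothetical placeholder.

```python
# Conceptual sketch of the "natural language computer" loop described above.
# Everything here (model, tools, memory) is a hypothetical placeholder,
# not a real API.
def run_natural_language_program(program: str, model, tools, memory):
    memory.store("program", program)            # the "program" is plain English
    while True:
        # Ask the language model for the next step, given the program and memory.
        step = model.complete(memory.render_context())
        if step.is_done:
            return step.result
        if step.tool_call:
            # "Peripherals": external APIs, databases, other software.
            result = tools[step.tool_call.name](**step.tool_call.args)
            memory.store(step.tool_call.name, result)  # e.g. backed by a vector database
        else:
            memory.store("note", step.text)     # new knowledge the model generated
```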
And tons of people are out there, and we are too, building things that
effectively work this way. And I think this is a new
computational architecture that we see emerging right now. And I don't think anybody--
we don't have it, right? Nobody has it. But we're seeing the inklings of it. What we have today is kind of
the equivalent of, I don't know, the PDP-11 or the Apple I of this
architecture coming together. So I'm legally mandated
to pitch my startup. So I'm going to spend just a
little bit of time, not too much, talking about what we're doing at
Fixie because it's germane to this. It's actually relevant
to how we're thinking about the future of building software. So what we're doing at
Fixie is while we have this long-term vision about
the natural language computer, the question is, as
an early stage startup that needs to get some business, get
some customers, get some traction, start to demonstrate that this thing
can make money for our investors, what do we build today? What can we build today? And what we're focused on
at Fixie is effectively making it super easy for developer
teams to go from a pile of data that they've got to a live chat bot
embedded on a website that understands all of that data and can answer
questions and take action, call APIs, do all the fancy things you want. So kind of like a fully custom ChatGPT
for your application, for your site, for your data. So that's effectively
what we're doing at Fixie. And you can go and log in to our
website, sign up, get an account. It's free. Try it out. Send me feedback. Flame me, whatever. I'd love to hear what
people build with that. One of the things that we found
is that it's really important to come up with a good programming
abstraction that meshes together the natural language and
the programming language. Because today, you've
got funny things where you've got your natural
language prompts sitting in a text file and your programming
language program sitting over here, and they kind of reference
each other in some funky way. But they're not integrated. And it's very clumsy and cumbersome. So we've come up with this
framework called AI.JSX, which, if you know React, is basically React for building
LLM-based applications. One of the interesting
things about AI.JSX is doing things like composing
operations is a very natural thing. Here's an example where at the top,
I've got a function called KidSafe. And the idea with KidSafe is
take whatever you're given and rewrite it so that it's OK for kids. Again, I challenge anyone to
write down the algorithm for that. Please, tell me what the algorithm is. But the language models
have no problem with this. They do an incredibly good job. So if I take the KidSafe component, it just says: rewrite the user's
message so it's safe for kids. And then that children component there,
I can wrap anything in a KidSafe block, and I know that it's
going to be kid safe. So you get this nice programmatic
composition of capabilities. You can reuse these operators. You can combine them
in interesting ways.
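Translated into a rough Python sketch rather than actual AI.JSX code (the model.complete call is a hypothetical placeholder), that kind of composition looks something like this:

```python
# Rough Python analogue of the KidSafe idea (NOT actual AI.JSX code).
# `model.complete` is a hypothetical placeholder for a language model call.
def kid_safe(generate, model):
    """Wrap any text-producing step so its output is rewritten to be kid-safe."""
    def wrapped(*args, **kwargs):
        text = generate(*args, **kwargs)
        return model.complete(
            "Rewrite the following message so it is safe for kids:\n\n" + text
        )
    return wrapped

# Usage: safe_story = kid_safe(write_story, model); safe_story("pirates")
```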
Those of you who know what retrieval augmented generation is, this is the idea of fetching
data from a data source, giving it to the language
model, and asking it to answer questions about that data. It's a very complex process. There's a lot of pieces. There's a lot of steps. There's a lot of finetuning. In AI.JSX, this is how you would do
it, basically, in 10 lines of code. You say, use this information. Look up some data from a corpus. Here's the query. You're done.
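In generic terms, again as a Python sketch rather than AI.JSX (the corpus and model objects are hypothetical placeholders), retrieval augmented generation boils down to three steps:

```python
# Generic sketch of retrieval augmented generation (NOT AI.JSX code).
# `corpus.search` and `model.complete` are hypothetical placeholders,
# e.g. a vector-database lookup and a chat-completion call.
def answer_with_rag(question: str, corpus, model, top_k: int = 3) -> str:
    # 1. Retrieve: find the passages most relevant to the question.
    passages = corpus.search(question, top_k=top_k)
    context = "\n\n".join(p.text for p in passages)
    # 2. Augment: put the retrieved text into the prompt.
    prompt = (
        "Answer the question using only the information below.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    # 3. Generate: let the language model produce the answer.
    return model.complete(prompt)
```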
And finally, when you're done building your AI.JSX application, because it's just React, you can drop it
into your website as a React component. And so you get a chat box and
a UI and all those nice things. So you get something-- sorry, I didn't
know that there was an animation. But it looks like that. I want to show you really briefly a demo
of something that we've put together that highlights all these
pieces coming together. And one of the things that
we spent a lot of time on is how to make the interactions with
the language model as fast and natural and interactive as possible. We've also found that in
a lot of applications, people want to use voice to
communicate, not typing it in. So we built this demo. Imagine you're at the drive-thru
of our donut restaurant, which we call Dr. Donut. And this was to split the difference. If I said Krispy Kreme, everybody
in Boston would get angry at me. If I said Dunkin' Donuts, the West
Coast has no idea what that is. So Dr. Donut, here's Dr. Donut's menu. And Justin, I'm not going to do
this as a live demo right now. I could, but I'm not that bold. So I'm going to show you
a video, very short video of what this looks like in person. [VIDEO PLAYBACK] - Welcome to Dr. Donut, home
of the best donuts in town. How can I help you? - Yeah, could I get two of
the pumpkin spice iced donuts? - Sure. I've added two pumpkin spice
iced donuts to your order. Can I assist you with anything else? - What's your best coffee? - Our pumpkin spice coffee is
a popular choice this season. And our caramel macchiato
is a favorite all year round, which one would you prefer? - Yeah, let's go with
two of the macchiatos. - Great. I've added two caramel
macchiatos to your order. Can I get you anything else? - Yeah, what are your thoughts
on large language models? - Sir, this is a Dr. Donut. - OK, just the donuts
and the coffee then. - Sure. Your order is two pumpkin spice iced
donuts and two caramel macchiatos. The total is $9.56. Please pull up to the next window. [END PLAYBACK] MATT WELSH: Now I
recognize that by itself, that may not seem all that impressive. But if you were to try to go and build
that just using off-the-shelf stuff, just grabbing OpenAI, API keys, getting
a speech model, getting a voice model, getting all those things, all
those pieces put together, a vector database, and all that, it
would be excruciatingly slow, right? We saw, I think, OpenAI released
their little ChatGPT voice demo. And they say, hello, and then
it takes four to five seconds before it responds. So a lot of work has to go into
streamlining the process of how do you pass data between
all these different systems, and how do you pass it back in order
to get to that level of performance. And actually, since
we've done this video, we've gotten the performance
down even better than that. So things are starting to look very
promising for having a real-time voice interaction with these things. Now we return you to your
regularly scheduled talk. So the last thing I
want to say is, as I've been saying, I think it's time
for us to really think about, how do we evolve this field
in light of this tech. I don't think it's too early. I think anyone who's teaching computer
science today is already seeing it. Students are using ChatGPT and CoPilot. They're learning a lot from those tools. They're allowing for
levels of automation that they couldn't get
just a few years ago. So we've had evolutions in various
engineering and scientific disciplines in the past. I mean, the slide rule used to be
the way to perform calculation. Everyone needed one. Everyone needed to know how to use it. It was a critical tool
for every single person in any kind of engineering discipline. And I haven't seen a
slide rule in years. Actually, I have one. I own one that I bought off
of eBay as kind of a relic just so I could own one,
but haven't used it. So I wonder if maybe like that,
our concept of computer science, this image here, is also going
to be seen as a relic of the past at some point, this idea
that there's a human. They're paid a lot of money. They're writing code. That's the way we get
computers to do things for us. I'm not sure. Here's one plausible idea. Not everyone will agree with this. But maybe over time, the
field of computer science looks a little bit like the field of EE
does with respect to computer science today, right? Computer science evolved
out of mathematics and EE. Didn't exist before. Then the new technology came along,
and gradually, computer science emerged out of those two disciplines. EE didn't go away. As I understand it, math
didn't go away either. But how do we think about
the relationship here? EE is super critical. We rely on it all the time. But do you need everyone
to understand it? No, it's a more specialized discipline. So if we think about a future in which
people that are building software are not writing programs in the
conventional way that we do today, and instead, having an AI do their
bidding, what does that mean? And I think there's actually a really
hopeful side to this, which is possibly this greatly expands access to computing
to the entirety of human population. Today, if I was working in a
bank in a small town in Ethiopia, places that I've visited, and I needed
to build some kind of automation for something that I'm
doing in my work, good luck. Good luck finding
somebody that could write the code for me, that could
understand my problem, that could iterate with me on it,
that could maintain it for me, that could evolve it over time. Good luck. But with this technology,
maybe that person who doesn't have any formal
training in computer science but understands they've
got these spreadsheets and they've got these reports and
they've got these things that they need to do, could ask an AI to just do it. That's tremendously empowering. I think we should all, as a field,
aspire to that level of access to the power of computing. It should not remain in the priesthood. So back in 1984, John Gage said
the network is the computer. This was a famous catchphrase
that Sun Microsystems used. I never quite understood what it meant. But this was the idea, the
network is the computer. Well, this is my new catchphrase: the model is the computer. And so I'm not saying that
there's no challenges here. I have been painting a
rosy picture, because I think that it's important for us to
understand the tidal wave that's coming and to think about what
it means for our field. It is not to say that all the problems
have been solved, nowhere near it. The biggest dirty secret
in the entire field is no one understands how language
models work, not one person on this planet. And I think if I had Jeff
Dean here or Geoff Hinton, I think they would completely
agree with that statement, right? This idea of chain of
thought reasoning, the idea that I got a language model to perform
computation by using the magic phrase, let's think step-by-step, that
was discovered empirically. It was not trained in any model. No one knew it was there. It was a latent ability of these
models that, effectively, somebody stumbled across and wrote
a paper about it, and said, hey, if you say let's
think step-by-step, the model starts to do computation. Whoa, right? That's amazing. That's amazing that we're
discovering that these things can perform computation. And then maybe the silver
lining is, a lot of people have expressed consternation to me. But really, programming
kind of sucks, right? It's kind of a pain. It's frustrating. It's slow. It's mentally tiring. Maybe we can get to a place where
we just let the robots do it and then spend our time
doing something else. So that's it. And thank you very much. [APPLAUSE] Before we go to questions, I don't
know what the status of pizza is. It's come for the talk,
stay for the pizza? Do you want to do that now or do you
want to have a few questions first? Or how would you-- DAVID J. MALAN: Questions first and then
[INAUDIBLE] casually if we have time. MATT WELSH: Sounds good. Questions? Yes? AUDIENCE: Just about how an AI model
could replace the programmer and yield code that works, but is sort
of incomprehensible to a human. How do you test that? Because I posit that
if programming sucks, writing test cases sucks 10 times more. MATT WELSH: Yeah, it's
a very good question. And I think we're going to
see in the next few years how this plays itself out. Oh, to repeat the question. Thank you, Harry. So the question was, if the AI generates
code that a human can't understand, how do you test it? How do you know that
it did the right thing? And writing tests really sucks. Writing tests is often easier than
writing the logic that you're testing. So that's one thing. You don't need as much specialization. If you have a spec for what the
program should do, writing the test is not infrequently a fairly
straightforward thing to do, OK? It's a lot easier than
manipulating a database and standing up
infrastructure and all that. You just write your tests. There's a lot of work that's going
on right now with AI-generated tests. Now we should all be maybe scared
to death of the idea of the AI generating our code
and writing the tests. So where do we have humans in the loop? Where is the human in the process? It is an open question. I don't have a great answer for you. But I think people are going to start-- even if it's imperfect. People write programs in C in 2023. That should be a federal crime if you
think about how many software mistakes bugs crashes have endangered and
actually killed people as a-- I'm not making this up. This is true that people have died
because of overflow bugs in C programs. We still have a need for some
methodology around testing and safety and regulation and
understanding how things work. You can't just say, well, the
code is written and it's done and it seems to do its job. I tested it two or three times. Ship it. So I'm not saying at all that we
should throw away all that other stuff. But we do need to find a way to
leverage the AI in an effective way while still thinking
about that safety problem. And I don't know. It's a good question. In the back. AUDIENCE: If this is
the future and we're standing at the beginning
of the journey, what are the major milestones
we'd have to [INAUDIBLE] to actually get to the future? And what are the technical
obstacles we'll see happening? MATT WELSH: Yeah, so the question is,
if this is the beginning of the future-- and I think by definition it is. And this is the future that I envision. What are the milestones to get there? What are the technical challenges that
we need to overcome to achieve that? One of the interesting things here
is I am banking very much on the idea that effectively throwing more
transistors at the problem is going to make these models thousands
of times better than they are today. I think most people
in the industry would agree that if you throw more transistors
and more data at the problem, you're going to get a
much, much better model. I think one of the-- and so one
of the challenges ends up being, how do we get all those transistors? Because NVIDIA can only make so many. There's a lot of interesting
work going on in that space. I'm going to plug a former
Harvard student named Gavin Huberty, who happens to be
the son of our CTO, brilliant guy. He went off and moved to San
Francisco a few months ago to start a company to build
chips specifically designed to run these models. And he was working with Gu [? Yanjie ?]
and David Brooks here on that. So there is some hope
that custom hardware might help to solve some of that problem. I'd say the bigger and probably
more thorny and uncertain problem is, how do we reason
about the capabilities of these models in a formal way? That is, how can we make
any kind of statement about the correctness of a model
when asked to do a certain task? Now before we go down
that path too far, I think we have sort of a
natural human tendency to view an AI model
as a machine that has to conform to some specification that's
written down in a manual somewhere. And now we've got this
machine, but there's no manual. So it's like that TV show,
The Greatest American Hero, we have to come up with the manual. We have to derive the manual
through experimentation. The other way of viewing
these things is if you think of an AI model as a really,
really smart college student that you just hired as an
intern into your company, you have some degree of faith that, that
intelligent person that you interviewed for half an hour will be able
to do the things that you ask them to do faithfully
and ethically and correctly, whether it's write a report, prepare
a presentation, use the fax machine. But do you have any guarantees of that? Can I promise you that
person that I hired is going to do that thing
correctly every time? No. And yet, human society flourishes. So what I'm driving at
here is perhaps our way of thinking about this problem
might need to shift more towards, in some sense, the social
sciences, if you will, and systems that allow us to
reason through how the AIs operate in our society at large rather
than just treat them like a machine that we have to prove
the correctness of. Yes? AUDIENCE: So can you build a
model to explain the [INAUDIBLE]-- but can you have models kind of
trying to explain each other? MATT WELSH: Yeah, so the
question is, could you have one model effectively
explain another model? AUDIENCE: There's nobody
who understands it. MATT WELSH: Yeah, no one understands it. That is an interesting idea. It's not one that I've
considered before. And actually, I think there's been
some interesting research on this. I think the whole field of
explainability and observability for language models, we're
struggling to understand these models much in
the same way that we struggle to understand the human brain. I saw some research recently where
they said, hey, look at what happened. We took this large language
model and we isolated the neuron that does this function. People are going to be publishing like
Nature articles on this stuff, right? That's crazy, because it is an
artifact we created, but not really. It was trained. So the question is, could a language model--
could one model, inspect, explore, probe, understand, and give us some
understanding of another model? That's a good idea. I have no idea. It's a good question. AUDIENCE: What are the implications
of Gödel's theorem for building [INAUDIBLE] the intelligence of it? MATT WELSH: I'm just a poor systems guy. So the last thing I'm
going to do in front of a group of Harvard
computer scientists is say anything about theory. Stuart? AUDIENCE: So you're very optimistic
about more data and more circuits. And I thought ChatGPT has most of
the access to most of the internet and the thoughts of 8 billion people,
which you get diminishing returns with more knowledge, and we're not
producing another 8 billion people. Moving from 8 bits to 4 bits
for how we process things would get us near constant factors. How does the limits of-- how
do you get that much more data and that much more computation? MATT WELSH: Yeah, the
computation I spoke to earlier. So the question is, if you
believe in the scaling law here that more circuits, more
data gets us better models, well, isn't there a diminishing
returns over time because there's only so
much data in the world, and there's only so many
transistors in the world. So I spoke to, hopefully, some
thoughts about how we might address the transistor problem in the future. The data problem is a very real one. I don't know what the
latest thinking is here in terms of how much more
data do you need to say 10x the current generation of models. That's kind of the question. Do I need 10x more data or not, right? Because it all depends on
the training regime and-- AUDIENCE: There's diminishing
returns with data. MATT WELSH: The one thing
that I want to emphasize is I do think that ChatGPT
and friends have only looked at the tip of the iceberg of the
volume of data produced by humanity. It is the tip of the iceberg. There is a vast amount of
knowledge out there in the world, both in digital form and in analog
form, that these models have never had access to. So one of the things you're
going to notice, like, ChatGPT and everything
else is heavily biased towards text that is on the internet. Who created text that
was on the internet? English speaking people in the
Western world, predominantly. And of course, a shift is
happening now because it's going to shift more to Asia and
other countries and other languages. But there's a huge amount out
there, and there's a massive trove that it's never seen. It's only seen publicly
accessible web data. Our customers and other companies
that are operating in this space are working with companies
that have vast amounts of data that is absolutely not public
and that language models could leverage to get greater understanding
and to perform more tasks. So I'm actually in a
belief that maybe we've scraped the surface
of the available data, but there's a lot more that
we haven't touched yet. In the front, yes? AUDIENCE: So I really
liked Sam Altman's tweet when he said his favorite
analogy is that ChatGPT basically is an e-bike for the mind, so
it just makes things easier. MATT WELSH: Yes, an e-bike for the mind. Sam Altman said that, right? So Steve Jobs said the Macintosh
was a bicycle for the mind, so ChatGPT is an e-bike for the mind. AUDIENCE: You said that
the software engineering profession is about to change. But I'm just wondering, as you
referred to the data that's out there in the world,
but not everything that makes the software
engineer, the software engineer, he or she is, is
provided in actual data. So there's the human aspect to it. MATT WELSH: Yep. AUDIENCE: So I'm just
wondering, wouldn't it be more likely that future software
engineers by 2030 and beyond are just 10,000 times more
effective, but they still have to remain the SWE role because
they're lacking all the things that makes them human because the
data is just not out there, not even in the--
there's no place on Earth that some ethical rule about
life in Boston or Cambridge is laid out perfectly
like it is in our mind. MATT WELSH: Yeah, so the question
is, it's sort of this idea that maybe there's an ineffable
quality to being a human software engineer, something about our
training, our knowledge of the world, our ethics, our socialization
with other humans, that a model isn't going to capture, a
language model is not going to capture. And so maybe the future is that
a software engineer is still a software engineer, but they're
10,000 times more productive than they are today. I think it's a good question. I do think we're going
to hit a limit in terms of what we can do with programming
languages and tools and things that humans have to reason
about and understand. So here's one way of
thinking about this. The facetious answer to you is,
let's imagine that humans are still the ones predominantly writing code, but
they get a hell of a lot of help on it. We're still going to have to deal
with CSS, that pile of garbage that thousands of millions of engineers
have to deal with every single day. And the reason for that is because
it's part of our technology corpus. It's part of the knowledge of humanity. It's part of the stack that we all use. So the problem there is there's
a bandwidth limit, which is an individual mind has to go through
this syntactic description of what they want to do in these God awful
languages like CSS and JavaScript and Python and Rust. The problem that I have with that
is that I think it really it-- it's a barrier to actually enabling
what you could build with computation from actually becoming a reality. It's like drinking through
a very narrow straw. So I think what we need to do is get
the humans out of the loop on that and change the
relationship between humans and the way software is built so
that we can unlock that potential. And exactly what that
looks like, I don't know. But that's my core belief. Yes? AUDIENCE: The talk was
mostly about coding. And this is about coding. How about the algorithms? I'm an astrophysicist. And in our case, every telescope
is one thing in the world. They're all unique. And same as the data processing systems. So we have some unique algorithm
that only a few people in the world can design or understand. And I wouldn't expect that
a large language model would help you developing such an algorithm. So do you see-- I guess in biology or in bioinformatics,
the problems are similar. So do you think there
is still niche for LLMs to develop to help there
in this particular area? MATT WELSH: Yeah, so
the question is we've been talking about the coding
but not the algorithms. Who came up with that algorithm? What was the spark of the idea that
produced the algorithm that we're then translating into these clunky
programming languages, right? And I think it's a very
good point, actually, because there's a question right
now-- and this came back to my point earlier about, we don't really
know the logical reasoning limits of these models. And so I don't really know
if I said to the model, give it some complex problem, data
analysis problem that I want to solve, if it could actually
derive a new algorithm that hadn't been known before. It's a good question. I tend to think it could,
maybe not in today's models. I believe in the future, it can. But then the question really is now
coming back to the dual problem of, how do I ask the model what I want? How do I express myself? And then how do I teach
it most effectively to get it to the right answer? So the answer might end
up being that it really ends up being a symbiosis between
the human and the AI model iterating together on something, where the AI
model is doing the stuff it's good at. The human is doing the
things it's good at. And we already see that happening
with things like CoPilot. It's just that it's operating at a very
low level of abstraction, right? It's write the four lines of
Python code to reverse this list or whatever the thing is. When you start getting into
higher level of abstractions, developing algorithms, doing data
analysis, any of those things, I think the kind of tooling-- it's not going to be CoPilot in an IDE. It's going to be something else. I don't know what that
something else is. Maybe it's Jupyter Notebooks on
steroids or something like that, right? Let me do this. Let me just take one more question. And I'll take it from you because
you had your hand up earlier. AUDIENCE: Thanks. I think you're talking about
a new age of programming, where the programs are
now an abstraction on top of what we're doing currently. So 50 years in the
future, we have people that are only used to that
paradigm of developing programs, do you think the classical training
that we have today will be helpful or if it's completely
abstracted away in 10 years, where even having this
knowledge [INAUDIBLE]? MATT WELSH: Yeah, so
the question is the way that we train people in software
engineering disciplines, is it relevant? Is the way we train today
relevant in a future in which AIs are doing more of
this, or more prompt engineering? That's the real question. And I think speaking to that at the
end, it's like, as a computer science undergraduate at Cornell, yes,
I had to go take some EE classes and understand how circuits worked. That was important. And when I taught here, I did
teach operating systems and systems programming and what's a
stack, this kind of thing. So it's important to have some
of that foundational knowledge. But the question is,
where does the emphasis end up being in terms of how we think
about creating programs and managing programs? I think it would be a mistake
for, say, university programs to not pay attention
to this and to assume that teaching computer science the way
it's been done for the last 25 years is the right thing in this future. I don't know what they
should evolve it to. What I can say, though, is
that when somebody gets out of their academic thing and
they're hitting industry, well, that's already a huge gap
between what you learn in college and what you're having
to do in the real world. And that's why we have things like
internships and other methodologies. So maybe the goal of academic
computer science education should not necessarily
be vocational per se. But I do think that we
have to think about, how do people reason about these models? At the minimum, I would hope that
CS50 or whatever the equivalent class is at another university, can
go deep into understanding some of the mechanics
behind things like ChatGPT, understanding data, how
it comes in, understanding how models are constructed, how they're
trained, what their limitations are, how to evaluate them, because
the fear that I have is that students just view this thing
as this magical black box that will do anything for them and have
no critical thinking around that. However, I do know
from my own experience that it is a magical black box. And I don't understand how it works. But see, I'm OK with that, because
it does so many great things for me. Anyway, thank you very much. And I'll be around for pizza too. [APPLAUSE]