Large Language Models and The End of Programming - CS50 Tech Talk with Dr. Matt Welsh

DAVID J. MALAN: Well, welcome, everyone. We're joined today by Dr. Matt Welsh. And we'll be joined toward the end of the talk by pizza as well, which we'll serve right out there on folks' way out, also as an opportunity to chat more casually with Matt toward the end. I actually got to know Matt when I was back in graduate school. And I spent quite a bit of time with him and his students when his focus was particularly on what are called sensor networks, which are these distributed networks of very small, low power, low resource devices, which made it very hard, at the time, to actually write code that interconnects them and generally solves problems. And among the problems some of my classmates were working on were monitoring volcanoes, for instance, and the integrity of bridges. And in my own interest, being able to set up these mesh networks of sorts in emergency medicine so that they could talk among each other without wires or without any central access. Matt went on since then to work full-time at Google, and most recently at fixie.ai. As you might have seen from today's description, he portends a future in which computers will do the writing of code for us. So if you're struggling in CS50, 61, 161, or anything in between, not to worry. AI is now here, as is Dr. Matt Welsh. MATT WELSH: Thanks, David. Thanks for having me. It's been, I don't know, 13 years or something, 12 years since I gave a lecture at Harvard. So we'll see if I've still got it. I was joking yesterday with David Parks, who's now the Dean. And he and I were peers when I was on the faculty here. And I said, it's remarkable, David, on becoming Dean of SEAS. I don't think we're old enough to be Dean quality yet. And then, actually, I realized we are. So anyway, I'm here to tell you that the field of computer science is doomed. And I actually mean this, although I'm going to put it in somewhat humorous terms, that if you think about computer science, what is the field about? What does it mean? Where did it come from? What's the core idea of it? It's the idea of taking an idea, an algorithm, or a concept or a data structure, and translating it into a program that can generally be run by a Von Neumann architecture machine, right? So that's computer science in a nutshell. The problem is that the goal of CS has always had this core fundamental assumption or axiom that the programs that we're all talking about here have been implemented, maintained, and have to be understood by humans, that if I print out the code for a program, a human, some human, maybe not everyone, but at least maybe the person who wrote it, if not, someone else can understand it. Now here's the problem. Humans suck at all three of these things. We're terrible at writing programs. We're terrible at maintaining them. And we're absolutely terrible at understanding them. So what does that really mean for the field? So I want to make this claim that 50 years of research into programming languages has done effectively nothing to solve this problem. We've been at this for a long time now. 50 years is a long time. And we keep inventing new languages and new programming concepts and new abstractions and new data types and new proof methodologies. But none of the stuff that we've developed, in terms of tooling or languages or proof techniques or documentation or linters, has actually solved this problem. And I don't think another 50 years is going to solve it. 
I think this idea of building automated tools to help humans write better software has played itself out. Now if you disagree with me, let's just take a look at the history here. So let's rewind the clock all the way back to 1957. This is Conway's Game of Life, implemented in Fortran. I don't remember which dialect of Fortran this is. But Fortran came about in about 1957. I just claim this is really hard to understand. I claim that you can't look at this and, unless you had some idea of the intent of the program, tell what the hell it does. You could work it out. You could spend some time reading it. You could probably understand it with some effort. But it's not trivial. It's not straightforward. So we tried to make programming easier. We came up with something called Basic in 1964. This is not the original Basic. Again, it's had many dialects because, obviously, the first one wasn't good enough. We had to keep improving the language. This is the same program in Basic. I don't think this is any easier to understand. I could spend some time reading it and convince myself that it does a certain thing. But it's quite challenging to get. So then we came up with APL. This is Conway's Game of Life in APL. I would say, raise your hand if you understand this, but I know there's probably a few people in the audience who do. I don't, right? This is a programming language so complex you needed a special keyboard to type it. But this is what we thought the practice of developing programming languages was back in the '60s. Certainly, it doesn't do the job. All right, well, I've been talking about stuff that's kind of old-fashioned. What about the new hotness? Let's talk about Rust. Everybody's programming in Rust. It's the latest and greatest thing since sliced bread. I spent two years running engineering at a startup that was completely Rust-based. I ran a big team full of Rust developers. I actually learned Rust myself, kind of. This is the same program in Rust. I can't make heads or tails of this. It is incredibly hard to write programs that are easy to understand, easy to maintain, easy to reason about. So that's the kind of state-of-the-art. This is where we've gotten in 50 years, from Fortran to this. And I just want to make the claim that this is not going to work. We're done. Game over. So what's next? Well, this is how I write code today. This was a prompt passed to the GPT-4 model. And it's part of a larger program that reads in some text of a transcript that's been derived from a podcast audio feed. We're feeding the transcript into the model. And we're giving it these instructions. We're saying, please summarize the following segment of this podcast transcript. Only use the information in the text. Do not, in caps-- this is important by the way, the all caps is super important. Do not use any information you know about the world. Include the title of the podcast, the name of the episode, and the names of the speakers if known. This English statement here encodes an algorithm. It describes something that I want to do with some input data, the output data that I want, and my expectations about the kind of thing that's in the output data. So a few things to notice about this. The first thing to notice about this is I don't think anyone could ever write down the algorithm for what this is supposed to do in any existing programming language or any programming language that we're likely to come up with in the future. How do you write this algorithm? You can't, right? 
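For concreteness, here is a minimal sketch of what such a prompt-as-program might look like when wired into code, assuming the OpenAI Python client; the model name, the function name, and the exact wiring are illustrative, not the speaker's actual pipeline:

```python
# Minimal sketch of a prompt-as-program: the "algorithm" is the English
# instruction handed to a chat model along with the input data.
# Assumes the OpenAI Python client (>= 1.0) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def summarize_segment(transcript_segment: str) -> str:
    prompt = (
        "Please summarize the following segment of this podcast transcript. "
        "Only use the information in the text. "
        "DO NOT use any information you know about the world. "
        "Include the title of the podcast, the name of the episode, "
        "and the names of the speakers if known.\n\n"
        + transcript_segment
    )
    response = client.chat.completions.create(
        model="gpt-4",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Example usage: summarize_segment(open("podcast_transcript.txt").read())
```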
There's no pseudocode. There's no proof. There's no mathematical symbology here, right? The other thing to notice is, at least for me. I don't know about any of you. Do you understand this? Do you understand what it's saying? Does it make sense? Can you read it? Can you reason about what it's supposed to do? Yes, of course, right? It's in plain English. Doesn't have to be English, by the way. It could be in Mandarin Chinese or Esperanto. Have you all seen the XKCD about the guy who walks into his friend's house and he says, OK, Alexa, order five tons of creamed corn. OK, Alexa, confirm order. That's how he makes sure that no one's got a speaker listening to him. So the point being that this is now how I am actually writing code. And what's funny about this is a lot of it is trial and error and experimentation. By the way, that's the same when I'm writing normal computer code. And the other thing that's interesting about this is there's a lot of subtlety in terms of how you instruct the model and how you know what it's going to do with your instructions. You can't write a manual that says, well, here's the set of words that you need to use to get the model to do x, y, or z. You have to just try out certain things. In this case, I found out the do not, in all caps, really helped, because I really wanted to emphasize that point to the model. This reminds me of another programming language that someone came up with a while ago called INTERCAL. INTERCAL was meant to be one of these obscure or maybe satirical joke programming languages. INTERCAL had these interesting features, such as you had to use the keyword, please. And if you use the keyword please too often, the compiler would reject your program. If you didn't use it enough, it would also reject your program. And it turned out that feature was undocumented. It's exactly like what we're doing today, right? We have to say please and do not in all caps to get the language models to do what we want. So where am I going with all this? I think what I'm saying here is we are now in an era where we have machines that can take natural language in and produce results, algorithmic results, computational results, but for which no human has written a program in anything resembling a conventional programming language. And I claim that these models are going to get so good at doing this that our whole concept of programming computers is going to get replaced over time with instructing language models to do things for us. So let's take a look at the state of programming language technology. This is a programmer without CoPilot in around 2020 colorized. I think I met that guy out in Central Square this morning. And here's a programmer with CoPilot in 2021, right? So clearly, we're evolving very rapidly as a species of programmers. Unfortunately, both of these cases are male. I apologize for that. So how many people here have used CoPilot or one of its ilk in terms of helping you write code? Don't be shy. I know you're like-- my professor in here? Oh, shit. All right, so CoPilot, if you haven't used it, is a complete game changer in terms of how real world developers write code. Yes, it's also a huge boost for students who want to effectively shortcut their homework, speed run their homework. But for someone working in the industry writing code every single day, if I don't have CoPilot, I absolutely feel naked. I was on the airplane out here. I was writing code. The Wi-Fi was not quite fast enough. 
So I would type out my half a line of code and just sort of wait for CoPilot to finish it for me like I always do. But normally that happens in about less than a second. And this time, it was just taking so long. I was like, oh, damn it, I guess I have to write this myself, just like I used to a year ago. CoPilot is incredible for a few reasons. I think one of the things that people don't fully appreciate is that it keeps you in the zone of writing code. It used to be the case that any time I'd hit a little snag, I'd be like, oh, crap, I can't quite remember the syntax for how I reverse a list in whatever language I'm working in. Crap. Well, I know where to find the answer. I'll just Google it. It's on Stack Overflow somewhere. And so I go and I Google it, and I find the thing. It's probably not a direct answer, so I have to read the article a little bit and piece together, oh yeah, that's the snippet I was looking for. And then 45 minutes later, what am I doing? I'm on Reddit somewhere. I've gone down the rat hole of surfing the internet. I got out of the zone of writing code. So by keeping you in the zone, I think people are so much more productive with this, to the point where we mandated, every developer at our company has to use CoPilot. If there's somebody not using CoPilot, they're going to be fired. Well, I didn't say that. But it's kind of the idea. So a lot of people have chastised or criticized CoPilot for being a little dumb, right? It's like, well, it's just trained on stuff it found on the internet, on GitHub, and homework assignments. How good can it be? It's incredibly good. It's not just parroting back things that it's seen elsewhere. It's interpreting your program and your intent. It's looking at other parts of your code to understand what you might do next. It's understanding your data structures. It's not just looking at a little context window in this current file you're editing. It's looking elsewhere in the code to find something that might be relevant. And the only thing that is stopping CoPilot from getting really, really good at this is just more data and more compute. And guess what? We have both of those in abundance. There's nothing that's going to stop this from getting incredibly good over time. So here's another similar use case. This is not CoPilot. This is ChatGPT, which I'm sure we're all familiar with. But if you are trying to figure out how to do something-- and in this case, I was using the deepgram Python SDK to transcribe audio files for this podcast thing I mentioned earlier, I could have spent 15, 20 minutes reading their documentation, finding some example code on the internet, following a tutorial, or because we're all-- programmers are incredibly lazy, just say, hey, look I'm trying to do this thing. Can you just give me the code I need? And it does it. CoPilot is not just understanding homework assignments. ChatGPT is not just understanding homework assignment, it understands other people's APIs and SDKs and programming libraries and abstractions and best practices and bugs that might occur. I mean, it's really got a lot of knowledge. And so with very little effort, then I can just cut and paste this code right into my program and get on with my life. Shel Silverstein, who wrote A Light in the Attic. This is a children's book-- a book of children's poetry that I read when I was a kid. I saw this on Reddit a couple of days ago. He completely predicted this. This is 1981. The Homework Machine, oh the Homework Machine. 
Most perfect contraption that's ever been seen. Just put in your homework, then drop in a dime, Snap on the switch, and in ten seconds' time, Your homework comes out, quick and clean as can be. Here it is-- "nine plus four?" and the answer is "three". Three? Oh, me. I guess it's not as perfect as I thought it would be. Exactly. Cost a dime, takes about ten seconds. It gets the answer wrong. This is very much what we're dealing with today. By the way, and this is a complete aside, but I can't resist when I mentioned Shel Silverstein. If you don't know what he looked like, this was the photo on the dust jacket of one of his first books. This guy, I love this guy, a children's poetry book author from the '70s. And that's what he looked like. Amazing. All right, so now I want to talk about, well, if this AI technology is getting so good, then what's going to happen to our industry? What does this mean for all of us who might be looking to get jobs in this industry in the future and expecting to get those big, fat paychecks and stock option grants and buy Teslas or whatever we're expecting to do? How much does it cost to replace one human developer with AI? Well, I did the math. So let's say that a typical software engineer salary in Silicon Valley or Seattle is around $220,000 a year. That's just the base salary, doesn't include benefits, doesn't include equity packages, doesn't include your free lunch and your bowling alley and all that kind of stuff. So let's just assume that stuff costs $92K a year. This is, again, a little conservative. So the total cost to your employer is roughly $312K for one SWE. How many working days are there in a year? About 260. And so it costs $1,200 a day to employ you as a SWE at one of these companies. Fair enough? Let's do the math. How many lines of code do you think an average developer checks into the code base every day? I mean, finalized, tested, reviewed, and approved lines of code. Most of us who have worked in industry know that the median value is 0, because there are so many days that you go by where you're waiting on somebody else or you're in meetings all day, you didn't get anything done. You didn't check it in. But let's just be generous here and say, it's about 100. I know, 100 doesn't sound like a lot. People are like, but I was programming all day. Yes, but 90% of your code you ended up throwing out or somebody reviewed it and said it was no good, you have to rewrite it, you were trying to figure out what to do, you were revamping it. So the final result of your output is something like 100 lines of code a day. That's the final result. How many GPT-3 model tokens is that? It's about 10 tokens per line, more or less. And the cost for GPT-3-- actually, this is probably a little out-of-date. But at the time I made this slide, it was $0.02 for 1,000 tokens. So if you do the math, then the total cost for the output of one human software developer on GPT-3 is $0.12. This is a factor of 10,000. This should scare us all. This suggests, potentially, a very large shift in our industry. I don't think we can ignore this and just write it off and say, well, the AI is not very good today, so therefore, it's not going to be good in five years. This radically changes how we think about it. The only reason that programmers are paid so much is that it requires years and years and years of education and training and knowledge and specialization to be good at it. 
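To make the back-of-the-envelope arithmetic explicit, here is the same calculation in a few lines of Python, using the rough figures quoted above; the twelve-cent figure is the talk's estimate, and the exact number depends on how many tokens you count per finished line:

```python
# Back-of-the-envelope comparison from the talk (rough 2023 figures, not measurements).
salary = 220_000       # base salary, USD/year
overhead = 92_000      # benefits, equity, perks, etc., USD/year
working_days = 260

human_cost_per_day = (salary + overhead) / working_days
print(f"Human SWE: ~${human_cost_per_day:,.0f}/day")              # ~$1,200/day

# The talk prices the equivalent daily output (roughly 100 finished lines,
# ~10 tokens per line, at 2023-era GPT-3 rates of about $0.02 per 1,000
# tokens) at around twelve cents; the exact figure depends on how many
# draft and context tokens you charge against each finished line.
llm_cost_per_day = 0.12
print(f"GPT-3 for the same output: ~${llm_cost_per_day:.2f}/day")

print(f"Ratio: ~{human_cost_per_day / llm_cost_per_day:,.0f}x")   # ~10,000x
```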
But there's no reason that I need to hire a super smart, Harvard educated student to do this if I can get ChatGPT to do most of the work for me and have a human typing it in. There's a lot of other advantages to hiring the robots instead of the humans, right? Robots are not going to take breaks. The robot is not, today, expecting free lunches and on-site massage. That could change. The robot takes the same length of time to generate its code, whether it's the rough proof of concept or the final production-ready code. When you go as a PM to your engineering team and you say, OK team, there's eight of you here. We have to ship the billing page. How soon can we do it? You're going to spend at least an hour and a half having the conversation, well, if we do it quick and dirty, we can maybe do it in three weeks. And if it's got to be production-ready, give us 12. Or you can go to the proverbial homework machine, push the button, and have the code right now. And the other thing is, yes, the robot makes mistakes. But those mistakes can happen incredibly quickly, to the level of speed where iterate, iterate, iterate, iterate, iterate, iterate, iterate is perfectly fine. You can say to the robot, you know what? This whole thing, 5,000 source files, 20,000 lines of code, whatever it is, blow it away. Start over, boom. Five seconds later, you have a brand new version of it. Try that with a live human engineer team. So I think this is all something that we really have to take seriously. I don't think that this is just-- I am exaggerating for effect. But the industry is going to change. So the natural question then is, well, what happens when we cut humans out of the loop? How do we build software? How do we ship product? I found this video on, I think it's Microsoft's website, and it's titled What Do Product Managers Do? That was a little bit of an unintended joke, I think, because as an engineer, we often go, what do product managers do? But if you imagine what the software team of the future might look like, I think this is one very plausible approach, which is have a product manager-- this is probably still a human-- taking the business and the product requirements, the user requirements, and translating them into some form, probably English, maybe a little bit technical English, that you then can provide to the army of AI code generators. The AI code generators give you a whole bunch of code, and probably, for a while still, we still have humans reading and reviewing the code to make sure that it does what it was supposed to do. Now, that read is a little different than what we have today. Today, when we review code, if I have another engineer on my team writing code and I'm reviewing it, standard practice in the industry is to do code review for one another. We don't just check in code. We read each other's code. We make detailed comments on it. We suggest improvements, cleanups, clarifications, comments, documentation. In this case, it's not absolutely essential that this code be maintainable by a human. I think for a while, we're going to want that, right? Most people are not going to feel comfortable just letting the robots do all the coding. But at some point, as long as I can convince myself that the code does what it's supposed to do, I don't really care how messy it is. I don't really care how it's structured. I don't really care how reusable it is. All of those factors are only because poor humans have to wrangle with this stuff. Oh, it needs to be modular. 
We need to have abstraction boundaries. All the things, sophomore level computer science, right? Why? For the sake of poor humans having to deal with this complex code base. But if the robots are the ones generating it, and we don't really need to maintain it in a conventional way, why not just generate the code you need? It doesn't really matter if it's duplicative or repetitive or modular or nicely abstracted. It doesn't matter. Does the job. So one of my hypotheses around why everyone has been freaking out about ChatGPT is because unlike other industries, this revolution seemed to occur overnight. Unless you're like an AI professor and have really been following the literature for years and years and years, to most of us, myself included, this seemed to just go from, AI was crappy to AI was amazing, literally, overnight. So to use an analogy, this would be as if the field of computer graphics went from Pong to Red Dead Redemption 2 in the span of about three months. People's heads would explode if that happened. But that's not what happened in graphics, right? In graphics, it took decades to get to this point. And everyone could see it gradually getting better and better and better. I remember when Toy Story came out. That was like the first CG movie. People's minds just melted watching that. They were like, whoa. And now we watch it and you just go, yeah, that's cute. I could render that on my laptop in Scratch or whatever, right? The other thing that's happened, I think, in this field that's interesting and there's a big societal shift happening is the dialogue around our expectations of what AI can achieve. And so in 1972, Hubert Dreyfus wrote this book What Computers Can't Do. And this was at the dawn of the PC era. And there was a lot of popular press and dialogue around this scaremongering around AI. And we had movies come out, like WarGames. Does anybody remember that? I think WarGames-- by the way, that movie is why I am a computer scientist. I was like, I want to be Matthew Broderick in this room with all these monitors and my analog modem and hacking into the school computer. That was me as a kid. So at this time, I think a lot of people were saying, well, hold on a minute. Computers are fundamentally dumb, and they can't do these things. And they never will. And that was the thesis of this book here. And I think that was the consensus view, right? We calmed down a little bit about the technology. We all kind of realized, yeah, OK, VisiCalc is not going to put me out of a job. But now fast forward to 2014, I highly recommend this book if you haven't read it, by Nick Bostrom called Superintelligence. This is a book that wrestles in a tremendous amount of detail with the philosophical and the moral questions of how does human society respond to an AI that is more intelligent than humans? And I know we've got a lot of sci-fi around that topic. But this is a very serious academic work about, what does it mean for our society if we have AI that is smarter than us? And people are taking that very seriously today. So I think, my point being that the dialogue that we've been having in society at large has shifted away from AI as a toy, to AI might actually destroy society. So let's just talk rapidly about the evolution of programming as I see it. So in the dawn of time, we had humans directly writing machine instructions and inputting them with toggle switches and stuff like that. That was before programming, in the conventional sense, was really invented. 
Then we had early prehistory, and people started writing programs in higher level languages. That's Bjarne Stroustrup who invented C++. And in modern times, we have a world in which humans are writing their code, but they're heavily assisted by AI. And they can get away with things like, well, I'll just write a comment and have the AI write the code for me, right? But my claim is that the future of this really is skipping the programming step entirely. I think a lot of people who've read my article on this topic-- it was in the CACM earlier this year-- misinterpreted it as saying, AI is going to write code for us. Therefore, programmers should not exist. I'm not saying that. I'm actually saying something much worse, which is you won't have to have programs at all. You just tell the language model what you want, and it directly computes the results. There's no program step. And I think that opens up-- it is an interesting challenge for our field. But I think it opens up a tremendous opportunity, because now the question is, how do I effectively teach these models what to do? Coming back to my example earlier of having to use the words do not in all caps, what are the best practices? And beyond best practices, can we turn this from effectively a dark art into a science, into an engineering discipline? And people have talked about prompt engineering as a thing. I think that's meant tongue in cheek. Prompt engineering is not really a thing yet. But it may well be in the future if we do this right. One of the things that people often say about these models is that there's no way they can do anything interesting or creative because all they're doing is autocompleting based on large corpora of text that they've seen and been trained on. I beg to differ. Now we obviously don't really know what's going on inside these models. But if you ask a large language model to take a complex problem and effectively run a computation, that is to manipulate a model of the world in its mind, in this case, I've come up with a simple problem here. I've said, I've got three stacks of cards, red, green, and blue cards. And they're all shuffled up in the following way. Please tell me how to lay them out into three stacks one red, one green, one blue. Simple problem, right? A child could do this. Now the key phrase here was, as was discovered not long ago, a few months ago, you have to say the words, the magic words, let's think step-by-step. If you say that to the model, that somehow triggers it to go into computation mode now. It's no longer just parroting back some answer. It's actually going to say, OK, well, I have to actually elucidate each of my instructions. And so it does it, absolutely does it. And the fact that it's able to manipulate some kind of internal model of this stack of cards that I described and tell me exactly how it's going to work and it's correct is fascinating to me. It's not hard to trip it up. There's plenty of places you can give it a problem, and it's going to immediately fall over and go, sorry, it's going to give back bogus results. So the question is why. What do we do in this case? How do we understand what the limits of these models are? So I do think that over time, we're going to get to a place where programming ends up getting replaced by teaching these models new skills and teaching them how to interface to APIs and pulling data from databases and transforming data and how to interact with software meant for humans. That's going to become an entire discipline right there. 
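As a sketch of the let's-think-step-by-step trick in code, again assuming the OpenAI Python client, with the card arrangement and prompt wording invented for illustration rather than taken from the slide:

```python
# Sketch of the "let's think step by step" trick on a card-sorting problem
# like the one described above. The prompt wording and card order are
# invented for illustration; assumes the OpenAI Python client (>= 1.0).
from openai import OpenAI

client = OpenAI()

problem = (
    "I have a shuffled pile of cards, each one red, green, or blue, "
    "in this order: R G B B R G R B G. "
    "Tell me how to lay them out into three stacks: one all red, "
    "one all green, and one all blue."
)

# Appending the magic phrase tends to elicit an explicit, checkable
# sequence of steps instead of a one-line guess.
response = client.chat.completions.create(
    model="gpt-4",  # illustrative
    messages=[{"role": "user", "content": problem + " Let's think step by step."}],
)
print(response.choices[0].message.content)
```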
And one way of thinking about where this might go is what I like to call the natural language computer. So the Von Neumann architecture has served us well for many decades. This is the new architecture. And in the new architecture, you give it a program in natural language. You use a language model that then can call out to external systems and software as peripherals. It can store results and tasks in its memory, assisted by things like vector databases and so forth. And it can run autonomously in a cycle, executing this program, creating tasks, accessing outside data sources, generating new knowledge, and so forth. And tons of people are out there, and we are too, building things that effectively work this way. And I think this is a new computational architecture that we see emerging right now. And I don't think anybody-- we don't have it, right? Nobody has it. But we're seeing the inklings of it. What we have today is kind of the equivalent of, I don't know, the PDP-11 or the Apple I of this architecture coming together. So I'm legally mandated to pitch my startup. So I'm going to spend just a little bit of time, not too much, talking about what we're doing at Fixie because it's germane to this. It's actually relevant to how we're thinking about the future of building software. So what we're doing at Fixie is while we have this long-term vision about the natural language computer, the question is, as an early stage startup that needs to get some business, get some customers, get some traction, start to demonstrate that this thing can make money for our investors, what do we build today? What can we build today? And what we're focused on at Fixie is effectively making it super easy for developer teams to go from a pile of data that they've got to a live chat bot embedded on a website that understands all of that data and can answer questions and take action, call APIs, do all the fancy things you want. So kind of like a fully custom ChatGPT for your application, for your site, for your data. So that's effectively what we're doing at Fixie. And you can go and log in to our website, sign up, get an account. It's free. Try it out. Send me feedback. Flame me, whatever. I'd love to hear what people build with that. One of the things that we found is that it's really important to come up with a good programming abstraction that meshes together the natural language and the programming language. Because today, you've got funny things where you've got your natural language prompts sitting in a text file and your programming language program sitting over here, and they kind of reference each other in some funky way. But they're not integrated. And it's very clumsy and cumbersome. So we've come up with this framework called AI.JSX, which, if you know React, is basically React for building LLM-based applications. One of the interesting things about AI.JSX is that doing things like composing operations is a very natural thing. Here's an example where at the top, I've got a function called KidSafe. And the idea with KidSafe is take whatever you're given and rewrite it so that it's OK for kids. Again, I challenge anyone to write down the algorithm for that. Please, tell me what the algorithm is. But the language models have no problem with this. They do an incredibly good job. So if I take the KidSafe component, it just says rewrite the user's message so it's safe for kids. And then with that children component there, I can wrap anything in a KidSafe block, and I know that it's going to be kid safe. 
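A rough Python analogue of that composition idea follows; this is not AI.JSX's actual API, and the helper names are invented for illustration:

```python
# Not AI.JSX -- just the composition idea sketched in Python. A kid_safe()
# wrapper takes the output of any text-producing step and rewrites it so
# it is appropriate for kids, so the same wrapper can be reused anywhere.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",  # illustrative
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def kid_safe(text: str) -> str:
    # The whole "algorithm" is the instruction; there is no conventional
    # code you could write down for this transformation.
    return ask("Rewrite the following so it is safe and appropriate for kids:\n\n" + text)

def tell_a_story(topic: str) -> str:
    return ask(f"Tell a short story about {topic}.")

# Composition: wrap any step in kid_safe, much like wrapping children
# in a KidSafe component.
print(kid_safe(tell_a_story("pirates")))
```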
So you get this nice programmatic composition of capabilities. You can reuse these operators. You can combine them in interesting ways. For those of you who know what retrieval augmented generation is, this is the idea of fetching data from a data source, giving it to the language model, and asking it to answer questions about that data. It's a very complex process. There's a lot of pieces. There's a lot of steps. There's a lot of fine-tuning. In AI.JSX, this is how you would do it, basically, in 10 lines of code. You say, use this information. Look up some data from a corpus. Here's the query. You're done. And finally, when you're done building your AI.JSX application, because it's just React, you can drop it into your website as a React component. And so you get a chat box and a UI and all those nice things. So you get something-- sorry, I didn't know that there was an animation. But it looks like that. I want to show you really briefly a demo of something that we've put together that highlights all these pieces coming together. And one of the things that we spent a lot of time on is how to make the interactions with the language model as fast and natural and interactive as possible. We've also found that in a lot of applications, people want to use voice to communicate, not typing it in. So we built this demo. Imagine you're at the drive-thru of our donut restaurant, which we call Dr. Donut. And this was to split the difference. If I said Krispy Kreme, everybody in Boston would get angry at me. If I said Dunkin' Donuts, the West Coast has no idea what that is. So Dr. Donut, here's Dr. Donut's menu. And Justin, I'm not going to do this as a live demo right now. I could, but I'm not that bold. So I'm going to show you a video, very short video of what this looks like in person. [VIDEO PLAYBACK] - Welcome to Dr. Donut, home of the best donuts in town. How can I help you? - Yeah, could I get two of the pumpkin spice iced donuts? - Sure. I've added two pumpkin spice iced donuts to your order. Can I assist you with anything else? - What's your best coffee? - Our pumpkin spice coffee is a popular choice this season. And our caramel macchiato is a favorite all year round. Which one would you prefer? - Yeah, let's go with two of the macchiatos. - Great. I've added two caramel macchiatos to your order. Can I get you anything else? - Yeah, what are your thoughts on large language models? - Sir, this is a Dr. Donut. - OK, just the donuts and the coffee then. - Sure. Your order is two pumpkin spice iced donuts and two caramel macchiatos. The total is $9.56. Please pull up to the next window. [END PLAYBACK] MATT WELSH: Now I recognize that by itself, that may not seem all that impressive. But if you were to try to go and build that just using off-the-shelf stuff, just grabbing OpenAI API keys, getting a speech model, getting a voice model, getting all those things, all those pieces put together, a vector database, and all that, it would be excruciatingly slow, right? We saw, I think, OpenAI released their little ChatGPT voice demo. And they say, hello, and then it takes four to five seconds before it responds. So a lot of work has to go into streamlining the process of how do you pass data between all these different systems, and how do you pass it back in order to get to that level of performance. And actually, since we've done this video, we've gotten the performance down even better than that. So things are starting to look very promising for having a real-time voice interaction with these things. 
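A sketch of the kind of streaming that makes this feel responsive: speak each sentence as soon as the model has produced it, rather than waiting for the whole reply. The speech-to-text and text-to-speech helpers here are hypothetical stand-ins; only the OpenAI streaming call is a real API:

```python
# Sketch: stream the model's reply and speak it sentence by sentence to cut
# perceived latency. transcribe_audio() and speak() are hypothetical stand-ins
# for a speech-to-text and text-to-speech service.
from openai import OpenAI

client = OpenAI()

def transcribe_audio(audio_bytes: bytes) -> str:
    """Hypothetical STT stand-in; a real system would call a speech service."""
    return "Could I get two of the pumpkin spice iced donuts?"

def speak(sentence: str) -> None:
    """Hypothetical TTS stand-in; a real system would stream audio back out."""
    print(sentence)

def reply_out_loud(customer_audio: bytes) -> None:
    text = transcribe_audio(customer_audio)
    stream = client.chat.completions.create(
        model="gpt-4",  # illustrative
        messages=[
            {"role": "system", "content": "You are the drive-thru attendant at Dr. Donut."},
            {"role": "user", "content": text},
        ],
        stream=True,
    )
    buffer = ""
    for chunk in stream:
        buffer += chunk.choices[0].delta.content or ""
        if buffer.rstrip().endswith((".", "?", "!")):  # flush a sentence at a time
            speak(buffer.strip())
            buffer = ""
    if buffer.strip():
        speak(buffer.strip())

# reply_out_loud(b"")  # would take the order sentence by sentence as tokens arrive
```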
Now we return you to your regularly scheduled talk. So the last thing I want to say is, as I've been saying, I think it's time for us to really think about, how do we evolve this field in light of this tech. I don't think it's too early. I think anyone who's teaching computer science today is already seeing it. Students are using ChatGPT and CoPilot. They're learning a lot from those tools. They're allowing for levels of automation that they couldn't get just a few years ago. So we've had evolutions in various engineering and scientific disciplines in the past. I mean, the slide rule used to be the way to perform calculation. Everyone needed one. Everyone needed to know how to use it. It was a critical tool for every single person in any kind of engineering discipline. And I haven't seen a slide rule in years. Actually, I have one. I own one that I bought off of eBay as kind of a relic just so I could own one, but haven't used it. So I wonder if maybe like that, our concept of computer science, this image here, is also going to be seen as a relic of the past at some point, this idea that there's a human. They're paid a lot of money. They're writing code. That's the way we get computers to do things for us. I'm not sure. Here's one plausible idea. Not everyone will agree with this. But maybe over time, the field of computer science looks a little bit like the field of EE does with respect to computer science today, right? Computer science evolved out of mathematics and EE. Didn't exist before. Then the new technology came along, and gradually, computer science emerged out of those two disciplines. EE didn't go away. As I understand it, math didn't go away either. But how do we think about the relationship here? EE is super critical. We rely on it all the time. But do you need everyone to understand it? No, it's a more specialized discipline. So if we think about a future in which people that are building software are not writing programs in the conventional way that we do today, and instead, having an AI do their bidding, what does that mean? And I think there's actually a really hopeful side to this, which is possibly this greatly expands access to computing to the entirety of human population. Today, if I was working in a bank in a small town in Ethiopia, places that I've visited, and I needed to build some kind of automation for something that I'm doing in my work, good luck. Good luck finding somebody that could write the code for me, that could understand my problem, that could iterate with me on it, that could maintain it for me, that could evolve it over time. Good luck. But with this technology, maybe that person who doesn't have any formal training in computer science but understands they've got these spreadsheets and they've got these reports and they've got these things that they need to do, could ask an AI to just do it. That's tremendously empowering. I think we should all, as a field, aspire to that level of access to the power of computing. It should not remain in the priesthood. So back in 1984, John Gage said the network is the computer. This was a famous catchphrase that Sun Microsystems used. I never quite understood what it meant. But this was the idea, the network is the computer. Well, this is my new catch phrase, the model is the computer. And so I'm not saying that there's no challenges here. I have been painting a rosy picture, because I think that it's important for us to understand the tidal wave that's coming and to think about what it means for our field. 
It is not to say that all the problems have been solved, nowhere near it. The biggest dirty secret in the entire field is no one understands how language models work, not one person on this planet. And I think if I had Jeff Dean here or Geoff Hinton, I think they would completely agree with that statement, right? This idea of chain of thought reasoning, the idea that I got a language model to perform computation by using the magic phrase, let's think step-by-step, that was discovered empirically. It was not deliberately trained into any model. No one knew it was there. It was a latent ability of these models that, effectively, somebody stumbled across and wrote a paper about it, and said, hey, if you say let's think step-by-step, the model starts to do computation. Whoa, right? That's amazing. That's amazing that we're discovering that these things can perform computation. And then maybe the silver lining is, a lot of people have expressed consternation to me. But really, programming kind of sucks, right? It's kind of a pain. It's frustrating. It's slow. It's mentally tiring. Maybe we can get to a place where we just let the robots do it and then spend our time doing something else. So that's it. And thank you very much. [APPLAUSE] Before we go to questions, I don't know what the status of pizza is. It's come for the talk, stay for the pizza? Do you want to do that now or do you want to have a few questions first? Or how would you-- DAVID J. MALAN: Questions first and then [INAUDIBLE] casually if we have time. MATT WELSH: Sounds good. Questions? Yes? AUDIENCE: Just about how an AI model could replace the programmer and yield code that works, but is sort of incomprehensible to a human. How do you test that? Because I posit that if programming sucks, writing test cases sucks 10 times more. MATT WELSH: Yeah, it's a very good question. And I think we're going to see in the next few years how this plays itself out. Oh, to repeat the question. Thank you, Harry. So the question was, if the AI generates code that a human can't understand, how do you test it? How do you know that it did the right thing? And, yes, writing tests really sucks. Writing tests is often easier than writing the logic that you're testing. So that's one thing. You don't need as much specialization. If you have a spec for what the program should do, writing the test is not infrequently a fairly straightforward thing to do, OK? It's a lot easier than manipulating a database and standing up infrastructure and all that. You just write your tests. There's a lot of work that's going on right now with AI-generated tests. Now we should all be maybe scared to death of the idea of the AI generating our code and writing the tests. So where do we have humans in the loop? Where is the human in the process? It is an open question. I don't have a great answer for you. But I think people are going to start-- even if it's imperfect. People write programs in C in 2023. That should be a federal crime if you think about how many software mistakes, bugs, crashes have endangered and actually killed people as a-- I'm not making this up. This is true that people have died because of overflow bugs in C programs. We still have a need for some methodology around testing and safety and regulation and understanding how things work. You can't just say, well, the code is written and it's done and it seems to do its job. I tested it two or three times. Ship it. So I'm not saying at all that we should throw away all that other stuff. 
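As a sketch of the earlier point that writing a test against a spec is often straightforward even when the implementation is not meant to be read, here is a minimal example; the function, its spec, and the numbers are invented for illustration:

```python
# Sketch: a human-written pytest pinning down the spec for a function the AI
# generated. The function, spec, and numbers are invented for illustration;
# the point is that the test stays short and readable even if the generated
# implementation behind it is not.
import pytest

def compute_invoice_total(subtotal: float, tax_rate: float, coupon: float) -> float:
    """Stand-in for AI-generated code: apply tax, then a flat coupon, floored at zero."""
    return max(subtotal * (1 + tax_rate) - coupon, 0.0)

def test_coupon_applied_after_tax():
    # Spec: 8% tax on the subtotal, then a flat $5 coupon.
    assert compute_invoice_total(100.00, 0.08, 5.00) == pytest.approx(103.00)

def test_total_never_negative():
    # Spec: the coupon can never push the total below zero.
    assert compute_invoice_total(1.00, 0.08, 5.00) == 0.0
```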
But we do need to find a way to leverage the AI in an effective way while still thinking about that safety problem. And I don't know. It's a good question. In the back. AUDIENCE: If this is the future and we're standing at the beginning of the journey, what are the major milestones we'd have to [INAUDIBLE] to actually get to the future? And what are the technical obstacles we'll see happening? MATT WELSH: Yeah, so the question is, if this is the beginning of the future-- and I think by definition it is. And this is the future that I envision. What are the milestones to get there? What are the technical challenges that we need to overcome to achieve that? One of the interesting things here is I am banking very much on the idea that effectively throwing more transistors at the problem is going to make these models thousands of times better than they are today. I think most people in the industry would agree that if you throw more transistors and more data at the problem, you're going to get a much, much better model. I think one of the-- and so one of the challenges ends up being, how do we get all those transistors? Because NVIDIA can only make so many. There's a lot of interesting work going on in that space. I'm going to plug a former Harvard student named Gavin Uberti, who happens to be the son of our CTO, brilliant guy. He went off and moved to San Francisco a few months ago to start a company to build chips specifically designed to run these models. And he was working with Gu-Yeon Wei and David Brooks here on that. So there is some hope that custom hardware might help to solve some of that problem. I'd say the bigger and probably more thorny and uncertain problem is, how do we reason about the capabilities of these models in a formal way? That is, how can we make any kind of statement about the correctness of a model when asked to do a certain task? Now before we go down that path too far, I think we have sort of a natural human tendency to view an AI model as a machine that has to conform to some specification that's written down in a manual somewhere. And now we've got this machine, but there's no manual. So it's like that TV show, The Greatest American Hero, we have to come up with the manual. We have to derive the manual through experimentation. The other way of viewing these things is if you think of an AI model as a really, really smart college student that you just hired as an intern into your company, you have some degree of faith that that intelligent person you interviewed for half an hour will be able to do the things that you ask them to do faithfully and ethically and correctly, whether it's write a report, prepare a presentation, use the fax machine. But do you have any guarantees of that? Can I promise you that person that I hired is going to do that thing correctly every time? No. And yet, human society flourishes. So what I'm driving at here is perhaps our way of thinking about this problem might need to shift more towards, in some sense, the social sciences, if you will, and systems that allow us to reason through how the AIs operate in our society at large rather than just treat them like a machine that we have to prove the correctness of. Yes? AUDIENCE: So can you build a model to explain the [INAUDIBLE]-- but can you have models kind of trying to explain each other? MATT WELSH: Yeah, so the question is, could you have one model effectively explain another model? AUDIENCE: There's nobody who understands it. MATT WELSH: Yeah, no one understands it. 
That is an interesting idea. It's not one that I've considered before. And actually, I think there's been some interesting research on this. I think the whole field of explainability and observability for language models, we're struggling to understand these models much in the same way that we struggle to understand the human brain. I saw some research recently where they said, hey, look at what happened. We took this large language model and we isolated the neuron that does this function. People are going to be publishing like Nature articles on this stuff, right? That's crazy, because it is an artifact we created, but not really. It was trained. So the question is, could one language model inspect, explore, probe, understand, and give us some understanding of another model? That's a good idea. I have no idea. It's a good question. AUDIENCE: What are the implications of Gödel's theorem for building [INAUDIBLE] the intelligence of it? MATT WELSH: I'm just a poor systems guy. So the last thing I'm going to do in front of a group of Harvard computer scientists is say anything about theory. Stuart? AUDIENCE: So you're very optimistic about more data and more circuits. And I thought ChatGPT has access to most of the internet and the thoughts of 8 billion people, which you get diminishing returns with more knowledge, and we're not producing another 8 billion people. Moving from 8 bits to 4 bits for how we process things would get us near constant factors. How do the limits of-- how do you get that much more data and that much more computation? MATT WELSH: Yeah, the computation I spoke to earlier. So the question is, if you believe in the scaling law here that more circuits, more data gets us better models, well, aren't there diminishing returns over time because there's only so much data in the world, and there's only so many transistors in the world. So I spoke to, hopefully, some thoughts about how we might address the transistor problem in the future. The data problem is a very real one. I don't know what the latest thinking is here in terms of how much more data do you need to say 10x the current generation of models. That's kind of the question. Do I need 10x more data or not, right? Because it all depends on the training regime and-- AUDIENCE: There's diminishing returns with data. MATT WELSH: The one thing that I want to emphasize is I do think that ChatGPT and friends have only looked at the tip of the iceberg of the volume of data produced by humanity. It is the tip of the iceberg. There is a vast amount of knowledge out there in the world, both in digital form and in analog form, that these models have never had access to. So one of the things you're going to notice, like, ChatGPT and everything else is heavily biased towards text that is on the internet. Who created text that was on the internet? English-speaking people in the Western world, predominantly. And of course, a shift is happening now because it's going to shift more to Asia and other countries and other languages. But there's a huge amount out there, and there's a massive trove that it's never seen. It's only seen publicly accessible web data. Our customers and other companies that are operating in this space are working with companies that have vast amounts of data that is absolutely not public and that language models could leverage to get greater understanding and to perform more tasks. 
So I'm actually in a belief that maybe we've scraped the surface of the available data, but there's a lot more that we haven't touched yet. In the front, yes? AUDIENCE: So I really liked Sam Altman's tweet when he said his favorite analogy is that ChatGPT basically is an e-bike for the mind, so it just makes things easier. MATT WELSH: Yes, an e-bike for the mind. Sam Altman said that, right? So Steve Jobs said the Macintosh was a bicycle for the mind, so ChatGPT is an e-bike for the mind. AUDIENCE: You said that the software engineering profession is about to change. But I'm just wondering, as you referred to the data that's out there in the world, but not everything that makes the software engineer, the software engineer, he or she is, is provided in actual data. So there's the human aspect to it. MATT WELSH: Yep. AUDIENCE: So I'm just wondering, wouldn't it be more likely that future software engineers by 2030 and beyond are just 10,000 times more effective, but they still have to remain the SWE role because they're lacking all the things that makes them human because the data is just not out there, not even in the-- there's no place on Earth that some ethical rule about life in Boston or Cambridge is laid out perfectly like it is in our mind. MATT WELSH: Yeah, so the question is, it's sort of this idea that maybe there's an ineffable quality to being a human software engineer, something about our training, our knowledge of the world, our ethics, our socialization with other humans, that a model isn't going to capture, a language model is not going to capture. And so maybe the future is that a software engineer is still a software engineer, but they're 10,000 times more productive than they are today. I think it's a good question. I do think we're going to hit a limit in terms of what we can do with programming languages and tools and things that humans have to reason about and understand. So here's one way of thinking about this. The facetious answer to you is, let's imagine that humans are still the ones predominantly writing code, but they get a hell of a lot of help on it. We're still going to have to deal with CSS, that pile of garbage that thousands of millions of engineers have to deal with every single day. And the reason for that is because it's part of our technology corpus. It's part of the knowledge of humanity. It's part of the stack that we all use. So the problem there is there's a bandwidth limit, which is an individual mind has to go through this syntactic description of what they want to do in these God awful languages like CSS and JavaScript and Python and Rust. The problem that I have with that is that I think it really it-- it's a barrier to actually enabling what you could build with computation from actually becoming a reality. It's like drinking through a very narrow straw. So I think what we need to do is get the humans out of the loop on that and change the relationship between humans and the way software is built so that we can unlock that potential. And exactly what that looks like, I don't know. But that's my core belief. Yes? AUDIENCE: The talk was mostly about coding. And this is about coding. How about the algorithms? I'm an astrophysicist. And in our case, every telescope is one thing in the world. They're all unique. And same as the data processing systems. So we have some unique algorithm that only a few people in the world can design or understand. And I wouldn't expect that a large language model would help you developing such an algorithm. 
So do you see-- I guess in biology or in bioinformatics, the problems are similar. So do you think there is still niche for LLMs to develop to help there in this particular area? MATT WELSH: Yeah, so the question is we've been talking about the coding but not the algorithms. Who came up with that algorithm? What was the spark of the idea that produced the algorithm that we're then translating into these clunky programming languages, right? And I think it's a very good point, actually, because there's a question right now-- and this came back to my point earlier about, we don't really know the logical reasoning limits of these models. And so I don't really know if I said to the model, give it some complex problem, data analysis problem that I want to solve, if it could actually derive a new algorithm that hadn't been known before. It's a good question. I tend to think it could, maybe not in today's models. I believe in the future, it can. But then the question really is now coming back to the dual problem of, how do I ask the model what I want? How do I express myself? And then how do I teach it most effectively to get it to the right answer? So the answer might end up being that it really ends up being a symbiosis between the human and the AI model iterating together on something, where the AI model is doing the stuff it's good at. The human is doing the things it's good at. And we already see that happening with things like CoPilot. It's just it's operating at a very low level of abstraction, right? It's write the four lines of Python code to reverse this list or whatever the thing is. When you start getting into higher level of abstractions, developing algorithms, doing data analysis, any of those things, I think the kind of tooling-- it's not going to be CoPilot in an IDE. It's going to be something else. I don't know what that something else is. Maybe it's Jupyter Notebooks on steroids or something like that, right? Let me do this. Let me just take one more question. And I'll take it from you because you had your hand up earlier. AUDIENCE: Thanks. I think you're talking about a new age of programming, where the programs are now an abstraction on top of what we're doing currently. So 50 years in the future, we have people that are only used to that paradigm of developing programs, do you think the classical training that we have today will be helpful or if it's completely abstracted away in 10 years, where even having this knowledge [INAUDIBLE]?? MATT WELSH: Yeah, so the question is the way that we train people in software engineering disciplines, is it relevant? Is the way we train today relevant in a future in which AIs are doing more of this, or more prompt engineering? That's the real question. And I think speaking to that at the end, it's like, as a computer science undergraduate at Cornell, yes, I had to go take some EE classes and understand how circuits worked. That was important. And when I taught here, I did teach operating systems and systems programming and what's a stack, this kind of thing. So it's important to have some of that foundational knowledge. But the question is, where does the emphasis end up being in terms of how we think about creating programs and managing programs? I think it would be a mistake for, say, university programs to not pay attention to this and to assume that teaching computer science the way it's been done for the last 25 years is the right thing in this future. I don't know what they should evolve it to. 
What I can say, though, is that when somebody gets out of their academic thing and they're hitting industry, well, that's already a huge gap between what you learn in college and what you're having to do in the real world. And that's why we have things like internships and other methodologies. So maybe the goal of academic computer science education should not necessarily be vocational per se. But I do think that we have to think about, how do people reason about these models? At the minimum, I would hope that CS50 or whatever the equivalent class is at another university, can go deep into understanding some of the mechanics behind things like ChatGPT, understanding data, how it comes in, understanding how models are constructed, how they're trained, what their limitations are, how to evaluate them, because the fear that I have is that students just view this thing as this magical black box that will do anything for them and have no critical thinking around that. However, I do know from my own experience that it is a magical black box. And I don't understand how it works. But see, I'm OK with that, because it does so many great things for me. Anyway, thank you very much. And I'll be around for pizza too. [APPLAUSE]