OpenAI's Greg Brockman: The Future of LLMs, Foundation & Generative Models (DALL·E 2 & GPT-3)

Reddit Comments

It's interesting that he specifically references Kurzweil's The Singularity is Near. Ideas that seemed a little crazy in 2005 are a lot less so now.

👍 10 · u/sideways · Nov 01 2022

I'd start with these two sound bites!

[09:00] There's no human who's been able to consume 40 terabytes worth of text...

Why did he choose 40TB of text? Did they hit that number for GPT-4? That would be many trillions of tokens...

[32:10] Our goal is just to keep doing something that was previously impossible every single year... 2023 we'll all forget about DALL-E 2 and GPT-3 and we'll be talking about something new...

Hmmm...

Transcript.

Edit: and my first snip: https://youtu.be/LFx5q3m_F68

👍 10 · u/adt · Oct 31 2022
Captions
Announcer: We're joined next by Greg Brockman, president, chairman, and co-founder of OpenAI, and Alexandr Wang, CEO and founder of Scale AI. OpenAI is a research and deployment company whose mission is to ensure general-purpose artificial intelligence benefits all of humanity. Before OpenAI, Greg was the CTO of Stripe, which he helped build from 4 to 250 employees. Please join me in welcoming to the stage Greg Brockman and Alexandr Wang.

Wang: Hey Greg, thanks for making it.

Brockman: Absolutely, good to be here.

Wang: I want to start, and I don't know if you remember this, but we first met at a summer camp called SPARC, where you gave a presentation. At the time you were the CTO of Stripe, and you talked about everything you had accomplished. I was a member of that camp, and it was extremely memorable; you had a lot of good sound bites.

Brockman: I'm glad that it landed.

Wang: Kind of a full-circle moment. To start out with: you've been CTO, and now you're president of OpenAI, at two incredibly iconic companies, Stripe and OpenAI, probably two of the most iconic startups of the past decade. In what ways are the two organizations, and being CTO of them, the same, and in what ways are they different?

Brockman: Thank you for the kind words. One thing that's very interesting to me about having been part of both of these organizations is seeing how much groups of people are kind of the same regardless of what the problem in front of you is. A lot of how we approached Stripe was thinking from first principles. I remember when we were pre-launch, we had some buzz going because we had some early customers, and one of my friends, a VC, took me out to lunch and said, "All right, look, I've been hearing about this Stripe thing. What's your secret sauce?" And I said, "We just make payments really good." And he's like, "No, no, come on, you can tell me, what's the secret sauce?" And really, that was the secret sauce: we had rethought every single piece of what we were doing from the ground up, from first principles, not locked into the way people had been doing it. We asked how it should be, where all the pain is, and whether it needs to be there.

In AI we did much the same. There's this field that we're entering, and we hired a lot of people who had been in the field, but a lot of us also hadn't been, and we came to it with beginner's eyes. That approach of not being beholden to all the ways people were doing it, while also becoming expert in the way things had been done (because if you just throw everything out, you're also just going to be starting from scratch in a not-helpful way), maybe that is the deepest commonality between them.

But they're obviously very different organizations. At Stripe, we ran the traditional startup playbook: you come up with the innovation, you just build and build, and you get in front of customers from day one. The story is that we gave the first API to a customer who charged a credit card, and he was like, "I would like my money now, please," and we were like, "Huh, I guess we should build that." At OpenAI, we had research to do.
Where's the customer, right? It really took us five years; we started in late 2015, and it was not until 2020 that we had our very first product. So figuring out what you're even supposed to work on, whether you did a good job, whether you should feel good on a day-to-day basis, all of that had to come from within rather than from without.

Wang: I want to go back to this point you mentioned around first-principles thinking. It's very interesting, because even in maybe 2020 or 2021, post-GPT-3, you would talk to other researchers in the field, and there was still some degree of skepticism over the core concept of scaling up these models and whether there were still gains to be had. I don't know the story, but it seems like the research intuitions that led to GPT-3 and DALL·E 2, which have really ushered in a new era of AI, were probably somewhat against the grain, or somewhat unintuitive, at the time. Now, looking back, it's obvious to point out that GPT-3 and DALL·E 2 fundamentally accelerated AI progress, and its relevance to the world and to every industry, and created the most recent AI wave. How has that matched up against your expectations when you were building these technologies?

Brockman: The thing that's most interesting to me is that those models you mentioned are kind of overnight successes that took many, many years to create. From the outside it looks like, wow, you just produced this model and then that model. Really, on the inside, the GPT arc is a five-year arc. It started with the sentiment neuron paper, back in 2017.

Wang: I remember the paper. Very cool.

Brockman: It felt very novel, and very few people remember it. It's a very early result where we had been training an LSTM to predict the next character in text. We showed it a bunch of Amazon reviews and asked: what's the next character? Of course it's going to learn where the commas go, of course it's going to learn how words are spelled, but of course it's not going to understand anything. But we found a single neuron in that model that had learned a state-of-the-art sentiment analysis classifier; it could tell you whether a review was positive or negative. Is that understanding? I don't know what understanding means, but it's semantics for sure. And that, for us, was: okay, this is going to work.
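The result he's describing is the 2017 "sentiment neuron" paper (Radford et al., "Learning to Generate Reviews and Discovering Sentiment"). Below is a minimal sketch of the probing step, assuming a character-level model already trained to predict the next byte of review text; the paper used a 4096-unit multiplicative LSTM, so the plain `nn.LSTM`, sizes, and neuron index here are illustrative stand-ins, not the paper's exact setup.

```python
# Sketch of the "sentiment neuron" probe described above (Radford et al., 2017).
# Assumes a character-level LSTM already trained to predict the next byte of
# Amazon reviews; architecture details and the neuron index are illustrative.
import torch
import torch.nn as nn

class CharLSTM(nn.Module):
    def __init__(self, vocab_size=256, hidden_size=4096):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, 64)
        self.lstm = nn.LSTM(64, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, vocab_size)  # next-character logits

    def forward(self, byte_ids):
        h, _ = self.lstm(self.embed(byte_ids))
        return self.head(h), h  # logits for training, hidden states for probing

model = CharLSTM()  # in the real result, trained on a huge corpus of reviews

def neuron_activation(text: str, neuron: int = 2388) -> float:
    """Read one hidden unit's activation after the model consumes the text."""
    ids = torch.tensor([[b for b in text.encode("utf-8")]])
    with torch.no_grad():
        _, hidden = model(ids)
    return hidden[0, -1, neuron].item()

# The paper's finding: in the trained model, a single unit tracks sentiment so
# well that simply thresholding it is a competitive sentiment classifier.
print(neuron_activation("This movie was an absolute delight."))
print(neuron_activation("Utterly disappointing, a waste of money."))
```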
Brockman: Then the Transformer came out in 2017, and my co-founder Ilya immediately said: that's the thing, that's what we've been waiting for. So you take this very early, nascent result, put in a Transformer, and that's GPT-1. GPT-2 is: you just keep pushing it. The algorithm we kind of run internally is that we get these little signs of life, and you have to be very, very careful to distinguish signs of life from just pushing too hard on a specific dataset that isn't really going to keep going. But if you build the right intuitions, then you know: okay, now is the time to put in more compute, now is the time to put in more researchers, now is the time to really scale it up. GPT-2 obviously was exciting, and we all looked at the curves: the bigger we made the model, the more compute we put in, the more data we put in, the more we got all the engineering details right, those curves just got better. Our goal was actually just to break the paradigm, to push it until the curves stopped looking good, and we still haven't managed to accomplish that.

Wang: For me, and probably for many people who initially played with GPT-3, the shocking thing wasn't necessarily that the model got better and better performance on established tasks; it's that it had all these qualitatively new behaviors that felt very magical. Even now there are prompts you'll see on Twitter that are really shocking. Did you have those early moments, when you had the early model results, where you thought, holy crap, this is magic?

Brockman: The earliest one that I remember was around code. At the time it was totally mind-blowing that you could just write a function name and a docstring describing what the function should do, and it would actually write it. Not super complicated functions, but you'd ask for something that takes a couple of lines, and it would be able to really do it. You'd modify things a little bit to make sure it hadn't just memorized it, and sure enough it would write out the modified code.
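The behavior he describes, completing a function body from its name and docstring, is just next-token prediction on a code prompt. Here is a minimal sketch; `complete` is a hypothetical stand-in for any text-completion endpoint, and its canned return value only illustrates the kind of continuation a model produces.

```python
# Sketch of docstring-to-code completion as described above. `complete` is a
# hypothetical stand-in for a text-completion model endpoint; the canned body
# it returns illustrates the behavior and is not real model output.
def complete(prompt: str) -> str:
    """Stand-in for a language-model completion call (replace with a real API)."""
    return (
        "    ys = sorted(xs)\n"
        "    mid = len(ys) // 2\n"
        "    if len(ys) % 2:\n"
        "        return ys[mid]\n"
        "    return (ys[mid - 1] + ys[mid]) / 2\n"
    )

prompt = '''def median(xs):
    """Return the median of a non-empty list of numbers."""
'''

# The model is only ever predicting the next tokens of the prompt; with enough
# code in its training distribution, that prediction *is* the function body.
print(prompt + complete(prompt))

# Brockman's memorization check: perturb the docstring slightly and confirm the
# completion changes accordingly, rather than replaying a memorized snippet.
```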
Brockman: The overall thing that's really interesting about the paradigm of a GPT-3 is where it comes from. We had this picture that, look, the problem with these models is that they're great within their data distribution, but as soon as you're outside that distribution, all bets are off. So what if you just make the whole world, the whole universe, be the data distribution? You put the whole internet in there. And what we've really seen is that these models are able to generalize extremely well within the kinds of things they've seen. It's a different question if it's never seen anything like it, but humans are also not very good at things we've never seen before. That picture of all the different things it's seen, in all these different configurations, is almost unimaginable; there's no human who's been able to consume 40 terabytes worth of text. So we just keep seeing surprises. One of my favorites was a teacher-student interaction where I was the teacher and the model was the student, and I managed to teach it how to sort numbers. You just have these experiences where you think: that's what it should be like to interact with an AI.
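That teacher-student exchange is what's now called in-context learning: the worked examples live entirely in the prompt, the model continues the pattern, and no weights are updated. A sketch of such a prompt follows; the wording is an illustration, not the actual session.

```python
# Sketch of the "teach it to sort" exchange: in-context learning, where the
# lesson lives entirely in the prompt and no model weights are updated.
# The wording is illustrative, not a transcript of Brockman's actual session.
prompt = """Teacher: To sort numbers, write the smallest first, then the next
smallest, and so on.
Teacher: Sort these numbers: 7 2 9
Student: 2 7 9
Teacher: Sort these numbers: 4 8 1 3
Student: 1 3 4 8
Teacher: Sort these numbers: 12 5 20 7
Student:"""

# A capable model continues with "5 7 12 20": it inferred the task from the
# examples in the prompt, not from any gradient update.
print(prompt)
```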
Wang: It's incredibly shocking. One of the things I'm curious to get your thoughts on: the jump from GPT-2 to GPT-3 required a lot of conviction, because you were spending a fair amount on compute to train these models, and there were probably a lot of experiments that didn't work, so you had to be willing to keep going. Was that phase of the journey, the GPT-2 to GPT-3 jump, scary? Did you have doubt? Or were you very confident that, hey, we're going to scale this up, and even though we won't get it right the first few times, it's going to be amazing?

Brockman: And to your point, scale was not an obvious thing; not the company, but scaling things up. The funny thing is, our very first scale result, the one that convinced us this was the right way to approach things (you push it until it breaks; not that more compute magically always solves your problem), was Dota, playing competitive video games. That was a three-year arc, where we started out with something that didn't do anything, finally beat the in-house team, then managed to beat the pros. At each step it was pushing in all dimensions: make the model bigger, fix all the bugs, keep iterating on every single dimension, and every single dimension yields returns. We did very much the same thing for GPT-2 and GPT-3. It's not as simple as saying, okay, you just need to crank up this one variable and do it in one shot; it's an iterative stepping through the space on each axis at every single time. So on the one hand it does require conviction, because you need to say, we're going to carve out a big compute budget, so you're not constantly fighting other people for the big supercomputers. But on the other hand, it's also very iterative, and you don't have to make scary, irreversible decisions, because at each step you get feedback from reality. That key of both big-picture thinking (what if this works? make sure you're really set up for success) while also not blindly spending a year of your organization pursuing a thing that might not pan out; balancing those two is what was really key.

Wang: One of the cool things, as you walk through the insights, is that the organizational learnings were really critical in this entire path-dependent road to GPT-3. It makes sense when you say it, that insights from Dota 2 and insights from the sentiment neuron were the key nuggets that led to the crystallized idea of scaling up and building GPT-3, but it's very unintuitive from the outside. It's almost a statement about innovation in some sense: you piece together a disparate collection of insights gathered from a wide variety of experiments, and eventually you get the ingredients together and build something.

Brockman: That's first-principles thinking in action.

Wang: I think the story of AI to date, especially the past few years, and the story of OpenAI, is probably something historians are going to study for decades to come. Are there any fun stories from the journey of creating these foundation models that you think deserve to be in the history books?

Brockman: I'll tell you my actual favorite story from the Dota days. We'd been working on this system, and the funny thing is, at the very beginning we wrote down our list of milestones: on this date we're going to beat Jonas, our best OpenAI employee, who also had many thousands of hours of Dota 2 gameplay; on this date we're going to beat the semi-pros; on this date we're going to beat the pros. It was supposed to be June 6th or something. June 6th rolls around, we don't have anything; he just crushes us. Two weeks go by, three weeks go by, we keep pushing the deadline back a week every week, and then one day we actually do beat him. My conclusion was that it wasn't actionable to set those goals on outputs; you can only control your inputs, the experiments you run. So we managed the project very differently after that.

The thing that's still so crazy to me: a week before The International, the world championships, where we were going to show up and play 1v1 against the best players in the world, we finally started beating our semi-pro tester, and we thought, okay, maybe this is actually going to happen. But then we learned he'd been on vacation without his real setup, and we thought, oh no, this is not going to go well. So we continue to train, we do a Hail Mary of scaling things up to the biggest scale we'd ever done, and we show up at The International and play against a low-ranked pro, a former pro, and we go 3-0, 3-0, 2-1. So we basically win, win, and then we did have one loss. We take a look at it, and it's this item we'd never trained with, never seen before.
"Oh wow, okay, we need to add that, and do it fast." So the team stays up all night putting this item into the training and getting the whole thing launched; again we double the scale, basically maxing out our CPU cores at this point, and start training. We're supposed to play against the top pros in the world, but fortunately they couldn't make it the next day, so we get an additional day of training. The number two player comes in, plays against us, and we win, win, win, win, win. And he says, "Okay, you beat me, but the top player is going to crush this thing." Fortunately, because the top player had spent so long playing, he couldn't come that day either, so we got one more day of training, and that one more day was enough. It's a story of how you could really see the improvement: at each step we could see new behaviors the system had learned. That experience of watching it grow up in front of you was really amazing.

Wang: I'm actually surprised, because presumably you'd trained the agents for a long, long time going into The International. I'm surprised each incremental day mattered.

Brockman: This is something that has changed over time. Back then, two weeks' worth of training was the whole model run, so you'd start from scratch each time. The thing that was really funny in the middle was that we put in this new item, we were training with it, and when we took it out of training it was the best bot we'd ever seen, except that our semi-pro tester looked at it and said, "This bot is doing something really dumb. It's just sitting there in the first wave, taking all this damage it doesn't have to. I'm going to go beat it." He ran in to fight it, and he lost. He said, "That was weird," and did it five more times, and lost each time. But then he figured out a strategy that actually works, and you realize what was going on: it had learned to deceive. It learned that what you do is pretend, "Oh, I'm just a weak little bot, I don't know what I'm doing," and then a person comes in, and smack. The way you defeat that is you don't fall for the bait: you let the bot sit there taking all this damage and getting weaker, and then you finally go in for the kill. So there we actually stitched together our good bot for the first wave with the deceptive bot thereafter. There was a lot of that, really examining what was going on in these systems, because it's such a limited domain; complicated, but very, very interpretable. That meant we could observe behaviors like this and figure out how to engineer around them. But once we graduated from the 1v1 version of the game to the full 5v5, much more like competitive basketball than heads-up play, suddenly all of our analysis of the behavior stopped working. We used to have someone who would literally watch the bot play and say, "Oh, we have this bug in the training, we've got to go fix that." For 5v5, we just could not do that. I think that's kind of where we've graduated as a field.
When you look at GPT-3 and the mistakes it makes, people sometimes ask, "Why did it make that mistake?" Sometimes you can interpret it, but sometimes it's also a little like asking, "Well, why did you make a mistake on some test?" You think you know, but your explanation isn't always very good. To do complicated behaviors, sometimes there's a very complicated explanation.

Wang: Have you read the short story, I think it's "The Lifecycle of Software Objects," by Ted Chiang?

Brockman: I think I have, but I don't recall it.

Wang: It's about these AI pets that keep learning new behaviors. It's very reminiscent of these agents.

Brockman: I think we'll see that kind of thing in our future somewhere.

Wang: I want to go back: we've known each other for many years, long before these foundation models and even before the Dota 2 competition, and one thing I vividly remember is how optimistic and confident you were in this path of increasing AI capability. This was maybe 2016, 2017, and it felt very striking, because these algorithms were still pretty weak, and you were always very confident: "Oh yeah, they're just going to keep getting better and better." What were the things that, back then, gave you that resolve, confidence, and optimism in the technology?

Brockman: At some level, having that kind of belief and conviction in something that hasn't happened yet is a very intuitive thing. I remember when I was in school and showed up excited about doing NLP research, I tracked down an NLP professor and said, "Please, can I do some research for you?" He showed me these parse trees and such, and I looked at that and thought: this is never going to work. To explain why it felt that way: it just doesn't have the right properties. It felt like you were going to pour all this human engineering and intuition and effort into the system, and I can't even describe how language works myself. It feels like there's something inherently missing. But neural nets had the opposite property. With neural nets, it's very clear this is a system that just absorbs data and absorbs compute; it's like a sponge that slurps everything up. It has the right form factor. The thing that's always been missing is: can you train it? Do you have enough data? Enough compute? A learning algorithm that can shove all this stuff in efficiently, in a way that comes out generalizing? I think the field really got its most recent resurgence in 2012 with the AlexNet paper. That was the first time a neural net really just crushed a task. People had spent decades on computer vision, and suddenly it's, "I'm so sorry, but this approach has just supplanted you," by a massive gap. And then you started to see it spread. It was almost like you had all these departments, and there was a wall being knocked down day after day.
When you see a trend like that, where long-standing, deeply established ways of thinking, great debates that have gone on for a long time, suddenly meet a repeated result that's consistent with the history, that for me is maybe the clearest sign that something is going to work, that there's a real exponential waiting to unfold. And here we are.

Wang: What were the moments, if any, of doubt? Let's start at the beginning of the path; I think OpenAI started in 2016?

Brockman: I'd say December 2015.

Wang: Okay, great. From December 2015 until now, were there any moments of doubt in the technology, or was it always, "Hey, this is clearly the way of the future"?

Brockman: I think doubt is a strong word, but there are definitely moments. To build something, you're always doubting; you've got to be questioning every single bit of your implementation. Any time you see a graph wiggling in a weird way, you've got to go figure it out; you can't just say, "I'm sure the AI will sort it out." So there was lots of tactical worry that we weren't quite doing it right, lots of redoing the calculations of how big a model you think you're going to need, and lots of mistakes, for sure. A good example is the scaling laws. We did a study to start to really scientifically understand how models improve as you push on various axes, as you pour more compute in, as you pour more data in. One conclusion we reached at one point was that there's a limited amount of data you want to pour into these models, that there's this very clear curve. One thing I realized only years later is that we'd actually read the curves a little bit wrong, and you want to be training on way more tokens, way more data, than anyone had expected. There are definitely these moments where things just didn't quite click, where it didn't add up that we were training on so little, and there were conclusions you drew downstream; but then you realize there was a foundational assumption that was wrong, and suddenly things make way more sense. So it's a little like physics, in some sense. Do you doubt physics? I kind of do; I think all of physics is wrong, but only so wrong. We clearly haven't reconciled quantum mechanics and relativity, so there's something wrong there. But that wrongness is actually an opportunity. Physics is already useful; it really has affected our lives, and I'm very happy with what physics has done, but there's also fruit left. For me, that's always been the feeling: there's something here, and if we keep pushing and somehow the scaling laws all peter out, if they suddenly drop off a cliff and we can't make any further progress, that would be the most exciting time in this field, because we would have finally reached the limit of the technology, we would have finally learned something, and then we would finally have a picture of what the next thing to do is.
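The curve-reading mistake he describes, training large models on fewer tokens than is optimal, matches the correction later published in the scaling-law literature. One published parametric form (Hoffmann et al., 2022) makes the trade-off explicit; this is that paper's formulation, offered as an illustration, not necessarily OpenAI's internal fit.

```latex
% One published parametric scaling law (Hoffmann et al., 2022): loss as a
% function of parameter count N and training tokens D.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
% Minimizing L under a fixed compute budget C \approx 6 N D yields
% compute-optimal N^{*} and D^{*} that grow together; misreading the D term
% is exactly the "train on way more tokens" correction described above.
```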
Wang: That actually reminds me of one of the Stripe operating principles, which I think is "micro-pessimist, macro-optimist."

Brockman: Yes.

Wang: It's very resonant, and obviously very related to what you're talking about: you have to be extremely pessimistic, extremely questioning, in the moment with the technology, but on a long enough time horizon, incredible stuff pops out.

Brockman: You've got to be excited. This is an exciting field, and it's a scary field as well. You've got to have some amount of awe at the fact that these models start out as just random numbers, and then you build these massive supercomputers and massive datasets, do a ton of engineering work and a ton of algorithmic development, and put it all into a package. We don't really have other technologies that work like this. To me, the most fundamental picture is this sponge you pour stuff into, and you get a model that's reusable and works across all these different areas. You can't do that with traditional software. Traditional software is human effort writing down all the rules, and that's where the return comes from. Maybe you have a Spark cluster that does some stuff, but that's not the cake; in neural networks, it really is.

Wang: I want to switch gears to thinking about the future and what's next. I'll ask as broadly as possible to start: what do you think the future of AI holds?

Brockman: I think the future of AI is going to be both exciting and a source of a lot of change, and part of our mission is to help facilitate that in as positive a way as possible. At a super high level, AI in the 2010s was kind of cool: you published papers, you played some video games; it was fun, good science. I think it's really interesting that 2020 kicked off with GPT-3, which was really the first model that was commercially useful just as the model: literally put an API on top of it, people talk to it, and people build products on top of it. One of our early customers just raised at a $1.5 billion valuation, which to me is a really wonderful thing: you build this model and it creates so much value for so many different people. And we're still in such early days for what these models can do. What I'm most excited about, from seeing GPT-3 and DALL·E, is the economic value they can create for people. There are a lot of other pieces to it too: everyone's going to be more creative. I can't draw, but now I can create images; I can take a picture in my head and actually see it on a page. One of my favorite applications of DALL·E is people who are physical 3D artists: somebody is a sculptor, and now they can get a great rendering of the thing they have in mind by iterating with this machine, and then they go build it.
This amplification of what humans can do is what these systems are for. For this decade, what we're really going to see is these tools proliferating; they're going to be everywhere, baked into every company. It's like the internet transition: if you're a company, what's your internet strategy in 1990? It's, "What even is this thing?" In 2000 it's, "Huh, maybe it's interesting," and there's a little boom and bust, and here we are today, where internet strategy is so integral to every business that it's not even a separate thing. It's like your payroll strategy; not a separate part of your business you can pick or choose whether to have. I think AI is going to be much the same.

I think there will be a transition point. Our mission is really about building artificial general intelligence: really trying to build machines that are able to perform whole tasks, to push this technology to its limit. Our Charter's definition is to outperform humans at most economically valuable work. There's a question of the timeline, but there's this picture: you have these tools that are creative and help everyone amplify, and what happens when they become so capable that they're able to perform these tasks even autonomously? I think the implications of that are different from what people expect. There's still going to be this amplification, but the change is going to be very hard to predict, and unexpected. Really thinking about how all of that value gets distributed, how to make sure it's pointed at solving the hard challenges that humans maybe are unable to solve ourselves (climate change, universal education, things like that), and really transitioning to this AI-powered world, is going to be a real challenge for all of humanity to work together on.

Wang: I totally agree. One thing I think is almost funny about how the timing of all these technologies has worked out: last year everyone was talking about web3 and crypto, and now it feels very obvious that AI is the actual web3.

Brockman: We'll take web4.

Wang: Web4, we'll skip over one. Web one was just reading; web two was reading and writing; and now web three or four, depending on what we want to call it, is AI: computers reading and computers writing. It's this incredible new phase. I think you mentioned two directions that are really interesting. One is the advancement and proliferation of GPT-3 and DALL·E, the existing tools becoming more and more economically useful; the other is the continued improvement of the algorithms themselves toward AGI. Obviously without revealing any OpenAI secrets, what do you think the road map to AGI looks like from where we are now?
Brockman: I think humanity, to some extent, has been on the AGI road map for a very, very long time. Even looking at just the history of neural networks: on the one hand, we say 2012 was the moment everything changed. We have all these curves of how much compute people put into the landmark results, and it was growing something like 10x year over year, still continuing, by the way. That's a decade of 10x year over year, which is insane. But we actually did a study to look back at earlier results, all the way back to the perceptron in 1959, and you find there's basically a very smooth curve back there as well: the amount of compute going into the landmark results tracked exactly Moore's Law. And it kind of makes sense: people were not willing to spend more money, they wanted to spend a constant amount of money on these experiments, because you're a starving grad student and can only get so much computer time, and the results got better and better the more compute was available to them. What changed in 2012 was that we said: okay, we're going to spend more money, we're going to build massive supercomputers now, because the ROI is there. But fundamentally, if you control for that cost factor, the curve looks exactly the same.

So this picture of building more capable models by pouring more compute into them, by getting better at harnessing this technology of neural networks and backpropagation, has been very invariant. The details change a little: do you work on GPT-3 or on Whisper? Do you pour in speech data or text data from the internet? Those details matter in the sense of what you're going to work on today and what you're going to download, but if you zoom out and look at the scale of the technology, it actually doesn't matter so much. What we're building is almost like building computers; think about the heyday of Moore's Law, where a new chip comes out, and then a new chip after that. What's the path to building the best computer? You just keep building the next best chip, keep getting better peripherals, keep working every single piece of the technology. So this full stack of better GPUs, great software for utilizing them, neural networks that we learn to harness more and more, the scaling laws, doing all the science, and alignment, which is extremely important, making sure these models are not just smart but actually aligned with what humans intend: all of that is the stack. Our goal is just to keep doing something that was previously impossible every single year. I guess you should check back in a year, but hopefully in 2023 we'll all forget about DALL·E 2 and GPT-3 and we'll be talking about something new. And as long as we continue that, you cannot continue that path without ending up somewhere amazing.
Wang: I actually remember, probably in 2017, you were very excited about Moore's Law continuing and that creating a lot more opportunity for neural networks and AI, and obviously that's played out. Are you worried about the proverbial end of Moore's Law causing a stall-out in progress?

Brockman: I'm not worried about it per se. The way to think about this: we often get caught in the debate of, is it all about scale, or all about algorithms, or all about data? And the answer is, that's the wrong question. You really multiply together these factors, and when you're multiplying multiple terms, you kind of want them all to be equal. It's been great for the past seven years that we've been able to just pour more dollars in to build bigger computers; that's one way to get ahead of Moore's Law. At some point there just aren't more dollars, there aren't more grains of sand that have been turned into these wonderful computers we use. So there is a limit there, one we have not yet hit. But when you do hit it, that does not stall all progress, because you still have algorithmic progress. There, again, we've done studies, and we've shown that if you look at the amount of compute it takes to hit the same performance, to train, say, a state-of-the-art 2012- or 2014-era vision model, that compute is also falling exponentially. We're basically making exponential progress in algorithms; not at the same rate at which we're able to build bigger computers, but that's an amazing force too. I've got this exponential, I've got that exponential, and let's not even talk about the data exponential. So the truth is that we will find a way. The history of this field is so consistent, and humanity is so innovative, that I think we're not going to hit a wall for the foreseeable future.
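The study he refers to is consistent with OpenAI's published "AI and Efficiency" analysis (Hernandez & Brown, 2020), which measured roughly 44x less compute needed between 2012 and 2019 to reach AlexNet-level ImageNet accuracy. A quick consistency check on that figure, assuming those published numbers:

```latex
% Doubling-time check on "exponential progress in algorithms", using the
% published AI-and-Efficiency figure (Hernandez & Brown, 2020):
% ~44x less compute over the 7 years (84 months) from 2012 to 2019.
44 \approx 2^{t/T}, \quad t = 84 \text{ months}
\quad\Rightarrow\quad T = \frac{84}{\log_2 44} \approx 15.4 \text{ months}
% i.e. algorithmic efficiency on that task doubled roughly every 16 months,
% a slower exponential than the growth in compute spent on landmark results,
% but an exponential nonetheless, as Brockman says.
```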
Wang: One of the interesting juxtapositions of today, from a scientific perspective, is the relative slowing of nearly every other science (there's a lot of research demonstrating that science on the whole is slowing) against the acceleration of artificial intelligence and, in many ways, the renaissance we're entering right now. Do you fear that at some point AI will similarly reach diminishing marginal returns and slow, much the way other sciences have, or do you think that's so far away it doesn't matter?

Brockman: I think two things. First, there are always S-curves, although something interesting about S-curves is that there tend to be paradigm shifts. Have you ever read The Singularity Is Near?

Wang: No, I haven't.

Brockman: It's the Kurzweil book, from 2005 or so. I always thought, based on its reputation, that it would be kind of a crazy book, but if you actually read it, it's the driest reading you'll ever do. It's basically curve after curve from different industries within computing, showing how performance has changed over time. The conclusion he comes to is that there's a repeated pattern, across memory, number of transistors on a chip, et cetera: you have an S-curve for the current paradigm, and then you have a paradigm shift. An example he talks about is CDs. CD adoption is a great S-curve; suddenly it's everywhere, everyone's got a CD player, it's the technology of the day, and people get really excited about doing more of the same thing. Blu-ray, that's the thing! So everyone starts investing in Blu-ray, and somehow it just doesn't take off, because it's more of the same, it's not backwards compatible, and it's just not really worth it. The real paradigm shift was streaming: suddenly you have this new adoption curve, this new S-curve, that's a totally different way. The way we got fast computers was basically five different paradigm shifts across a hundred years. So maybe that's the story here too: there's got to be an S-curve in what we're doing right now, and there will be a paradigm shift when you hit it. That again speaks to the ingenuity of humans.

My other answer is that, to some extent, it doesn't matter, because the thing about this field is that it's useful now. The goal we've always had for AI was to make computers way more helpful. Think about what computers have done for humanity, how many problems they've helped us solve. They've created new problems as well, but on net they've helped us solve way more problems than they've created, and they've changed the nature of how we interact with each other; it's hard to get lost anymore, right, you just follow Google Maps. There are really amazing problems now within our reach that would not have been otherwise, and with AI we're starting to crack that nut. It's interesting: take GitHub Copilot, which our models power. The way it's useful to people is that it provides very low-latency suggestions; it's basically an autocomplete for code, and there's a very strict latency budget. If you take more than about 1,500 milliseconds to produce an autocomplete suggestion, it's worthless; no one wants it, you've already moved on. But what we really want to build in the next phase is machines that help you produce artifacts that are materially interesting on their own: not just interesting because it's a fast suggestion to you, but because it's actually a quality answer. You're starting to see it: if you talk to our current GPT iteration, you can ask it to write some poems, and it writes way better poetry than I can. It actually wrote a poem for my wife that made us both cry. I cannot do that myself, but now I can by partnering with this machine. That's the real story: really trying to get these tools out, and everywhere.
And if what we're doing right now stalls out, I don't think that removes the value from what we're able to create.

Wang: By the way, it's depressing that the attention span of most engineers is only 1,500 milliseconds, or maybe 100 milliseconds, but it is what it is. One of the things that, if I recall, spurred you to work on OpenAI was also being concerned about the potential negative consequences of the technology. At this point, looking forward, what are your biggest concerns, or what are you afraid of with artificial intelligence, that you would urge everyone in the field to help avoid?

Brockman: One thing that's very interesting about AI is that, certainly ten years ago, if you looked at every article about it or talked to someone on the street, Terminator is the main thing that comes up. There's always been this feeling around AI with an element of fear mixed in; sometimes people don't see any potential, or sometimes they realize there is a potential but are really trying to figure out how to navigate it. We're starting to see the specifics a bit more now, but I think the high-level picture, that this is a technology that's very powerful, in positive ways and in negative ways, is extremely correct. It's very important not to be a starry-eyed optimist for whom everything's just going to work itself out, but also not to be a doomsayer for whom everything is terrible and humanity is over, because I don't think that's at all true. This technology can be the best thing we've ever created and help us be the best versions of ourselves, but it requires very careful navigating, and it's not something just for companies in Silicon Valley to figure out; it's really an all-of-humanity kind of challenge.

I think we're going to go through different phases. Right now we're starting to build systems where misuse is the clearest problem, and the systems themselves are still not very powerful. The kinds of things you worry about for GPT-3 are important problems: you think about bias and representation, you think about the system saying the wrong thing. But its action is really in your mind; its output is words on a page, and words on a page are very powerful, but they don't themselves have direct action in the world. Then you think about something like Codex, our code-writing system, which is a bit more like a robot, because it emits code, and if you were to execute that code directly, it can actually have actuators into the world. Making sure that's aligned and doing the right kinds of things, not writing buggy code and not writing viruses and that kind of thing, is really important. So figuring out what values go into these machines, and ensuring they operate according to those values, is going to be very critical. Figuring out how to avoid misuse, and how to regulate that, both at a societal level and at a technical level: all of that is very important.
And I do think there's also a point where you have to think about the technology itself becoming extremely powerful. Think about a system that's talking to lots of humans and operating unchecked; that's the kind of thing you should worry about. We already worry about that with companies: lots of people use a social media platform, or any of the technologies we use, and think about how much influence those can have in the world, and those aren't even systems with deep behaviors emergent from what they've learned. So figuring out the technical controls to make sure these systems remain in service of humanity, and actually empower and accelerate all of us, is also very critical. It's this ramping set of stakes: making sure we're building systems aligned with our values, and figuring out what that even means, what the values of humanity are that should be in the system. That is not going to be an easy problem.

Wang: One question, and this may be the last one I have for you. If the technology is such that scale continues to be one of the more important things, whether scale of data, better algorithms, or scale of compute, then the technology itself will tend toward a game-theoretical proliferation mode, where people compete, and you see some of this today, even with the large tech companies and you all: people competing to build the bigger supercomputers with the better performance, and if you have the bigger supercomputer you have supremacy over the other supercomputers, and there's this laddering of the stakes. Do you think that's a version of the future, or is there some path in which this becomes much more open and useful, not a tool for nation-states or large companies to compete with one another?

Brockman: I think the future that seems to be unfolding is kind of a replay of how computing technology has played out more broadly. It's still going to be the case that you'll have these increasingly massive supercomputers, in the hands of only a few, able to create models that can do crazy things no one else can. But I don't think that removes the value from the massive set of things people are going to do with these models. So you balance the super-powerful, very dual-use systems, which you should approach with great care (think of something like a nuclear reactor, a giant system you approach carefully), against, by contrast, wind turbines: there are lots of wind turbines everywhere, and if you add up the amount of value from wind turbines versus nuclear reactors, I think the balance is probably in favor of wind turbines.
So that's kind of the future we're going to: AI technology is going to be everywhere, and there's going to be lots of value delivered by open-source models that are integrated into every business, with people building all sorts of crazy applications on top. That's something we really want to support and promote. But you also have to have this dual answer for what you do with the new, extremely capable stuff that's a mile ahead of everything else; that's something you have to treat with kid gloves, with more care. That balance is tricky, it's not easy, and it's something we as an organization have been trying to straddle. We've had real existential struggles internally trying to figure it out. Our goal is to empower everyone, to bring everyone along in this AI transition, and our picture of the best way to do that has changed as the technology has unfolded. We're starting to get a sense of where this can go. It's really exciting to see the energy of all these builders coming in, because, like you said, people are starting to realize: AI, it's really going to work, and it's time to build.

Wang: This was an incredible conversation. Thank you so much, Greg. Next time we speak, I'll make you read the poem.

Brockman: All right, cool. Thank you so much.
Info
Channel: Scale AI
Views: 167,998
Id: Rp3A5q9L_bg
Length: 46min 9sec (2769 seconds)
Published: Sun Oct 23 2022