Codex, OpenAI’s Automated Code Generation API with Greg Brockman - 509

Captions
Sam Charrington: All right everyone, I am here with Greg Brockman. Greg is co-founder and CTO at OpenAI. Greg, welcome back to the TWIML AI Podcast.

Greg Brockman: Thanks for having me, Sam.

Sam Charrington: It's been a while since we spoke; it was back in November of 2017, believe it or not, episode 74 of the podcast, and we're over 500 now. We were then talking about AGI. I'm really looking forward to this chat, where we'll be talking about something new that OpenAI has been working on for a while: Codex. But before we do, why don't you reintroduce yourself to our audience and tell them how you came to work in the field of AI?

Greg Brockman: Hey everyone, I'm Greg, as Sam said, and I'm one of the co-founders of OpenAI. For me, I read Alan Turing's 1950 paper, "Computing Machinery and Intelligence," back before I knew how to code. It lays out the Turing test, but then it says: look, you're never going to be able to program a solution to this test; the only way to do it is to have a learning machine. And he goes into quite some detail. He says you're going to have a little machine, almost like a child machine, that you give rewards when it does good things and punishment when it does bad things, and from there you can hope to build up a solution. Really visionary stuff, honestly. For me, I was captivated by the idea that you could build a machine that could understand problems that you yourself could not. I saw being able to build machines that could themselves help you solve problems outside of your reach as the thing I wanted to do. So I went to a professor and said, hey, can I do some NLP research with you? And he said, great, here are these parse trees and things like that. Sadly, it didn't look like that was going to quite get there, so I got distracted by programming languages, which I think captures the same idea: if you can build a compiler, it can kind of understand a program and really amplify what a human can do. Then I did startups, and it was really 2015 that I first encountered deep learning. I was watching Hacker News every day, and I felt like there was a new "deep learning for this, deep learning for that," but I didn't know what deep learning was. It was actually surprisingly difficult to just Google around and learn what deep learning actually meant. So I asked some friends about it, and as I started going around, I realized all of my smartest friends from school were now in deep learning, and that for me was a real sign that maybe there was some real substance here. The deeper I dug, the more it felt to me like the old direction just didn't feel right, and the new direction actually did. Looking back at it now, the thing I find most fascinating is that this neural net direction is not a five- or ten-year thing; it's really a 70-year journey to get to where we are. So it's just exciting to be pushing the frontier of what these neural networks can do, and that's basically what we've been doing at OpenAI the whole time.

Sam Charrington: Nice. You mentioned your interest in programming early on, parse trees and all that kind of stuff, and that's maybe a connection to what we're going to be talking about today: Codex. OpenAI recently announced Copilot, which is another project in the same vein. Tell us a little bit about these projects, what they are, and how they're related to one another.

Greg Brockman: We've been building the Codex model for about a year now. We really started when we saw GPT-3 be released, and people's most excited reactions were actually using it for programming. We looked at that and said, well, we didn't build this model to program at all; what happens if we actually put some effort into it? So we teamed up with GitHub and Microsoft. GitHub, I think, is
probably the best in the world at knowing what developers want; they have a great community, and obviously they have lots of data as well. We worked really closely with them to try to build a product that people wanted, to really validate that what we were doing wasn't just a cool research project but was actually useful from day one. So a month ago we released Copilot together with GitHub, which is the first product built on top of Codex. It uses the Codex API, the same API that, by the time people watch this podcast, we will have released on Tuesday.

Sam Charrington: Talk a little bit about the relationship between Codex and GPT-3. Is it an entirely separate model, or are they the same model with different training data and different training processes?

Greg Brockman: I would think of Codex as a descendant of GPT-3. Spiritually, you do the same kind of task. GPT-3 is: take all the text on the internet and just do an autocomplete task, predict what word is going to come next. Codex is: take all the text on the internet and all the public code and do that same process. And we've made lots of improvements all across the board; this has really been an effort of a quarter of OpenAI to make it happen. We have architectural improvements, we have training improvements, and a lot of good old-fashioned engineering; making these models fast and responsive has been a huge amount of work as well.

Sam Charrington: Talking about Codex relative to GPT-3, you mentioned taking all of the text on the internet and all of the code on the internet in creating something like Codex. Are those given equal weight, or is the code somehow more relevant for the task that Codex is likely to see?

Greg Brockman: The short answer is, I think we're still figuring out exactly the right way of doing it. With our current process, you basically end up seeing much more code, more recently, than text, but it's still an open question exactly what you want. We've found that when you actually look at the models in terms of how people want to use them, part of what makes Codex really shine is the fact that it has all this world knowledge built in. You can end up with a model that's very, very good at a narrowly defined "complete this function" task without actually being very useful to people. So I think finding the right evaluations is one real trick to making this model work. What we really focused on, and it has actually guided us pretty well so far, is that at the very beginning of the project we wrote down a data set that's now open source, which we call HumanEval: a list of problems written by humans that are just programming puzzles. We designed them to have a bit of tricky wording, to be a little different from what you would find already out there in the training corpus; they're intentionally chosen to have some twists in them. What we found is that pursuing that metric is actually our best north-star metric. If you just look at perplexity, basically how good the model is exactly at predicting the next token in text, that particular metric breaks down a little bit for us, because you want something holistic: not just "how certain do you get that there should be a period here," but, given a pretty natural description of what the problem is going to be, can you solve that problem?

Sam Charrington: When you created that data set and that metric, was there a closed loop there, where the programs that Codex created against that set had to actually run and produce the desired result? I'm asking from the perspective of evaluating Codex in terms of producing runnable code.

Greg Brockman: Yes, 100 percent; that's an aspect of that data set. You literally provide the model with maybe a docstring and a little bit of a function definition, it generates a bunch of code, and you literally eval that code. Now, the details of the eval are actually pretty interesting, because you just had some code come out of your model; what's it going to do? Is it going to delete all the files on your computer? It's all possible, right? So you really need a good sandbox. I think one thing people miss in this field is that it's not all about the big idea; it's actually about the small ideas, about getting the engineering really right. If you want to train a model, run some arbitrary code, eval it, and make sure it's doing the right thing, you need a world-class sandbox to make that happen. You need to make sure both that the execution isn't able to tamper with your system, and also that little things like resource consumption and being able to crash your system are held in check. We've actually found multiple times that the model would generate code that broke our current sandbox, so we've upgraded it since then.

Sam Charrington: Interesting. So that suggests that folks who play around with this via the API should take care to inspect the results they get before they just run them, if they don't have a sandbox environment.

Greg Brockman: Yeah, I definitely recommend that for any code you
take from the internet. If you just download some code from even my GitHub, I will not take offense if you double-check it before running it; I think it's just an important thing generally. But I would say that the model doing unpredictable things happens really early in training. When the model isn't very smart, isn't very capable, it's less predictable exactly what it'll do. The more capable the model gets, the more it's going to be faithful to your instruction. I've spent quite a bit of time playing with this model, and I've actually found that it's quite reliable, in contrast, in some ways, to GPT-3.

Sam Charrington: Can you talk about those distinctions a little more, in terms of the types of results it tends to produce versus GPT-3, relative to the prompts that you're giving it?

Greg Brockman: When we get these new models, I really spend a lot of time with them, trying to understand them, trying to feel like I get the personality of these models, if you'll forgive the term, because these models aren't one thing; they're really this whole distribution of things. With GPT-3, I really spent a lot of time trying to teach it. I had this whole chat session where I was a teacher explaining to it how to sort a list of numbers. It would do one example and get it right, and I'd think, wow, I really taught it the process of sorting. Then I'd give it another example, and it would totally go off the rails and do something wrong. The feeling I had was that GPT-3 didn't really want to listen. It felt like this being that had a short attention span and would just do random things sometimes. I think that's probably a reflection of the training data in some ways. If you're out there on the internet and you read some text saying "okay, now I'm going to sort a list of numbers," maybe you're in the middle of a fiction story and some aliens arrive or something. So it's actually reasonable for GPT-3 to make pretty arbitrary predictions when it's not very confident about what comes next. By contrast, in code, what I found with Codex is that when it fails, it does half my instruction but not the full instruction. Sometimes you can end up with the traditional failure modes of autoregressive models, where it fails by repeating a token over and over if that's the most certain one. But in most of my experiments I haven't had to fuss with hyperparameters; I just set temperature equal to zero, so it's always picking the most likely token, and it's worked out way better than for any model I've tried before. I think a lot of this comes back to the structure of the data: in code, if I have a comment saying "now I'm going to sort some numbers," you're really going to sort numbers next. There's really nothing else that's about to happen. So it's almost like we have this great data set of instruction following, and that idea, we found in GPT land, was pretty key to getting something that's even more useful to people; in code, it's almost built in.

Sam Charrington: I'm very curious about this idea of the code as the data set and the self-documenting nature of it. When you think about the raw code you might find on GitHub, there's documentation that's going to be at, I would think, a pretty low semantic level, like "this loop is going to do thing x." I think of something like Stack Overflow, which talks about the code you might see in a post at a much higher level. To what degree is the code that Codex has trained on GitHub-style versus something that might have higher-level semantic meaning? What are your thoughts on whether that matters, and how might Codex evolve with different types of data that you train it on?

Greg Brockman: I think the answer for this stuff is probably "gotta catch 'em all." I think we're at a point with these models, and GPT kind of set the stage for it, where the broader you go, the more capable you're going to get. Part of it is that when we do a task, it's kind of impossible to predict exactly what skills people will want to bring to bear. It's almost like rewinding to before general-purpose computers were obviously the right solution (by the way, they're not even obviously the right solution for all problems, but for most problems they are): there were specialized machines for each individual application, and people were always saying, your general-purpose computer is cool and all, it's a great demo, but if you really want to do your contact book, you need to use this specialized IBM machine that existed at the time. It basically just turns out that many tasks require mixing and matching between lots of different things, so it's hard to pre-bake one answer to everything. So where we've started has been, again, all the text out there and all the public code. But within code, it's not just Django and projects like that; think about all the IPython notebooks that people put on GitHub. Those tend to be very much like tutorials; there are lots and lots of tutorials and things like that out there, and so you get a very different slice of intelligence
from those. What we've been looking at, and I think a big next step, is figuring out what the best sources are, what you learn from each one, and how you figure out what you want to balance in that model. One thing people might be surprised to hear is that Codex can do lots of things in lots of different languages; it's probably pretty good at about a dozen of them. But we really trained it just for Python. We just wanted this thing to be as good at Python as we could make it, and all the other stuff fell out as almost an accident. So if you test it, it'll be interesting to see: do people find it very useful for the broad range of languages, or does that focus on Python shine through?

Sam Charrington: Nice. I can tell you that it does do hello world in Lisp.

Greg Brockman: Okay, good, there we go. Believe me, we did not try to make it good at Lisp.

Sam Charrington: I'm also fascinated by this idea we talked about earlier, the natural language plus the code, and there's part of me that would love to tweak some hyperparameter that lets you weight one versus the other. Any thoughts on that? I suspect your answer is going to be similar to the last one, which is "the more the merrier": having all the data is going to get you better results than trying to over-optimize.

Greg Brockman: Yeah, I think some of the stuff you're hitting on is the right frontier. Look, just to zoom out to the big picture, to me the most fascinating thing, first of all, is that this is all just a neural net. You rewind back to the '40s and Pitts and McCulloch: that model of the information processing of the brain is the thing we're still doing today. You can actually find this great paper, via Wikipedia, on an interpretation of the history of the perceptron. The story everyone always told about the perceptron was: in the '60s these neural net people overhyped everything, and all the funding went away. If you actually look at the historical documents, what was going on is that there were two competing camps: the symbolic systems people and the neural net people. The symbolic systems people mounted a very concerted campaign to try to drive away all the funding for the neural net people, and they had all these disparaging things to say: those neural net people have no new ideas, they just want to build a bigger computer, that's all they want to do. And here we are, 40 or 50 years later, and yes, we just want bigger computers and more data. I think that is actually the most core answer. We all kind of want the great scientific insight, to figure out the exact theory of mixing, and the funny thing is, I think we can make progress on those problems. But the highest-order bit is that you need to have a big machine with lots of compute and pour in all the data you can. At some point the details of that mix start to really matter, but the highest-order bit is achieving that first thing.

Sam Charrington: To what degree does that cap innovation? If you've already pulled all the text in the world into GPT-3 and all the code in the world into Codex, and it's all about data and size of compute, where do you go to innovate?

Greg Brockman: It's a great question. On the one hand, you can look at what I said as a pretty depressing thing: it's just a simple matter of doing this large-scale engineering, and you need your particle-accelerator equivalent in order to do it. But actually, if you dig into the sources of progress in recent years, we published a couple of studies on this. We have one study showing the compute ramp is insane, faster than any exponential I'm aware of. But we also have another study showing the algorithmic ramp: the efficiency gains due to algorithms are also exponential. Rather than doubling every 3.5 months, it's more like doubling every year or year and a half, something much more like Moore's law; I forget the exact number. But that's still a pretty insane rate of progress. I think the truth of all of this is that if you have a paradigm that is worthwhile, and making a more capable neural net is clearly worthwhile at this point, you're going to innovate to the max in all dimensions. We've had a pretty big compute overhang, because people just weren't willing to spend lots of money on computers; now they are, so they just spend more money to get ahead of Moore's law. That's one dimension. Similar story for data: there's been lots of data out there; it just wasn't really worth people's effort to collect it, or people didn't know to do it; there's an overhang of "just gather all that data." On the algorithmic front, that's the one people have been pushing on, so there isn't as much of an overhang; there's not low-hanging fruit lying around that no one's thought of, where you just show up and gold is at your feet. It takes effort, but I think the fruit is still there; it's still the case that we're making exponential progress there. Just because we're making big progress in certain dimensions, that's just temporal; we cannot keep up the
rate of improvement from those dimensions forever, and once you've saturated them, the only thing left is going to be this other dimension. So I think it's really important that we as a community don't lose that muscle, that we really build it up.

Sam Charrington: Now, if I asked you to comment on that algorithmic dimension, would I be asking you to speculate into the future, or is there a set of relatively low-hanging fruit, directions you want to head in on the algorithmic side?

Greg Brockman: Well, first of all, my personal philosophy is very much "greatness through a thousand small steps." I think there are some people who are extremely good at the one big idea that changes everything; Ilya, who's one of my co-founders, is extremely good at that. With work like AlexNet, I think he's very good at setting the direction. But for me, I tend to think in terms of: what are all the small details we have to get right to make this happen? If you look at the current models, the funny thing about GPT-3 is that it actually uses the same tokenizer that Alec Radford, who works at OpenAI, wrote kind of overnight, right before the deadline, three years prior, for GPT-1. That thing is not optimal, but it's actually become kind of the standard; lots of people use it. People have done a little to play with different tokenizations and retrain them, things like that, but fundamentally I think there's a big shift in some ways, and kind of a small detail in other ways: we should really be doing byte-level models. We shouldn't be tokenizing things and chunking them up in a way that's almost like hard-coding in the model, which would probably do a lot better if it wasn't there. A lot of the story of neural nets has been: remove the hard-coded stuff and add in learning. So that's one example of the kind of thing I would really love to see someone work on, to see great results from, and for us to incorporate. Basically, there are little bits of the architecture where we really should be doing things differently, and that for me is where I put a lot of focus.

Sam Charrington: I wanted to transition to how we should think about Codex as users and practitioners, the folks that want to play with it. How should we think about interacting with this API to get the most out of it? Let's maybe start with: what is it best at, versus where are the soft edges?

Greg Brockman: I will first say that I think no one knows yet the answer to what it's best at. I can tell you what I've discovered in my efforts, and I'll say, for me, I know I'm scratching the surface.

Sam Charrington: Fair enough.

Greg Brockman: And that's a wonderful thing, by the way. If you train a vision model on ImageNet, you know what it's good at: it's very, very good at all the dog breeds. This model is general purpose, so it's quite good at lots of things. For me, I really latched on to being able to provide instructions in natural language and have it generate an executable output: basically, talk to your computer and it does it. When we first started playing with the model, it wasn't clear that it would be good at that. I actually started out on the other side: I started out trying to provide one big instruction and have
you should now be able to very easily build an interface where you just say like you know you know what if you're depending on what your web app is you know go and look up the uh you know go send an email to this person or uh you know like yeah any of the functionality that's in your website should become voice controllable or natural language controllable without having to necessarily click through a bunch of buttons in order to get there you mentioned that some of your initial observations were that you'd need to construct this prompt and it will spit out results that you know were 80 there or missing some detail or something like that um and that you solve that by kind of chunking your prompts and making them smaller and more uh more compact do you think the the issue that you experienced originally was um you know was on the input side or the the output side if that makes sense was it a you know issue in the fault in kind of the generation process right you know couldn't pull the the piece for that um you know couldn't make the connection necessary required for the last 20 or was it you know forgot it in the parsing stage using really rough language here to yeah yeah yeah you're asking you're asking the great mysteries of of the what's going on inside the neural net which i i you know i i think is is is always very interesting and you know for for me well first of all i also think that if you ask literally me to do the same task without access to an interpreter so i just have to write the program once without ever being able to push backspace i'm not going to do a good job either like trust me like i will not most of programming for me is i write a little bit and i run it and it doesn't work and i change it and i fix it and i iterate and i fix it you know and that that other piece this model doesn't get to do it at all so i think that it's very possible that the model simply cannot like you know just reading all that text and really deeply thinking through all 
the details about the interface should work is a bottleneck um and then secondly it's very possible that just like it just as it's writing it just realizes oh no i really wish that i'd like implemented this function beforehand so you know what i'll just pretend that it's implemented later and like you know that never gets to it so i don't know which of those stories is more true my guess is that it's a mix of both um and partly i just look at myself like you know look this is not a human-like intelligence so it may be too you know a little bit too egocentric to think that i can look to what i'm good at and bad at to map to where the model makes mistakes but i will say that for me it's been actually like i feel much more in tune with the failures and successes of codex than i did with gpt3 for me it does feel like when it fails i'm a little bit like you know sometimes sometimes the way it fails by the way is they'll just put in pass you know so it's like you know i have a nice python you know def whatever and like i put in a doctrine i'm like okay model you go now and that solution is just to put a pass or you know comment to do um something like that and i get it it's a little bit like it's like okay i'm not gonna be able to do this so i'm not even gonna try right and you know i don't i don't think that's necessarily uh uh you know the only characterization but it really feels like you know if you if you think of how code is usually structured that i think that that it actually starts to feel a little bit more like constrained in terms of the the again you know you have this pattern of comment complete or total out um and kind of nothing in between yeah you just mentioned uh the structure that code tends to have um the you know codex operates like gp2 gpt3 and this kind of input you know process output paradigm have you done any playing around or experimentation to try to um force fit structure into that input in a way that it understands that it can produce more 
Greg: So I have a couple of different dimensions that I think are very interesting. There's one dimension that's kind of fun, which is translating between languages. I have a little demo where I wrote a Python program that makes some calls to the API and generates some Ruby code, and that Ruby code is itself a program that calls the API to generate some Python code, so you get this Python-Ruby oscillator forever and ever. It's a little bit like writing a quine; just a fun little thing. I actually tried doing the same thing for Python to Ruby to JavaScript and back, and I got it to do about six cycles of Python, Ruby, JavaScript before it broke. Each time it was writing a little bit of unique code, which is a cool thing to see. Setting that up was a very interesting challenge, because you really have to make sure that your prompt, which is contained within the program, gives enough context to the API for it to actually generate the whole new program. You've got to play some fiddly games to make it happen. So that one I think is more of a proof of concept, more of an interesting exercise than something very practical.

But there's another direction I was experimenting with that I think is interesting, and very fruitful if someone can make it work. Look, programming is two things: it's understanding a super hard problem and decomposing it, so basically problem decomposition, and then mapping the small problems to code. We've already said Codex is really good at that second thing, probably better than I am. That first one, is it actually good at it? All I know is that the obvious ways of making it good at it, I haven't succeeded with.
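The oscillator Greg describes can be sketched as a prompt-construction loop. Everything below, including the `generate` callable and the prompt wording, is a hypothetical reconstruction, not the actual demo code; in the real demo `generate` would call the Codex API:

```python
def cross_language_prompt(source: str, src_lang: str, dst_lang: str) -> str:
    """Build a prompt asking a code model to translate a whole program.

    The prompt embeds the current program and asks for an equivalent
    program in the other language that repeats the trick in reverse.
    """
    return (
        f"# The following is a complete {src_lang} program:\n"
        f"{source}\n"
        f"# Write an equivalent {dst_lang} program that, when run,\n"
        f"# asks the API to translate itself back into {src_lang}:\n"
    )

def oscillate(program: str, langs: list[str], cycles: int, generate) -> str:
    """Ping-pong a program through `langs` via a `generate(prompt)` callable.

    `generate` is injected so the control flow can be exercised without
    network access; each cycle feeds the previous output back in.
    """
    for i in range(cycles):
        src, dst = langs[i % len(langs)], langs[(i + 1) % len(langs)]
        program = generate(cross_language_prompt(program, src, dst))
    return program
```

The fiddly part Greg mentions lives in `cross_language_prompt`: the generated program must itself contain enough context to keep the cycle going.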
But using Codex for task decomposition is something I've tried a little, and I've gotten some interesting results. You can do things like have Codex call into a magical oracle function. You basically tell it, "There's this oracle: you give it some natural language, and the machine will magically implement it for you." Then you say, "Okay, do this hard task, and you get access to call the oracle," and you can see whether Codex generates good sub-calls to the oracle. Working together with Codex a little, I've actually gotten it to do things like go on Google, download an image of a particular person, and put it into a website, using Selenium to orchestrate all of this. I think ideas like this are very interesting, because maybe you can actually have Codex as a tool that helps in the more cognitive domain, in addition to this very mechanical code-emission domain.

Sam: Is there an input pattern you've seen, or a hyperparameter, that can guide it toward a degree of complexity in the solution? Like the length of the output, as one idea: if I say "give me hello world" and I want it to be 300 characters in length, or a thousand characters, that's going to be one thing; if I say ten thousand, is it going to give me the J2EE enterprise version?

Greg: Right, right. I think the best starting point for all these things, the only real answer, is that you've got to try it; you really just need to play with it. But the place to start is just asking the model for what you want. If the model doesn't quite seem to get it, you try to spell it out more clearly and expand how you're asking.
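The oracle pattern Greg sketches can be made concrete. Everything here is a hypothetical illustration: the model is replaced by a canned completion, and the oracle just records what it was asked, so the dispatch logic runs without the Codex API:

```python
def run_with_oracle(generated_code: str, oracle):
    """Execute model-generated code, routing `oracle(...)` calls to `oracle`."""
    exec(generated_code, {"oracle": oracle})

calls = []

def recording_oracle(description: str):
    """Stand-in oracle that records each natural-language sub-task."""
    calls.append(description)
    return f"<result of: {description}>"

# An invented completion of the kind Codex might emit when told an
# oracle exists and asked to "put a photo of someone on a website":
fake_completion = (
    "image = oracle('search Google Images for a photo of Ada Lovelace')\n"
    "page = oracle('create an HTML page embedding the image')\n"
)

run_with_oracle(fake_completion, recording_oracle)
```

Inspecting `calls` afterwards shows the decomposition the model proposed, which is exactly the signal Greg says he was evaluating.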
Really think about it as if this were a junior programmer and you had to hold their hand and walk them through it: how would you do it? Sometimes that means breaking it up into multiple instructions; sometimes it's just expanding more of what you're asking for. So that's definitely the starting point. Another very powerful thing is providing more examples. One thing we really haven't done very much of yet is GPT-3-style prompt engineering: providing prompts to the model that really show examples of the behavior you want. All the indications so far, all the times we've tried, are that it's quite good at that, but we just haven't pushed it the way we pushed GPT-3, in part because it's already capable of the tasks we want simply by asking, so we didn't have to go down that road. And then the third thing, of course, is fine-tuning. We have a GPT-3 fine-tuning API these days, we'll be rolling that out for Codex, and I think that will open a new dimension to what you're able to make it do.

Sam: Awesome. One of the interesting examples I saw in some of the materials was not your traditional "create a program like XYZ," but just "solve this word problem," like one from elementary school: "Jason has six apples and four apples," something like that. But it created a program to figure out the word problem.

Greg: Yes, right.

Sam: I thought that was really interesting, and it made me immediately think about the implications of something like this in education, both coding education and education more broadly. Any thoughts on that?

Greg: So the funny thing is, when we were starting OpenAI, I'd left my previous job and I knew I wanted to start a company, and I had three possible domains on my list. Number one was AI, which turned out to pan out. Number two was VR/AR, which I scratched off very quickly.
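A program of the sort Codex produces for such word problems might look like this sketch; the function name and exact numbers are illustrative, not the actual demo output:

```python
def solve_apple_problem(first_apples: int, second_apples: int) -> int:
    """Word problem: Jason has six apples and gets four more apples.

    Rather than answering with text, a code model can answer by emitting
    a small program whose execution yields the answer.
    """
    return first_apples + second_apples

print(solve_apple_problem(6, 4))  # prints 10
```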
But number three was programming education. This is an area that's very near and dear to my heart. My own programming education was very self-taught: I just built stuff I was excited about, and it was hard. It's just not very much fun. You do the W3Schools tutorial, back in the day (I'm sure there are better tutorials now), but then you're stuck staring at an editor, thinking about what to build. You run your thing and it doesn't work, and what do you do? You don't know about a lot of concepts. I didn't know about serialization, and one of the first things I built was a chatbot game: you had a little chatbot that you could train by talking to it, and then you could have a chatbot battle, where one window you were talking to was a chatbot and one was a person, and you had to distinguish which was which before your opponent did. I didn't know what serialization was, so I came up with a magical identifier, a string of characters that I thought no one else would ever type, and I used that as my record separator. Looking back, I just wish someone had been there to say, "Oh, you should probably use JSON here." I'd have said, "What's JSON?", gone and figured out how to use it, and cut off this whole tree. There's a little bit of the tree that was very useful for me to figure out: why is serialization useful, why don't you just want your own record separator, what are the problems. But there's a bigger tree of really implementing it, building up the library, and trying to make it work, and that kind of thing was a little bit of wasted effort.
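The serialization lesson in Greg's story can be made concrete. Here is a sketch contrasting a home-grown "magic" record separator with the standard library's json module; the separator string and sample records are invented for illustration:

```python
import json

# Fragile approach: join records with a "magic" separator and hope no
# user ever types it. Any record containing the separator corrupts the data.
MAGIC = "@@RECORD@@"

def save_fragile(records: list[str]) -> str:
    return MAGIC.join(records)

def load_fragile(blob: str) -> list[str]:
    return blob.split(MAGIC)

# Robust approach: let a real serialization format handle escaping.
def save_json(records: list[str]) -> str:
    return json.dumps(records)

def load_json(blob: str) -> list[str]:
    return json.loads(blob)

tricky = ["hello", "a user who typed @@RECORD@@ on purpose"]
assert load_json(save_json(tricky)) == tricky        # round-trips safely
assert load_fragile(save_fragile(tricky)) != tricky  # separator corrupts it
```

The fragile version works until exactly the input you didn't anticipate, which is the part of the tree Greg calls wasted effort.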
So what I'm excited to see with Codex is that for the first time we have a model that you can show code and it can actually kind of understand it. We've done a little playing around with code explaining, and it can do a decent job of taking a function and explaining how it works, or generating comments for it, or generating docstrings, or generating unit tests. I think all of those things really open up the possibility of having a personalized programming tutor, and that to me is amazing. I would love to see programming education fall out of pursuing the AI passion, and we will get there; I'm hopeful that Codex is enough, at least to take the first steps.

Sam: Does there need to be an element of... I guess I made a mental connection to explainability in these kinds of models. You want your tutor to be able to explain the connections to you, beyond just showing you an example, which is kind of what Codex does now. Does that call to mind the whole explainability question around these kinds of models for you, and do you think that's a piece that would be interesting in that context?

Greg: Yes, though maybe in a non-traditional way. Traditional explainability has been: we want to look at the connections of the neural net and explain why it made the decision it made. But if you think about the equivalent problem for humans, we're not very good at that either. We don't open up the neurons of the brain and say, "Oh wow, look at the connection between these two neurons." You ask someone why they made a decision, and I think most of behavioral science is basically the realization that our own explanations of our actions are quite poor: you do something and then come up with some back-narrative for why you did it. So I kind of feel like
the baseline we should shoot for is getting to a better place than where we are with humans in terms of being able to explain why decisions were made; at the very least that's a good baseline to hit. So what we should be trying to focus on with these models is that, given a function, they should explain how it works; if they wrote the function themselves, they should explain why they wrote it; and that explanation should actually add up. Maybe it turns out that, just like the human version, it doesn't quite correspond to objective truth in some ways: the model says, "I made this decision because of this variable and that variable," and you change that variable and it still does the same thing. That kind of experiment would be very interesting to see. On the other hand, for these super-complicated tasks (and let's not kid ourselves, even writing the simplest program is a super-complicated task: you have to understand so many different concepts and know this whole library of different functions, which is really hard), think about what it would take even to fit in our brains how you'd write a program for something like "say it five times." What is "it" supposed to reference? How is "five" represented? All the different ways you could see it. A program that can do that is going to be such a giant, complex tree that even a trace through it would be extremely complicated, and probably outside of humanity's ability to understand. So I think the trick is, number one, focusing these models on being able to provide good explanations that feel right at an
intuitive level, for outputs that feel like they were written by a person. I think we're on a trajectory for that: you can ask Codex for this stuff today, and maybe it'll do a good job; it's not exactly what it was trained for, so maybe it won't, but you can at least get started. But I think there's a next step, and this is actually part of our alignment work at OpenAI: thinking about models that are themselves optimized for explaining what another model did. Here we have a super-complicated problem that this model came up with a solution for, in a super-complicated way that we can't understand. But hey, we know how to train models that can do super-complicated things we don't understand, so maybe you get an explainer model to do it. If you really find the right balance, where you have a very trustworthy model (and we have ideas for how to actually do that), maybe you can bootstrap your way to models that can solve problems where we don't even understand the solution, but then they explain it, and they have to really prove to these other models that what they're doing is legit. It might take a while to get there, but that is our future.

Sam: Some of the broader societal issues that something like Codex gives rise to: questions like jobs, copyright, and potentially fairness and bias. Can we dig into those really quickly? Thoughts on the job implications?

Greg: I think the interesting thing about Codex in particular, as an example of AI in general, is that it's just not playing out how people expected. The expectation was that AI is going to take this job, then that job, then this job, and the only question is ordering the jobs in order of
automation. But in reality, I think AI is taking no single job whole; it's taking a percentage of all jobs at once, and that percentage tends to be the boring drudge-work stuff. I think that's actually a pretty inspiring picture. In the case of Codex: being a software engineer requires you to talk to users, understand what they want, come up with an idea for something they'll be excited to use, and have a picture of how you're going to build it, so there's the architecture of the system. When it comes to implementing, you want to design in a way that will be future-compatible, because tomorrow users are going to ask for something else and it should be really easy to build that feature, so you have to anticipate all the different ways you might want to modify your system. None of that is drudge work. And then you also want to implement using a framework...

Sam: And you know that after all that, it's API docs and Stack Overflow.

Greg: Exactly, exactly. We actually have very poor tools for that last piece, and it's not what we want to spend our time on. So I think what we're going to see with Codex (and again, this is representative of the kind of AI we're building) is that the drudge work, the part where you need to know the whole encyclopedia of your field, where even coming up with an idea of where to start is a real barrier to people getting started, those problems are going to start really melting away, and that will free people up to work on the exciting stuff.

Sam: Copyright is the next one. I know the big issue here is that there are no answers and the system hasn't quite figured it out yet, but I'm wondering what your quick take is on that.

Greg: Yep. So I think that
our position is definitely that training on publicly available code and text is fair use. But it's definitely the case that the technology here is running ahead of the law, which is something that has happened many times in the past, and so I think it's time for a public conversation about this. Part of the reason we're doing a preview here (this is an API that will start rolling out now) is that we want that feedback; we want to start that conversation. Technologies like Codex have a lot of potential, and I think we'd be doing a disservice to ourselves if they weren't easy to build with, or if lots of people weren't able to use them. So I'm very hopeful that we can figure out how to get the good of these systems, deliver lots of benefits, and really help supercharge the economy in a way that does the right thing for everyone.

Sam: And are there fairness and bias types of issues that have come up for you in the context of Codex?

Greg: For sure. I think fairness and bias are a key part of AI. First of all, those issues themselves deserve a lot of space, because we're building systems that are trained on data generated by all of us, and if you're not careful, you're going to latch onto the wrong things or help amplify biases that exist in the system. This is always going to be important, and the stakes are just going to rise as we go. But I also want to point out that Codex represents a bit of a raising of the stakes in the kinds of fallout you can get from a misbehaving system. If
you generate some code with Codex and it decides to delete all your files, that's probably not something you want. So we need to figure out what values go into these systems; we have some preliminary work on this that we've published a bit of already. But you also need to think about how you technically align these systems with whatever values should be in there. Look, we've got some technical problems ahead of us, but the question of who the people actually building it are, and making sure that group is diverse and representative enough, is pretty critical; so is the question of how exactly those values are chosen, and who makes that decision. One day that's going to be kind of the most important problem that we as a community, and we as a society, are facing, so it's never too soon to start working really hard on these problems.

Sam: A related issue is access and accessibility, and that's maybe a segue to the rollout plan for Codex. Tell us a little bit about that.

Greg: Yes. We really want this technology to be out there and used. We think it can deliver a lot of value, and we think it's a little taste of a future to come, so that's really important to us. We're going to run the same kind of playbook we did with GPT-3: we're going to have a private beta, we're going to roll it out as quickly as we can safely, and we're going to be scaling it up. The invites will start flowing on Tuesday, so by the time everyone sees this podcast, the first invites will all be out. Honestly, we just want to learn. We have a new technology here, and the best way to understand how it will impact the world is by actually seeing it impact the world. Our philosophy is
very much to try to get a broad slice of usage at smaller scale and scale it up as we go. There are very particular things we did for GPT-3; for example, we have an academic access program to make sure academics are able to get access. For Codex, I think there are going to be different segments that are excited about using it; people who are learning programming, students, are one segment we want to make sure this is accessible to. We really want feedback, we really want to see how people are excited about using our technology, and honestly we need your help to understand it.

Sam: Awesome. And was there something about a competition that you're hosting for this?

Greg: Yes. On Thursday at 10 a.m. (I don't know what time you're planning on releasing the podcast) we're going to have a new kind of programming competition, where you'll be able to use Codex as both your teammate and a competitor. Everyone's going to get access to some number of queries to Codex while doing Python programming challenges, and it should be very exciting. There will be a leaderboard for the whole internet racing to solve these challenges, but really the goal is to get a sense of what it's like to work alongside Codex, and this is one way we can accelerate access and give everyone a chance to get a little taste of it.

Sam: Awesome. Well, of course we'll have pointers in the show notes for this episode. But Greg, thanks so much for taking the time to give us what is effectively a preview, a sneak peek, although it will be released by the time this goes public. Great to have you on the show once again.

Greg: Great to be back. Thank you so much.
Info
Channel: The TWIML AI Podcast with Sam Charrington
Views: 6,369
Keywords: Ai, artificial intelligence, data science, technology, TWiML, tech, machine learning, podcast, ml, open ai, greg brockman, codex, copilot, gpt-3, github, explainability, coding, fairness, generative model, programming, language modeling, coding education, deep learning, gpt 3, github copilot, python, software development, software engineering, computer science, openai, openai codex, openai codex explained, what is openai codex, codex demo, how to use codex
Id: CvgfxH0UZa4
Length: 49min 41sec (2981 seconds)
Published: Thu Aug 12 2021