Can OpenAI Codex Debug Its Own Code?

Captions
Hello, hello, and welcome to another OpenAI Codex video, where this time we're going to be taking a look at whether or not it can debug its own code, or just code in general. Quick primer for people who haven't seen this before: Codex is a model made by OpenAI that essentially generates code. So if I put a comment here and say "print hello world" or something like that, and then I hit generate or submit, I should get some code that will hopefully print out hello world. It is taking a second. Oh, okay, it also printed out some other stuff, but it did print out hello world. So that's essentially what OpenAI Codex is.

But I'm curious as to whether or not it can debug code, because if you've seen my previous Codex videos, you've seen that it does get some things wrong. Sometimes it does a very good job, but it does get some things wrong. Imagine if you were a programmer and you had to write an entire program without testing your code, without running it to check or doing anything, and you only got one attempt. Especially if you're working on some complicated machine learning stuff, I think the results are usually going to be pretty obvious: you will almost never get it on the first try. It takes time. So we are going to be taking a look at whether, given an error message, it can fix some code, but also whether it can find silent bugs in code, bugs that are not obvious, and fix those too.

I don't want to keep boring you with the description here, so let's get to it. I prepared several examples. I haven't tested them myself yet, though, so I don't know exactly what's going to happen, and I'm curious to find out. Before we dive too deep into this, I want to take a quick moment to ask that if you enjoy this type of content, do consider subscribing to the channel. It really means a lot to me and it helps out more than you probably would imagine. Anyway, thank you so much. Let's get back to the video.

So let's start with my first example. I'll just copy and paste it in here. One thing you'll notice, by the way: I've turned up the response length to 200 or so, and I also set the best-of to 5, which I wasn't doing in my previous videos. This will give us the best of five different possible generations, so hopefully we'll be doing better now. So let's see: "implementation of bubble sort with a single test case at the end". This is our first example. If you're not familiar with bubble sort, it's probably the simplest sorting algorithm you can get. It has two loops through a list, an inner and an outer loop, and it just rearranges the elements to go from smallest to largest.

So we are going to hit submit, and first we are going to generate bubble sort. Now, I was actually having trouble getting Codex to make errors, so I'm going to introduce an error myself for some of these, for these simple examples at least. So we're going to take this code right here, and this should work, right? We are going to take this into Visual Studio Code, paste it in here, and run it. As you can see, we print out the bubble sort and it looks like this is in the right order, although I will say the input is already in order, which is a bit strange. Let's move this out of order. We'll put a 17 there and a 6 here. Yeah, I don't know why. Whatever, it's fine. Now it's in order, right? Our 6 is in the right place and our 17 is in the right place, so this works.

Let's now introduce an error and take one of these things out. If we do this, we should get an error, right? There we go, we got an error. So what we want to do is take this new code, first of all, and put it back into our playground. So let's replace this in here. This is the code with the error, and now we're going to come down to the bottom here, and I have a little prompt I prepared. Let me find it real quick. Here it is, and I'll paste it right over here. I've essentially set it up as if it's some sort of test, right? So we have "Question one: running the above code gives the following error". Let's go and copy and paste our error in here. I'm just going to take the whole error message. We're not going to parse it for it or anything like that; we are going to let it struggle. So let's put that in here: "gives the following error. Propose code that will fix the issue." And then we leave it open for an answer. I'm going to add a stop sequence of three quotation marks right here, and this will essentially make it stop when it gets to the end of its answer.

I haven't tested this. I have no clue if this is going to work. Let's submit and find out. I'm actually kind of nervous. I hope this works out for the video. "The issue is that the code is not checking if the indexes are out of range. The code should be modified to check if the indexes are out of range." Hmm. So technically it's correct, this is a list index out of range. However, that's not really it. Really the issue is that we are going too far with this, right? Because before, it was only going to the length of this array minus one, and now we're going to the full length of the array, which we should not be doing. So I'm not satisfied with this answer. Let's see if we can probe it a little bit more. And just a spoiler: I did test one of these, so we will have at least one that works later on if you want to see an example. Do stay around for that.

So let's change this prompt a little bit, and we'll also delete the answer. Let's say "the code should do a bubble sort, but..." I guess it knows there's an issue. Let's put it up here: "The code above should do a bubble sort. However, running the code gives the following error. Propose code that will fix the issue." So hopefully it actually proposes code this time. Let's see. "Answer: the code will not work because the last iteration of the outer loop will not be able to compare the last two elements of the array." Okay. "The code needs to be modified to include a test case at the end of the outer loop." I don't think this is right either. Maybe it's stopping before it can rewrite the code, so let's let it generate a little bit more. Bubble sort, right. Okay, let's take out this stop sequence; it's trying to generate these. Okay, so it is giving us code. Maybe we just didn't let it generate long enough. It's taking its time. [Music]

So I think this will work. I think it just solved it. If we try this now and run it, it does work. So it did actually debug the issue. I think the problem is we just weren't giving it enough room to actually do the thing, right? It was stopping at these three quotation marks because I told it to, but it was actually trying to put the fix after that. Hmm, that's pretty good. Now I should add a little caveat here: this could just be because it's recoding bubble sort from scratch. Or maybe not, I really don't know. But I'd say that this is a success. So round one is a success.

Let's move on to my second prompt. The second prompt is a little bit easier, but we're going to work with something a little bit harder, which is silent bugs in code: essentially bugs that will go unseen if you don't take a look at what's actually happening. So where's our prompt? Grabbing the prompt: "generate 20 digits of the Fibonacci sequence, but with a minor error in the code". Oh, that is not what I meant to do. What I meant to do is this, and then generate. I have tested this to make sure that it gives us the right code, and yes, this is an issue. If we take this and go into our code and run it, you can see that it just generates one number, right? It should be generating 20 digits of the Fibonacci sequence. So that's an issue.

What I'm going to do now is remove this part, so it's just "generate 20 digits of the Fibonacci sequence", and then I'm going to paste in our little prompt here for detecting the issue. Again, it doesn't throw an error message, but there is an issue here in the fact that it is not printing out the full sequence as we want. So: "Question one: does the above code snippet give the expected output?" The answer here should be no. Let's see what we get. I forgot to put in the stop sequence, so this might go for a little bit longer. Answer... oh wow, it just generated a bunch of answers and its own questions. Okay, well, let's delete the rest of this; this is all we need. So: "Answer: No." Good. "The code snippet prints out the 20th digit of the Fibonacci sequence, which is..." It even gives us the number, 6765. Is that right? It is. Oh my gosh. Wait, what? How is it getting that? Is this memorized, or is it actually doing some sort of computation? I'd be willing to bet that this is memorized and it isn't actually doing a computation, but I might be wrong. Maybe I should have another video just figuring out how it gets this number. I'm very curious whether this model can actually do some sort of math. Wow, I'm very impressed by this. I hadn't tested this, and that is exactly what we wanted. So, round of applause.

But we're going to go a little bit deeper now. Actually, my next question was going to be asking what the issue was, but it's already told us what the issue was. So wow, that's impressive. I guess we could delete this, but let's just keep it, and we'll get to the next prompt, which is actually fixing the bug. So: "Question two: the code above did not give the expected output. Propose code that would fix the issue." Let's see if it can give us code that will fix this. Let's try and get the answer. Hmm, okay, it stopped. Oh, it just generated a bunch more than what we wanted again, but that is okay. Oh, it didn't give an answer. That is unfortunate. I wonder why it didn't give us an answer. Maybe, kind of like the last time, we need to phrase this differently. Actually, what I just deleted had code. Hmm. So let me generate this one more time, because I want to test something. As you might have just seen, when it generated that huge sequence right there, it actually did give... oh, "does not give the expected output", oh yeah. So this time it gave us an answer. I don't know what was up with that last time, but it looks like it worked this time.

So this is the answer. The question is blah blah. "This is not tail recursive." I think that is correct, maybe. Wow, this is a very in-depth response. Let's just see if it actually works. "The code can be fixed by adding a return statement to the end of the function." I don't think that this is a fix. Let's see. I think this is the exact same thing we just did, but I guess we are about to find out. Yeah, that's the exact same thing. So that unfortunately does not fix the problem. But I have a feeling it can, so I'm going to try a little bit harder here. And you know, I do say "generate", I don't say "print". Let's take out this question one here and say, "Question one: the code above does not generate the expected output. Propose the code..." Let's say "print 20 digits of the Fibonacci sequence", so I'm going to be a bit more specific here, and let's see if it can get it now. I'm going to take down the response length because it's been generating quite a bit. "The code above does not give the expected output. Propose code that would fix the issue." Let's try this. If this doesn't work, we'll move on to the next test, but I have a feeling we can get this to work. Okay, there we go. But oh, I think again this is still the wrong thing. Let's see. Yeah, it's the exact same thing. That's unfortunate.

However, we'll move on to the next thing. The next thing is I want to try telling it exactly what's wrong and seeing if it can fix the code then. So let's give it a little bit more help and see if it can solve this. Maybe if we tell it what the error is, because that would still be helpful, right? We could, as a human, see an error, tell it what the problem is, and have it just fix it for us, hopefully. So: "The code above does not give the expected output. Instead of printing a sequence of 20 digits, it prints out only the 20th digit of the Fibonacci sequence." I hope I spelt that right. "Propose code that would fix the issue." So let us now try this. If this doesn't work, I'm going to be a little bit sad, honestly. Oh, okay. Oh, I think this is still not right. "Propose code that would fix the issue"... I think this is the exact same thing. Yep, that's the exact same thing. Okay, it didn't get this, and I'm not going to lie, this is a little bit disappointing, because it couldn't give us the right code even though it's not that difficult. However, if you remember, in the first part of this test it did give us the reason why this is not working. So I don't know, take what you will from it. It works in some places and it does not in others.

I do have one last thing I want to test here, and this is going to be a little bit more difficult. We are going to be trying linear regression. Now, if you've noticed, the last two times I tried to introduce errors into the code. This time I happened to find something that just has an error on its own, so we can see, without me manipulating something, whether or not OpenAI Codex can actually debug its own code. So let's go ahead and give this a try: "implementation of a linear regression using toy x and y data". Let's keep generating. Cool. So we essentially do some imports, we create the x and y data, we show a plot of the x and y data, and then we calculate theta. Now, if you're used to doing linear regression with stochastic gradient descent or something like that, this might be a little bit weird. This is actually the closed-form solution, I believe. I haven't actually tested it, but it looks like it is, where you can calculate the perfect weights in one pass; you don't need to do any iteration. So theoretically this should be able to work, but I have tested this and there is actually an issue. Let's figure out what that issue is, and then we can move on with testing the prompt.

So let's copy that, run this, and you can see it shows the scatter plot there, and now we get this issue. I actually don't even know what this issue is myself. I'm sure I could figure it out if I spent some time on it, but I have not. So what we are going to do is copy and paste this error, go back here, and here is the raw log. Now I'm going to copy and paste my prompt in here for fixing error messages. So our prompt for fixing error messages: "Question one: running the above code gives the following error." Let's copy and paste our error in. This is a bit more difficult, so I'm not sure we should expect it to get this, but maybe it will. "Propose code that will fix the issue. Answer:" I hope for the best. Let us pray.

Okay, answer... and we got nothing. Hmm. What's it trying to print out, though? I'm curious. It's just printing out the exact same thing. Maybe what we should do is give it a little head start. We can do something like this: "Answer:", and then we can copy the first part of the code and see if it will follow along with us. Let's just paste that and see if this works. Okay, so it is generating something. Is it going to generate the exact same thing, though? That is what I am scared of. I think this is the same code, but let's go ahead and test it. I think that was the same, right? Yeah. That is a shame.

Okay, so that unfortunately does not work, but let's try the other prompts. And you know, the other thing we can do, and I don't usually do this because it takes a while to generate, is best of 20. I'll give this one more run, because honestly I was kind of hoping it would work. I'm slightly disappointed, but at least all of you watching this know that this is real. I'm not pulling any shenanigans here; this is actually what you can expect when you use this. So it looks like that's the exact same code again, unfortunately, which is pretty disappointing. But I guess maybe we're expecting too much, and maybe we're just not using the right prompts, right? It could be that this would work really well if we used different prompts. I really don't know.

Okay, so that doesn't work. That's unfortunate. We can still try a few more things. So: "Running the above code gives the following error." Let's do this: "Why is the error thrown", and I'm just kind of freehanding at this point, "and what change should be made to fix it?" And then we will have "Answer:", and let's see if it can answer. Honestly, I'm not expecting much; this seems pretty complicated. Okay, wait, we got something. "The error is thrown because theta is calculated using the inverse of X.T.dot(X). The inverse of a matrix is only defined when the matrix is square and of full rank." That is true, I believe. "In this case, X is not square", that is true, it's not square, "and is of rank one. To fix the error, we can add a column of ones to X." [Music]

So let me actually look at this and evaluate whether or not this is right. So, um, the X.T dot X, yeah, this is actually true. This is really weird. So it's doing a dot product, right? Although in this case I don't know why it's even taking the transpose. It's a bit weird, but it looks like it is just taking the inner product, so this would result in a scalar value. And then this is trying to take the inverse, which we can look up really quick because I have never used this function. Yeah, "of a square matrix". So it looks like it might actually be getting the issue right.

So let's see if we can now prod this to give us the right code. Now that it's given us this, maybe this will work. So question two will be: "Give a code snippet that would fix the aforementioned issue." And we have "Answer:". Let's see, maybe it will work this time. I really don't know. "Answer: the column of ones is added..." Okay, so let's let it finish that. It's not doing anything else. "The column of ones is added to..." Oh, sorry, it generated a lot of stuff, actually, and I just didn't realize it. So this is what it's proposing: x equals np.vstack of np.ones of the length of x... Okay, it's setting the x value to a bunch of ones. Oh, actually, no, no, so it's stacking ones with x. Will this work? I have no clue. I guess we're going to find out. It doesn't tell us where to put this, though, right? So if we run this, is this just going to work? I doubt it. "x and y must be the same size." Yeah.

Maybe rather than that we should say, "rewrite the code": "Rewrite the code to fix the aforementioned issue." Let's try this and see what happens. If this doesn't work, I think I'm going to give up. This video has already gone on for a bit. But we did get an answer here, and I think this is exactly what I just did. I'm tempted to give up, but I'm also tempted to keep trying. Let's see. Okay, "x and y must be the same size". So let's copy this. I really want this to work, so let's give it another go. So: "Question three: the previous error was fixed, but now there is a new error being thrown." We'll give it this value there, and should we just give it the whole traceback? I guess that'd probably be helpful. Maybe not; honestly, it probably hinders it if anything, but we'll just give it to it in case. So: "Propose new code to fix this issue." And then we will have our answer and see what happens.

I hope this has been interesting for all of you. This has been a really interesting video to do for me. So what's happening here? Is it changing anything? I'm not sure it is. I think this is the exact same code, but, oh, it generated quite a bit there. Question three was the one we were on, right? Yeah. So we'll copy and paste this. Wait, wait, question four is: "The code above is still throwing an error. Propose new code to fix this issue." Did it already predict that there will be an error? I don't think so. Oh, let's not get rid of our imports. And, oh. [Music] Huh. Well, clearly this is not right. This is not right at all. However, it did fix the error, I guess, but it didn't maintain the correctness of the program, which is obviously an issue. So what to do here? I think I'm going to call it here. This has gone on long enough. It was generating more; I wonder what it was going to generate. It's not throwing an error, though.

So anyway, that's all the things I wanted to test. The last thing I want to go over is where I see this going and my thoughts on it. Based on the results we've seen today, at least, I think this is clearly not ready. We definitely need more work in prompt engineering, right? Prompt engineering is interesting and something I really know very little about. I guess this is very naive prompt engineering that we're doing right now; we're just testing out things, and I'm sure there are better ways to test these. But once we do get a better handle on this, I almost certainly see this being the way that Codex progresses: not just with bigger models, but by being smarter and being able to test its own code. By testing its own code, you're giving it a chance to work the way humans do. As I said earlier, you could never really expect a human to write, you know, a hundred-line program and get it right on the first try. It's unreasonable to look through that and be able to predict everything so perfectly. That's what we have testing for, that's what we have these dry runs for, and that's why debugging is such an important skill. So that is where I expect Codex to go next.

I'm curious to see what you all have to say. Do you agree with me that this is sort of where we're going? How interesting do you find this? Do you want me to do more of this on the channel? I've really been enjoying the OpenAI Codex stuff, so if you do enjoy it, let me know in the comments. And again, do consider subscribing if you haven't already. I really do appreciate it and it helps out a lot. Anyway, thank you so much for watching. I hope you enjoyed, and I also hope to catch you next time.
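The captions never show the code that was on screen, so the Python below is only a hypothetical reconstruction of the first two bugs described in the video, not the code Codex actually produced. It assumes the bubble sort error came from the inner loop running to the full length of the list instead of the length minus one, and that the Fibonacci snippet computed the sequence but only output the final number. All function names are made up for illustration.

```python
def bubble_sort_buggy(items):
    """Assumed bug: the inner loop runs to len(data), so data[j + 1] overruns."""
    data = list(items)
    n = len(data)
    for _ in range(n):
        for j in range(n):  # off by one: j + 1 reaches index n
            if data[j] > data[j + 1]:
                data[j], data[j + 1] = data[j + 1], data[j]
    return data


def bubble_sort_fixed(items):
    """Same algorithm, but the inner loop stops before the last index."""
    data = list(items)
    n = len(data)
    for i in range(n):
        # After pass i, the last i elements are already in place.
        for j in range(n - 1 - i):
            if data[j] > data[j + 1]:
                data[j], data[j + 1] = data[j + 1], data[j]
    return data


def fibonacci_buggy(count):
    """Assumed silent bug: no error is raised, but only the last number comes back."""
    a, b = 0, 1
    for _ in range(count):
        a, b = b, a + b
    return a


def fibonacci_fixed(count):
    """Collects every number along the way instead of just the last one."""
    sequence = []
    a, b = 0, 1
    for _ in range(count):
        a, b = b, a + b
        sequence.append(a)
    return sequence


if __name__ == "__main__":
    print(bubble_sort_fixed([5, 17, 2, 6, 1]))
    print(fibonacci_buggy(20))
    print(fibonacci_fixed(20))
```

Under these assumptions, the buggy sort fails with a list index out of range, matching the error message read out in the video, while the buggy Fibonacci function runs cleanly and returns 6765, the single number the model mentioned in its answer.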
Info
Channel: Edan Meyer
Views: 3,325
Keywords: openai, codex, openai codex, github copilot, AI, codex ai, machine learning, openai copilot, ai singularity, self improving ai, ai that codes, ai that codes itself, self programming ai github, meta machine learning, nlp, natural language processing, nlp for code, GPT, gpt-4, gpt-3, gpt model, machine learning model, python, openai codex demo, openai codex tutorial, codex demo, what is openai codex, how to use openai codex, two minute papers, debug, debugging
Id: Pkp1MRFGUVo
Length: 26min 31sec (1591 seconds)
Published: Wed Sep 29 2021