Can OpenAI Codex Compete in an ML Competition?

Video Statistics and Information

Captions
Hey everyone, I hope you're all doing well, and welcome to this new video where we're again looking at OpenAI Codex. In my previous videos we've been testing it on toy problems; this time we're switching that up and testing it on an actual problem. If you're not aware of what OpenAI Codex is, it's essentially a model that generates code, and I'll demonstrate that in a second. If we quickly pop over to this tab, you can see there's a Kaggle competition going on, hosted by Google Brain, that has to do with predicting ventilator pressure. They have an artificial lung and a ventilator they're using with it, they're recording some data, and they're trying to predict the pressure in the lung, or something along those lines; maybe not exactly right, but close. I want to apply Codex to this problem because in my past videos we've seen some impressive results, but we haven't seen actual use cases where it could really be used. I'll be using OpenAI Codex to do the vast majority of the work, but as a programmer and an ML person I can also guide it in the right direction, so this is probably as close as you can get to a real use case of Codex. Hopefully. We'll see. If you look at the competition page, they describe the data, which I've already downloaded in advance. They also have a leaderboard, and there are actually a lot of people working on this challenge, I think over 1,500, so quite a fair number; this is a real and important problem. Back in the data section they provide a bunch of details about what the data looks like, and I've prepared a little in advance, mostly by
copying this over to Codex. Quick note from the future here: I forgot to mention that we will actually be making a submission to this competition before the end of the video, so you can all see how Codex performs in comparison to other people; that will certainly be interesting to see. We're about to jump into the code. One last thing before that: if you do like this type of content, do consider subscribing to the channel. I still run a very small channel, so every one of you that subscribes, likes the video, and hits the bell icon really helps out a lot. Thank you so much. Let's jump into the code. So I have this set up, and I've essentially summarized what the data is about: I've copied over what the files and the columns in the dataset are, just to save us a little time. The goal here will be to give Codex as much freedom as we can while still guiding it and doing our best to make sure it comes up with the best result. We have all this context, and at the very bottom I put: "Below we outline steps in the program and how the model works." What I want Codex to do is start out by generating its own approach. I'm not sure if this will work very well, and if it doesn't after a few tries, maybe I'll put in my own general steps, but I want to give it a chance first. Let's click submit and watch it generate some steps. Okay, let's see what it has. Oh gosh, that's more than I need. Okay, there we go: load the data from the CSV files, that's good; preprocess the data, also good; create a model, train the model, test the model, and output the results. These are all good, but quite general; this is kind of what you would do with almost anything. It's also using
TensorFlow, and I don't really want to use TensorFlow because I've totally forgotten how TensorFlow works. What I do want is to make this a little more detailed, so I'm wondering if we can have it specify what type of model this is going to be. Under "create a model" I'll write "the model uses" and see what it comes up with. Okay: "an LSTM with a dense layer to predict the airway pressure." Awesome. I'm okay with this. I'm not sure an LSTM will be the best thing here, but it's certainly a good place to start, because this is essentially time-series data: we get information about the air going into the airway and coming out and all that sort of stuff, so an LSTM is a natural choice. The next steps are to train the model and test the model, and then I'm going to put back in what I had before, which is to output the results as a submission file. One last thing before we have it generate more: I want to specify that this should use PyTorch, because I know PyTorch. So right here I'll say "we create a SOTA model," state of the art. This might seem ridiculous, but it's my naive attempt at prompt engineering: Codex is trying to predict the next token, so if I say this is a SOTA model, it will hopefully be more likely to output code that would sit next to a SOTA model. That's the idea; I don't know how well it will work. So: "we create a SOTA model in PyTorch." Cool, hopefully we'll get PyTorch now. One thing to note is that it's generating code now, but I have a very long context; as you noticed, I put in a lot of data, so I'm curious how well it will remember all these details. Here you'll see
it has ranges; I wonder if it will actually remember those. I really don't know. We've done some importing, though, which is good: we've imported a StandardScaler. It loads up the data, reading the train and test CSVs from the data directory I specified in the context, so that's correct. Then it preprocesses the data, so let's generate a little more and see what happens: standardize the data, convert it into a PyTorch tensor. Oh gosh, I really have no clue if this is going to work; this is a little spooky. You know what, I'll be right back. I want to plug this into a Jupyter notebook and check that it's working as we go, so I know if I need to adjust anything. Give me one second. There we go: as you can see, I've copied all the code into this Python notebook in Visual Studio and been testing it a little. You might notice I made some slight changes. If we look at the original code real quick, you can see it's grouping the rows into groups of 200 per example. As it turns out, that's an issue: I did a little testing, and since I didn't specify this, Codex couldn't have known, but the rows actually come in groups of 80.
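The grouping just described, 80 consecutive rows per breath, can be sketched roughly as below. The column names (`breath_id`, `u_in`, `pressure`) follow the competition's schema, but the data here is a synthetic stand-in, not the real train.csv:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for train.csv: each breath is 80 consecutive rows
# sharing a breath_id, with one input feature and the pressure target.
n_breaths, steps = 4, 80
df = pd.DataFrame({
    "breath_id": np.repeat(np.arange(n_breaths), steps),
    "u_in": np.random.rand(n_breaths * steps),
    "pressure": np.random.rand(n_breaths * steps),
})

# Reshape the flat table into (num_examples, 80, num_features) sequences,
# which is the layout an LSTM expects.
features = df[["u_in"]].to_numpy().reshape(n_breaths, steps, -1)
targets = df["pressure"].to_numpy().reshape(n_breaths, steps)
print(features.shape, targets.shape)
```

Reshaping only works because the rows really do arrive in fixed-size ordered groups, which is exactly why the generated 200 had to become 80.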
So I changed the 200 to 80 to help it out a little; as I said, I'm going to be slightly hands-on when it's something minor like that. Now, this last line here is a bit of an issue. It's splitting the data into training and evaluation data, which is fine; the problem is that it's randomizing the order, and because this is sequential data we can't just randomize the order without regard for that. So I'm going to delete back up to that point. Before that, a quick look at what the data looks like: in the last step it created this "time" column, which is essentially the index within each example, so it's kind of working. Let's delete back to this randomization part. It says "split each example into train and test, where the test set is used to make predictions for the submission file," except that's not true, because we already have the test set. Maybe we can just get rid of this entirely and try regenerating to see what it comes up with. I'm going to turn up the temperature a little, because we want more varied responses; we want it to try generating something else. Let's see if we get something different. We turned it up a little; maybe it's still too low, because we're still getting the same thing. Maybe we can reword this comment and that will help. Now that we've specified that, let's re-run and see if we get something different this time. Okay, it just doesn't shuffle this time, which works.
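The non-shuffled split settled on here can be sketched in a few lines: split whole 80-step examples, never shuffle individual rows. The example counts and split ratio below are arbitrary stand-ins:

```python
import numpy as np

# 10 synthetic examples, each an ordered 80-step sequence (order matters).
examples = np.arange(10 * 80, dtype=float).reshape(10, 80)

# Hold out the last 20% of examples for validation. No shuffling, so the
# sequential structure inside each example stays intact.
split = int(len(examples) * 0.8)
train_data, val_data = examples[:split], examples[split:]
print(train_data.shape, val_data.shape)
```

Splitting at the example level sidesteps the whole problem: rows within a breath are never separated or reordered, only whole breaths are assigned to one side or the other.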
So it's doing the same thing without the randomization; I'll take it. Next it makes a StandardScaler, which is good. We're going to need to scale the data, because it's all on different scales and we're using a neural network, and neural networks tend to work a lot better when you normalize or standardize your data. I wonder if it's a good idea to scale the time-step column; we probably don't want to do that. What we can do to help is specify which columns we want to use for training and which ones we want to normalize. There we go: I've typed out the columns that need to be normalized and specifically noted that pressure is the target column, what we're trying to predict. Let's delete this, redo it, and see if we get the right columns. Okay, it fits on the training data for R, C, u_in, and u_out, which is what I had written, then transforms each of them to the new values, and then fits a different scaler on pressure. It does the same for the test set, except it skips pressure, because pressure isn't in the test data. That's good; I think I did write that, but it was way up in the context, so I'm kind of impressed it kept that information. I did write that pressure is the target, so maybe it could be implied that it's not in the test set; either way, this looks good. What I'm not sure about, though, is whether each of these columns is being normalized separately: the values of R and C should not be combined and normalized together; each column should be normalized on its own. I'm not sure if that's happening, so let's copy it over to Visual Studio and test it to find out. I've copied the code over, so these lines should scale the data appropriately, hopefully. If we do train.describe() we should be able to see some
descriptions of this data. If we look at the means of these specific columns, R, C, u_in, and u_out, we can see that all of them are very small numbers close to zero, which is exactly what we were hoping to see; that means each column is being scaled separately. The standard deviation of each of them is one, which is also what we expected, so that's super awesome. I've also copied the code over to a different notebook; I have one for testing and one that will be used for the final generation, so note that I'm switching between those, and hopefully that's not too confusing. Now if we come back over here, we should be able to just keep generating; this is doing pretty well so far. Here I forgot to switch from 200 to 80 where it said 200 per row, so let me fix that. Okay, so we're converting these into tensors, that's good, and then: create a model. Let's see what it does. I'm not going to read all of this, but we can look at the code and see if it looks roughly right. It's creating an LSTM cell with some input size; I'm not sure if this is right, but it looks like it could be. Let's generate a little more and then run this to make sure nothing crashes. I'm copying it over here; let's run it. That looks good. Now let's copy in our model and hopefully it runs without error; that last message was just because I ran it twice. If we paste our model here, let's make sure it runs without any issues. I should say that running without issues doesn't necessarily mean it's correct, but it's a good sign nevertheless. Let's generate some more. It might
be a good idea to look this over, but honestly I'm a little lazy; we'll figure out if it works soon enough. Okay, it's generating more: training, where it prints out the loss, and then a whole evaluation loop, which is good; we want some evaluation so we know how we're doing. Then "training is complete," so let's generate a little more. Ideally I don't like to generate so much at once, but I want to give it the opportunity, and then maybe we can go back in with GitHub Copilot and fix individual lines. Let's see: output. It sure does love generating all of these. Here we go, and I think that's the end of the program; if I try to submit more, it won't generate anything else. So this is it. We should be able to run these two pieces, because we've defined the model and created this train function; the test and output parts we haven't copied in yet. Let's try them out. Maybe we should create the model first by itself to see if that gives us any errors. Okay, the model was created, great. Next we can see if it will train, and it does not, because we have an error: "object of type function has no length." Hmm, what could be going on? I'm back, and I found the issue; it's actually a funny little one. The data we were using was named train, and the function to train is also called train, so the train function was overwriting our data. What I did is rename the data to train_data, so that little fix should solve the issue. Let's see if this works now. Oh, I need to create the model first. Okay, we're still running into an issue: "too many values to unpack." What this must mean is
that we're trying to get inputs and targets out of this train loader, but it must only contain one value, just the inputs or something. So I'm going to fix that and be right back. I'm back, and I haven't actually fixed that issue yet, because I noticed another problem while going through this: when we're converting the data to tensors, we're taking all the columns, which is not what we want. What we really want is specifically the columns we're using for training, plus the pressure column we're using as the target, converted into tensors and put into the data format the code below expects, where it's trying to unpack inputs and targets. I don't want to write all this out by hand; I want to see if Codex can fix it up. So I'll be using GitHub Copilot inline, which, as you'll see here, is essentially the same thing: it uses Codex to power its suggestions. Let me write some comments describing what I want, and then we can test this inline. Okay, we've got this here. Now I'm going to go back and hit submit to see if this gives us anything, because it was clearly having trouble earlier. It looks about right; let's let it generate a little more. Okay, it does the same thing for the values in the test set, and then goes back to the model, so let's copy and paste this back into our code. I guess there probably is a difference in the models used between OpenAI Codex and GitHub Copilot, and maybe that's
why it wasn't generating things, or maybe they're used a little differently; I'm not quite sure, but it's interesting to note. Now I'm going to copy and paste this back in. I'm using a lot of files here, but it's not that bad. Let's run this and see what happens: "function is not subscriptable." That is again because of the train function name clash; what we want here is the data, so I'm going to reformat this to make it a bit easier. What I've decided to do is paste the model in and just regenerate the training loop, because now that we've changed the input data we probably need to redo it. It looks like it's doing it much more simply this time, without making separate functions, which for the purposes of this video is probably best. Let's copy what we have into our code and see if it works. Back in Visual Studio, I'm going to delete all our training; very sad, but it has to be done. So: we create a model, which I'll put in its own cell; we already know that works. If I run them in sequence, it works. Then we create a loss function, and then "for epoch in range(epochs)"; I'm going to change this to 10 epochs, because 100 would take quite a while and I need to finish this video at some point. In the forward pass it puts in the train input, gets the predicted output, and computes an MSE loss on them. I don't actually know if this is going to work, because I haven't used LSTMs in a while, but let's run it and give it a go. It hasn't crashed yet; that's definitely a good start, and I'm hoping for the best.
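The model and training loop being assembled here can be sketched roughly as below. The class name, hidden size, optimizer, and tensor shapes are my assumptions, and the sketch already folds in two fixes mentioned later in the video (out_features of 1 rather than the input size, and a hidden size of 128); it illustrates the approach, not the exact code Codex generated:

```python
import torch
import torch.nn as nn

class PressureLSTM(nn.Module):
    """An LSTM over each 80-step breath, followed by a dense layer.
    out_features=1 gives one pressure prediction per timestep."""
    def __init__(self, input_size=4, hidden_size=128):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, _ = self.lstm(x)  # (batch, 80, hidden_size)
        return self.fc(out)    # (batch, 80, 1)

# Synthetic stand-ins for the preprocessed training tensors.
train_input = torch.randn(32, 80, 4)
train_target = torch.randn(32, 80, 1)

model = PressureLSTM()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters())

losses = []
for epoch in range(2):  # the video settles on 10 epochs
    optimizer.zero_grad()
    output = model(train_input)            # forward pass
    loss = criterion(output, train_target) # MSE against the targets
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
print(losses)
```

With `batch_first=True`, the LSTM consumes `(batch, sequence, features)` tensors directly, which matches the `(examples, 80, 4)` layout the preprocessing produces.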
It looks like we unfortunately got an error, and I've noticed a few things because of it. First, we can see the input: an 80-step sequence length and four input features, and for some reason we're also outputting four features. If we go up to the model, we can see that out_features is set equal to the input size. Really? Let's just change that to one; it's not a huge change, so I don't mind making it myself, and that gives us one output. Another issue is that it's taking all 75,000 examples and throwing them in at once, which technically would work, but for the purposes of this video it makes things very slow, which is not ideal. So I want to put this on the GPU, and I'll excuse Codex for not doing that, because I didn't specify it. I'm going to move this to the GPU real quick and do a little testing to see if I can speed it up, and then hopefully we'll get it working. Okay, I'm back, and unfortunately we got another error: as I kind of expected, throwing 75,000 examples through the model at once overloaded the memory. So I'm going to copy my changes over to Codex, and from there we'll regenerate it to train in batches instead of throwing everything at it at once like an absolute madman. We'll say "train the model in batches," the number of epochs is 10, and the batch size is something like 128. Once we've got that, let's regenerate, which is kind of sad because the last one might actually have worked, but I bet it will generate another thing that works. It looks like we've got some sort of training function here, and it's not too complicated again, which is very nice. It
looks like it's just doing a slice selection now. We can copy all of this in, and I'm just noticing that we probably don't want to put everything on CUDA at once either; instead, on the batch input I'll add .to(device), so we move data to the GPU as we go. Then it thinks we have a loss function, but we don't; why is that? Did I forget to copy something? I did, I forgot to copy this. Oh my, so careless of me. Let's copy that up here. Okay, I did a quick reset; let's hope for the best. Okay, that's not great: "the size of tensor a (80) must match the size of tensor b (128)," so we have a dimension mismatch, specifically in the loss function. I'm back; it was a quick little error. We just needed a 1 in the outputs; a simple dimension mismatch, nothing too complicated. I also averaged the losses, so when it eventually prints them out it averages over them; just a little thing to help with the video, nothing big. Let's run it, and it works, except I'm printing something I don't want to print, so let's fix that. I'm actually amazed we've gotten this far. Hopefully it actually works; it might not work at all, which would be a shame, but it's running, so let's give it a few seconds. Oh, we've already got an output: the first loss is 0.5879. Nice. Let's give it another second for the next epoch and see if that goes down; it did go down a little, so I'll give it some time, and once we get to the end I'll let you all know. Okay, we've gotten to the end, and the loss was going down the whole time, so it looks like something is probably being learned; that's great. The issue, though, is that we don't have a validation loss yet. If we go back to the Codex playground, it was starting to generate one, but what I really want is constant updates of how
this model is doing on the evaluation data set as we go. So I'm going to delete this part, add a variable called log_frequency, and set it to something like 50 to start. In this loop we'll say "calculate and print the evaluation loss," and hopefully it will do that and continue just as before. This looks about right; I want to test it, of course. We should also be printing out the normal training loss; that should be a quick fix: np.mean over the last log_frequency losses, which just prints the averaged losses. Let's copy this back in and see if we get what we expect. This is quite a long line, hmm. I'm going to change the log frequency to something very small, like five, so we can actually test this and see how it's doing before we find out it's not working at all. Okay, that looks good, but the validation loss is zero, and that's not great. Why could that be? The validation loss is the last thing printed, and it's printing val_loss.item(). Interesting; I wonder what's happening. Oh, it's comparing val_output with val_output. Well, of course the loss against itself is zero: it's the exact same tensor. Instead, this should compare against the targets; I'll call the model output val_prediction and put it where I believe it goes. Now if we run this we'll get an error, because we haven't moved these onto the device; this is a very inefficient way of doing it, but it will be fine. Hmm, still an issue. Oh, it was supposed to go right here. Oh my god. And there we go: we get the validation
loss, which is going down very rapidly, so that's great. The last thing we'll want to do is change the log frequency back to something greater, and we'll try five epochs. I'm going to run this, and as soon as we have the results I'll get back to you. See you all soon. The model isn't quite done training yet, but I wanted to make a quick update: I noticed the hidden size of the model was 32, and I changed it to 128, because we might as well try a bigger model if we can; 32 is pretty small, might as well spice it up a bit. The other thing I did is I noticed the code wasn't calling model.train() and model.eval(); adding those should improve our performance a small bit, so I added them. Not a huge thing, but Codex did miss it. I'll be back once this is actually done training. Okay, I'm back, and I actually cut training off a little early, because it looks like our loss wasn't really able to get far below 0.5. That's okay; it's hard to say exactly how good that is, which is why we'll be doing a submission. So the final thing we're going to do is have Codex make a submission to the leaderboard. Let's go over to Codex, paste in the update under "test the model," and say "run the test input through the model to generate a submission," then hit submit and hope that works. I don't know if I specified how the submission should be formatted, but hopefully I did. Let's let it keep going: test predictions. Oh no, it's gotten into an infinite loop. Let me look up the format this is supposed to be in and specify it, so it has an easier time. Let me check
that real quick. Here we have a sample submission, and you can see it's just the id on the left and a pressure column on the right. So if we go back, we can specify "run the test inputs to make a submission file," and then "the format of the submission file is a DataFrame with two columns, id and pressure; the id should be taken from the original test data." Hopefully this will generate something a little better. There we go: id equals the test ids from our test data. Didn't we rename that to test_data and test_input? I guess we just have test, so it should be the test ids; then it has the pressures and reshapes them with -1. As long as we didn't randomize the order anywhere, this should be correct. Maybe I should double-check, because I don't want to submit something only for it to be wrong; but it doesn't look like we're randomizing anything anywhere, so that should be it, as long as we don't run out of memory, which honestly I think we might. I guess we're about to find out; let's give it a go. If it works, I'm going to be very happy. Oh, okay, we have a submission file, and it looks like exactly what we're looking for. However, there's a catch: Codex forgot that we normalized our data, so we're going to need to un-normalize it. Thank goodness I remembered, or I would have been very sad thinking this didn't work at all; that was a close one. So we have our test preds; we can put a comment in saying "unscale the predictions," and there we go: scaler.inverse_transform. I didn't even know that was a thing; well, maybe it's not, but hopefully it is, and if so, that's really cool. Let's take
this down to here, oops, and... oh look, I was just going to add this fix, and when I deleted the line it actually generated the fix itself. I wonder why it didn't get it the first time, but I think that should be all we need to get our submission. Let's give it a go. Oh, I need to regenerate these; let's rerun this one more time, and then we should actually be at the end. Awesome, that ran. Let's look at our submission: these numbers are a lot different now, and this looks like what we're hoping for; it's roughly the right range of numbers, based on a quick check I did before this. So I'm going to submit it, and we just have to hope for the best. I'm really curious to see how this turned out. I'm not expecting any medals, but if we're not in last place I'll be quite happy. Let's head over to "submit predictions," open up our prediction, and drag the submission.csv in here. Okay, it finished, and we have a 6.39. You probably don't have much context, so you don't know how good or bad that is; as it turns out, if we look at the leaderboards, 6.39 is very, very bad. A 6.39 puts us in 1,618th place, which is somewhere in the bottom 80%.
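The unscaling and submission-building step from just above can be sketched like this; `inverse_transform` does exist on scikit-learn's `StandardScaler`, and the two-column id/pressure format matches the sample submission, but the data below is a synthetic stand-in for the real test set:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Fit a target scaler the way the training code did, on synthetic pressures.
pressure = np.random.rand(4 * 80, 1) * 30.0
target_scaler = StandardScaler()
scaled = target_scaler.fit_transform(pressure)

# Pretend `scaled` is what came back out of the model, then undo the
# normalization so the predictions are back in real pressure units.
preds = target_scaler.inverse_transform(scaled).reshape(-1)

# Two-column submission: id from the test data, pressure unscaled.
submission = pd.DataFrame({
    "id": np.arange(1, len(preds) + 1),  # stand-in for test.csv's id column
    "pressure": preds,
})
# submission.to_csv("submission.csv", index=False)  # what gets uploaded
print(submission.shape)
```

Forgetting the `inverse_transform` call would submit standardized values (roughly mean-zero), which is exactly the near-miss described in the video.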
I guess we're not in last place, so that's better than nothing, but yeah, this is a terrible score. It's slightly disappointing, but at least the code ran. I'm really wondering what went wrong, so I might take a second to look over the code myself and see if I can catch anything obvious, and if I do, I'll let you all know. I made some changes, looked around, and made one last submission, only to end up with a worse score, so it looks like there probably is a problem somewhere. I'm calling it right now: as soon as I upload this video, someone is going to watch it and say, "Edan, how did you not catch the super obvious error on line X?" and I'm going to say, "Oh my gosh, I can't believe I didn't catch that." So if you do see the error, let me know; I'll fix it, make a new submission, and post the fixed score if it comes to that. Either way, it was interesting to see what OpenAI Codex is capable of on an actual problem. I'm not sure I would have used an LSTM here myself; it's definitely not a bad option, but maybe some feature engineering would help, as I've seen other people do for this competition. Or maybe there was just a blatant error I didn't catch. Anyway, I hope you found this interesting. If you like this sort of thing, consider subscribing to the channel; I have a small channel, and every one of you that hits the subscribe button, likes the video, and hits the bell icon really helps out a lot. Thank you so much, that's it, and I hope to catch you next time.
Info
Channel: Edan Meyer
Views: 1,709
Keywords: openai, codex, openai codex, github copilot, AI, codex ai, machine learning, openai copilot, ai singularity, self improving ai, ai that codes, ai that codes itself, self programming ai github, meta machine learning, nlp, natural language processing, nlp for code, GPT, gpt-4, gpt-3, gpt model, machine learning model, python, openai codex demo, openai codex tutorial, codex demo, what is openai codex, how to use openai codex, two minute papers, kaggle, competition
Id: V1cpfbCVytg
Length: 32min 8sec (1928 seconds)
Published: Mon Oct 11 2021