GPT-4 auto prompt generation for GPT 3.5 turbo 16k experiment with evaluation

Captions
Hi there. In this video we're going to take a look at an interesting experiment I did. It's about auto-generating prompts. We're going to ask GPT-4 to generate many prompts for us for a specific task. In this case the task is to generate code for a Python calculator using Tkinter, but the goal is that we want the model to generate at least 500 lines of code. So we're trying to get GPT-4 to generate prompts which will make GPT-3.5 Turbo 16k output at least 500 lines of code. I thought it would be an interesting experiment to try, plus it's evaluatable: we can actually count the lines of code.

We first ask GPT-4 to generate the prompts and save each one to a JSON object. Then we ask GPT-3.5 Turbo to generate the code, right here in the code generation, and then we save both the prompt and how many lines that prompt got GPT-3.5 Turbo 16k to generate. We'll talk about the code in detail. I also split this into three parts: I got the prompts generated in a neutral tone, in a polite tone, and in a more commanding, authoritative tone, just to see if it would make any difference.

Here I'll show you some charts that I got at the end. Since we are saving all the prompts and how many lines each one generated, we can actually plot it. This was one of the first experiments I did, which was promising. As you see, this blue line is how many lines of code each prompt was able to get GPT-3.5 Turbo 16k to generate, and this orange line is the average, and it seemed like it was going upwards. The reason is that once we generate the prompts and then the code, we know how many lines each prompt has generated. We then take the top three from the top 10 and give them back to GPT-4: "OK, look, these are the best prompts." And then we pick the bottom three from the bottom 10 and say these are the worst. I originally wanted to pick these at random but then I changed my mind; that's why I'm saying three from the top 10 and three from the bottom 10. You can modify the code to do that.

So my first experiment was promising. This is 10 prompts, 10 iterations: GPT-4 originally generates 10 prompts, then we pick the top three and the worst three and generate 10 more prompts, and 10 more prompts, up to 10 iterations. This looked promising, but when I did it again and again, looking at the orange line, it didn't really do that well. Then I realized I had a misspelling in a prompt, and I ran it again without the misspelling, and as you see I wasn't able to see much of an improvement.

Let's take a look at the polite one. As you see, it's pretty steady. I have some older data here as well from a previous experiment. We save all the generated code as well, so I'll put this code on Patreon along with all the code that was generated. As you see, these are all calculator code.

Let's take a look at commanding. As you see, this is still very much a flat line, although one of the prompts, number 98 I believe, generated a really long one. We can take a look at that. Number 98 is really long and actually works. Look, it's 316 lines of code, and when we run it, here is the app. It has almost everything in it except the equal button. Ironically it missed the equal button, unless I'm missing it. All the buttons work, I believe. Well, this one didn't, and there's no equal sign anyway. But 316 lines of code. I can find this prompt in data.json: it's the 98th prompt, and it produced 315 lines of code. I tried this prompt with ChatGPT and I couldn't get it to produce this many lines of code, so I don't know if any one of these prompts is special or not. I'll let you be the judge of that.

Now the code. See, it takes target lines, which is how many lines of code we want produced. I wanted a system which I can evaluate, so I chose to produce prompts which write some Python code, and the evaluation target is how many lines of code it produces. You can modify this code for a different purpose. Then we define a global index: since we're going to use ThreadPoolExecutor and parallel processing, this is just to make sure that our index doesn't get mixed up. Then we check for the generated code folder and create it if it doesn't exist; this will hold the code files that we generate.

Then we have a parse prompts method. If we take a look at our prompt, it says: "Your goal is to write this many distinct and unique complete instructions for a GPT model for it to output at least this many lines of code writing a", and this is the task, which you can make dynamic, "Python calculator app using Tkinter. Return each of the prompts in between these tags", so we can parse them. There are quite a lot of small interesting things about this code; one is that GPT-4 always returned parsable code, so that's something to note. Since we are asking for the prompts to be returned in these tags, we have a parse prompts function which looks for them, parses the prompts out, and returns them. Then we have extract feedback prompts, which does the same thing but for the other call. Let's take a look at that real quick: this is our call, provide feedback and generate new prompts. We'll come back to this and go over it. They do the same thing; I just wanted to keep them separate.

Then there's the prompt generation method, which takes in feedback. In the first iteration it is going to be None, because in the first iteration the original prompts are created, and in the second and later iterations we have feedback. So we check if there's feedback and, if so, make this call; otherwise we make the original call, which does what we were just talking about. It says: "Return each of the prompts in between these tags. Main goal is to get these prompts to generate at least this many lines of code each, so emphasize this rule. Be as creative and verbose as you need to be to get the model to generate at least this many lines of code each." I originally didn't have this part and added it later; it didn't seem to make much of a difference. Temperature is 1, max tokens 2,000, and we do use streaming. We have try and except blocks, this is how to do the streaming responses, and then we return a dictionary of prompts and feedback prompts when we get them.

We have another method for code generation, which just takes in the prompt. We send it to GPT-3.5 Turbo 16k with just the prompt that was produced, which is going to look like this. These are all GPT-4-generated prompts; let's take a look at one of them: "Working with Python and its graphical user interface library, fabricate a calculator that exhibits comprehensive functionality", and it also says "a well-rounded code of minimum 500 lines". So these are all different attempts and different ways of saying it, just to see if it will produce different results.

Before we continue, I do want to mention that I will have this code available on my page; the link will be in the description. Also check out my website, www.echohive.live. You can find and search all my videos there quickly. For example, if you're interested in prompt engineering, you can search for "prompt" and find the videos with the most relevant content, along with their descriptions. These are all my YouTube videos; I have over 180 of them. It's echohive.live; make sure to put the www in front, I think, because some people told me it wasn't working without it. The links to my website and my Patreon will be in the description.

One thing I found funny, and I guess we should mention this before we continue: I will have all of these available on Patreon, including this commanding one. These are all the same code, but they're designed to reach the same result with different methodologies. Here, in this case, I'm asking it to "be very polite in your instructions, almost convince the model to generate the code with kindness." And if you look at some of the prompts that were generated, some of them are hilarious. See, it says "Greetings, intelligent GPT model, I'm in need of your expertise. Could you assist me by writing a comprehensive Tkinter...", "Hello dear GPT model", "Salutations, esteemed GPT model", "Good day". Look, there are like 300 of them: "Hello noble AI model, your impressive coding skills...". Unfortunately it didn't really make much of a difference. The commanding ones are different too, see: "you are to provide me", "I'm commanding you". So, different approaches. The files are all the same except for their prompts, which is why I'm only going through the neutral one.

I also have this other file, a prompt generator class. This just generates prompts and saves them to a .py file as a list; maybe you can build on this.

Anyway, then we get the code generated here, with max tokens set to 10,000. If you want to try this with GPT-4, make sure you lower the max tokens. Also pay attention that we are generating the prompts with GPT-4 but the code with GPT-3.5 Turbo 16k. That also has streaming responses. Then we have some methods to parse the code, which is what this does, save it to a file, and generate a code file using our index. Then we have the update JSON method, which saves the prompts into this JSON file.

Provide feedback and generate new prompts is one of the main things that we are doing. Since we have this data.json which has all our prompts in it, we load it and sort it by the lines of code generated by each prompt. Then we pick the top prompts and the bottom prompts, 10 of each if you have that many. If you don't, it doesn't matter. For example, in the beginning example we only put in two prompts, so we're not going to have 10, but that's fine: this wouldn't throw an error, and the top prompts and the bottom prompts would essentially be the same in that case. That's why I was using 10 prompts in my experiments. You can use 20 if you like, or more.

The feedback prompt is something like this: "Your goal is to write this many distinct and unique complete instructions for a GPT model for it to output at least this many lines of code writing a Python calculator app." Then I'm saying "here are the top three prompts", and I'm inputting the first, second, and third of the top prompts, but we check the length so that this doesn't throw an error in case you don't have three prompts in there, especially in the beginning. Then "these are the bottom three prompts", and then I say "use these prompts as inspiration to figure out what works and what doesn't" and generate however many new prompts, which is the prompt number, that will each generate at least the target lines of code. This is the way I thought of it; maybe you can improve on this.

Then this is to plot the data at the end, and the main process just runs it: it generates prompts, gets feedback, generates code, and saves everything to the appropriate files. That's about it. Here you enter how many prompts you want to start with, the target lines, since our evaluation is based on how many lines you define here, and the iterations. If we ever did hit the target lines, this would actually print, but unfortunately I was never able to see that.

So this is it. It was just a quick, fun little experiment. Let me know what you think, either in the comments or on our Discord server. We have over 800 people there who love to talk about and build with GPT. Like I said, pretty much everything, all the generated code, the JSON object, the plot file, and all the other files, will be available on Patreon; the link will be in the description. Thank you for watching, and see you in the next one.
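
The feedback step described in the captions (parse the tagged prompts, count the lines each one produced, then hand the best and worst prompts back to GPT-4) can be sketched roughly as follows. This is not the code from the video, which is only shown on screen. The tag name `prompt`, the record keys `prompt` and `lines`, and every function name here are assumptions made for illustration, and no model calls are included.

```python
import re

# Hypothetical tag name: the video only says prompts are returned "in between these tags".
TAG = "prompt"


def parse_prompts(response_text):
    """Return the text found between each <prompt>...</prompt> pair."""
    pattern = rf"<{TAG}>(.*?)</{TAG}>"
    return [m.strip() for m in re.findall(pattern, response_text, flags=re.DOTALL)]


def count_lines(code):
    """Count lines of generated code (assumption: blank lines are counted too)."""
    return len(code.splitlines())


def select_feedback(records, pool=10, keep=3):
    """Sort records by line count, then take `keep` prompts from the top `pool`
    and `keep` prompts from the bottom `pool`.

    Slicing never raises on short lists, so with fewer than `keep` records
    the top and bottom selections simply overlap.
    """
    ranked = sorted(records, key=lambda r: r["lines"], reverse=True)
    top = ranked[:pool][:keep]
    bottom = ranked[-pool:][-keep:]
    return top, bottom


def build_feedback(top, bottom, num_prompts, target_lines):
    """Assemble a feedback message in the spirit of the one read out in the video.
    The wording is paraphrased, not the author's exact prompt."""
    best = "\n".join(f"- {r['prompt']} ({r['lines']} lines)" for r in top)
    worst = "\n".join(f"- {r['prompt']} ({r['lines']} lines)" for r in bottom)
    return (
        f"Here are the top prompts:\n{best}\n\n"
        f"Here are the bottom prompts:\n{worst}\n\n"
        "Use these prompts as inspiration to figure out what works and what doesn't, "
        f"and write {num_prompts} new prompts that will each produce at least "
        f"{target_lines} lines of code. Return each prompt between <{TAG}> tags."
    )
```

To pick at random from the top and bottom pools, which the captions mention as the original idea, the second slice in `select_feedback` could be swapped for `random.sample` over the pool.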
Info
Channel: echohive
Views: 1,373
Id: Cjp6yfGuz6s
Length: 12min 58sec (778 seconds)
Published: Sat Sep 30 2023