How To Improve Your ChatGPT Prompts

Video Statistics and Information

Captions
Hey everyone, welcome to the SVIC Podcast. We talk about AI, business, and comedy (D-minus comedy); if you like that type of stuff, hit the like and subscribe button. We're hosted by two former Googlers. Joe is my co-host; he's out doing a world tour or something. I am joined by the awesome Jonathan, my friend. Jonathan, thanks for joining us. Jonathan was the head of community over at Quora, did community work at Slickdeals, and also has an enterprise sales background. Really good dude. But the man of the hour here is from Cognosis AI. He's been on the show before. Cognosis is a workflow automation service that also integrates AI, so the AI can be tagged in, like GPT-4 or different types of LLMs that come in and get your work done. I use it for scraping news information, and I think it's super duper good. So thank you for taking time out of your busy day to join us today, I really appreciate it.

Yeah, no worries, excited to be here again.

Yeah, so, gosh, I tell everyone to follow your Twitter because there's always such great information on there, and I feel like you're always on the bleeding edge when these new models come out. And the thing is, it's not esoteric, like "hey, look at the MMLU score"; it's "no, I'm putting these things into production and I'm seeing how they're working for my use cases." So you recently came out with a tweet that says: underrated, Gemini 1.5 Flash; overrated, GPT-4o. We really need better ways to benchmark these models, because the LMSYS Chatbot Arena and that kind of thing still don't consider stuff like cost, speed, tool use, writing, etc. Most people just use the top model based on leaderboards, but it's way more nuanced than that. So I thought that was a really interesting tweet, and you also wanted to come here and talk about how to think about creating prompts for your startup or whatnot, and kind of give a clinic explaining how to create the best prompts for your LLM
use. So we'd love to hear where you'd like to go first: do you want to talk about the prompt aspect, or maybe talk about what you're seeing in these models?

Yeah, I think we can start with what I'm seeing in these models. Right now it's really hard for people to get a gauge of what the best model is. They just kind of look at this leaderboard; I'm not sure how many people are familiar, but it's a site called LMSYS or something, and on it they put a leaderboard of the highest-rated models using an Elo score, which is a pretty accepted rating across a multitude of sports. What I noticed is when GPT-4o came out, OpenAI's newest model, at first it was really fast and everyone was kind of blown away, because they put it on the leaderboard and it just blew through all of the previous models. Everyone was like, how did they do this? It's ten times faster and half the price. And then as I played with it more and more, running some tests and evaluations internally within our own agents, I started to realize it kind of sucked, to be honest. It was kind of stupid. It made really dumb mistakes that I only see from GPT-3 or models that are not even close to the top 10. I tested it more and more and kept noticing that issue, and I was just confused, like, how is this the top model? I would switch between the old model, GPT-4 Turbo, and GPT-4o, and every time, ten out of ten times, one would fail and the other one would pass. So that's what prompted the tweet: I noticed it was a bit overrated at the time. I think the perception has actually changed now; looking online this week, everyone is like, oh, it sucks. But I'd like to point out that I called it first. Yeah, and then the other part
of that was just saying Gemini Flash is underrated, because it's a million-token context model, and from my testing it's really good at analyzing large context, and it's super cheap. That's the big one. So that was the basis of that tweet, and I had more written up because I just felt we could do more for the average person. They could go to this site and not just get a single score, but maybe a few more metrics: what is the best model if I have 20 documents, or what is the best for writing emails, or what is the best for not yapping. I think those are pretty useful things for people deciding what to pick, and right now it's just, okay, here's one, make your choice. You can't really pick between options; it's like there's only the iPhone and no Android or anything else, and everyone just picks this one single thing, right?

That's a really good point, because I've been saying I prefer the Chatbot Arena. I guess you and I can agree, well, maybe I'm putting words in your mouth, that the benchmarks are probably the worst place to measure whether these LLMs are actually functionally doing anything, and a step up from that is the Chatbot Arena, the LMSYS one you mentioned. But then, as you said, there are so many different use cases besides just chat, like how do you deploy these LLMs to actually get work done? There seems to be something lacking there. Is anyone working on creating a better way for us to analyze how these LLMs work, or is it just going to be a case-by-case basis?

Not that I know of. I mean, I know LMSYS, whatever, I can't pronounce the proper way to say it, they're working on things, but I don't know if anyone specifically has a product around that. It would have to be a free, public, cheap-to-use thing, so I don't think so.
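As an aside, the per-use-case comparison the guest is asking for, ranking models along several metrics instead of a single leaderboard score, could be sketched roughly like this. All of the model names and numbers below are made-up placeholders, not real benchmark data:

```python
# Hypothetical sketch of a per-use-case model comparison instead of a single
# leaderboard Elo. Model names and scores here are invented placeholders.

# Metric scores per model: higher is better, except cost_per_1k (USD, lower is better).
MODELS = {
    "model-a": {"long_context": 0.9, "email_writing": 0.6, "cost_per_1k": 0.10},
    "model-b": {"long_context": 0.5, "email_writing": 0.9, "cost_per_1k": 1.00},
}

def best_model_for(metric: str) -> str:
    """Pick the model with the highest score on a single quality metric."""
    return max(MODELS, key=lambda m: MODELS[m][metric])

def cheapest_model() -> str:
    """Pick the model with the lowest cost per 1K tokens."""
    return min(MODELS, key=lambda m: MODELS[m]["cost_per_1k"])
```

With numbers like these, a cheap long-context model can "win" for document analysis even while a pricier model wins for writing, which a single aggregate score hides.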
I think you have that Arena score, which people can kind of game, because it's based on people's preferences. One of the interesting cases, I forget which model it was, but in the prompt they would say, format your text with headers and bullet points, and people rated that higher because they preferred that output. So no, I don't think there's anyone, but it's a pretty open area where someone could make a product, and I would use it.

Right, good point. Now let's get into prompts. A lot of people struggle with prompts; I do, and Joe has made fun of mine. You've been working on a lot of prompts, so maybe you can talk about how you approach creating prompts for LLMs and what we can do to make them better.

Yeah, I think prompting is one of those things that's more of a developer concern. I feel like the majority of people prompting don't have to worry too much, because the models are going to get better and better at understanding what you're asking. But I've been prompting them a lot, and it's a lot harder than it might seem. It's easy to get an output, but when you're doing more complex, complicated tasks, what I start to see is people writing these things I call super prompts, which are seven paragraphs explaining to the model what to do. And what I've noticed is that those generally do not perform as well. The model has so much to figure out, and because the models aren't good enough, a lot of the time you get mediocre results. So one of the things I talked about was splitting your prompts into workflows, or flows, where you have smaller prompts, and then based on what happened in A, you go to B or C. So you can kind of have more of
a control flow. Like, the user has asked me to read their email; okay, cool, now I'll go and maybe ask them questions, and that could be a separate prompt. I wouldn't have a single prompt saying "if the user asks you for an email, then do this, this, this." I'd be like, okay, the user has requested an email, now we're going to go to the email prompt: hey, what are you looking for, etc. So that's what I was referring to: splitting and breaking prompts up into smaller pieces will generally perform better than one giant 18-paragraph prompt that's trying to decipher what you asked it while also trying to understand what the user asked.

Interesting. And when you break them down like that, you see a huge performance gain on your side?

Yeah, and the nice part is you can use dumber models, so you don't need smart models trying to decipher these longer prompts. For example, if I go to any model and say, can you classify something, that's very easily done by every single model, no matter how small. Is this good or bad? Very easy. Is this a yes or a no? Does this request talk about X, Y, and Z? So you can offload those to smaller models that are cheaper and more efficient, and have the smarter, heftier ones do the more complex reasoning, where maybe there are multiple steps or planning involved. That's what I've noticed: you get a lot more performance in products, maybe not in plain chat-in, chat-out, but in workflows where you're trying to do complicated tasks.

Interesting. Now, I know the companies keep talking about how their context windows are expanding for all the different LLMs. Do you think
it's kind of tricking developers into thinking, oh, the context window is much larger now, I can just throw more garbage in there and I'll be okay, my prompt can be messy and it'll still work out?

I think so, because I actually don't think a lot of these models are really good at using all of their context. I haven't tested the new Gemini Pro with too much context yet, but basically every single model I've tried is not good at fully internalizing all of the data in the context, and sometimes it confuses it. Intelligence trends up with cost: the smarter the model, the more expensive it is. So what you start to see is really large-context, cheap models, which is great; I don't want to pay $10 per query to use a model, but something like 10 cents is a lot more manageable. The problem with these less intelligent models is that as you put in 100K, 200K, whatever tokens, they start to forget, and they just lose a couple of IQ points. So it's a little tricky with the models these days, because I don't think they're good enough to fully understand all the context. Yes, I can see how people would stuff them with giant inputs and expect them to perform, but from my tests it's really a case-by-case basis, and I've noticed the usable context is more in the realm of 30,000 tokens, where the model can fully understand everything.

Interesting, so that's about 22,000 words or something, roughly?

Roughly, yeah. I did political science, I'm not that good at math, I just made that up. Okay, so: one, break down your prompts; two, don't be deceived by the context window. Three, I've heard some people say, hey, you can sometimes ask the LLM
itself: I'm trying to get this type of output from you, what's the best way for me to ask you this question? Have you seen that be effective at all, and what other techniques are you using to make sure your prompts are successful?

Yeah, that one is pretty good. I'll usually go to Opus; Opus is like the best prompt analyzer. I'll actually ask it: hey, I'm trying to ask this less intelligent model, where do you think I could clear this up? And I've noticed marginal gains; it does help clear up the wording, because I feel like we as people are bad at giving clear instructions. I think it's very challenging, and the models, especially the smarter ones, are really good at producing clear, concise instructions. So I do do that: I say, hey, how can I make this clearer for you, and it spits out a revision. That's one of the techniques. But other than that, I think you hit the nail on the head: split the prompt, and sometimes go to a smart model and say, hey, here's this giant jumbled mess, can you please make it clearer and easier to understand? And it actually does a pretty good job.
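The prompt-splitting approach discussed in the episode, a cheap classification step that routes a request to a small specialized prompt instead of one giant super prompt, could be sketched like this. Note this is a minimal illustration: `classify_intent` fakes the "small, cheap model" with keyword matching rather than a real API call, and the prompt texts and route names are invented for the example:

```python
# Sketch of prompt routing: a cheap classification step picks which small,
# specialized prompt handles the request, instead of one giant super prompt.
# Route names and prompt texts below are illustrative placeholders.

ROUTE_PROMPTS = {
    "email": "You draft emails. Ask the user for recipient, tone, and key points.",
    "summarize": "You summarize documents. Keep it under five bullet points.",
    "other": "You are a general assistant. Answer the request directly.",
}

def classify_intent(user_message: str) -> str:
    """Stand-in for a small, cheap model doing a simple classification.
    In practice you'd send a short prompt like 'Classify this request as
    email / summarize / other' to an inexpensive model; here we fake it
    with keyword matching so the control flow is visible."""
    text = user_message.lower()
    if "email" in text:
        return "email"
    if "summar" in text:
        return "summarize"
    return "other"

def route(user_message: str) -> str:
    """Pick the specialized system prompt for this request (step A -> B or C)."""
    return ROUTE_PROMPTS[classify_intent(user_message)]
```

Each specialized prompt stays short enough that a weaker, cheaper model can execute it reliably, while only the steps that need real reasoning get sent to a smarter model.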
Info
Channel: SVIC Podcast
Views: 296
Id: u_tAai43Ya0
Length: 12min 39sec (759 seconds)
Published: Thu May 23 2024