Prompt-Engineering for Open-Source LLMs

Captions
Hi everyone, my name is Diana Chan Morgan and I run all things community here at DeepLearning.AI. Today we have a very special speaker to talk about prompt engineering for open-source LLMs. As you all know, your prompts need to be re-engineered when switching across any LLM, even when OpenAI changes versions behind the scenes, and this is why people get confused about why their prompts don't work anymore. Transparency of the entire prompt is critical to effectively squeezing performance out of the model. Most frameworks struggle with this, as they try to abstract everything away or obscure the prompt to seem like they're managing something behind the scenes. But prompt engineering is not software engineering, so the workflow to succeed is entirely different.

Today in this workshop we'll learn a lot about prompt engineering versus software engineering, open versus closed LLMs, pushing accuracy by taking advantage of prompt transparency, best practices for prompt engineering open LLMs and RAG, and how to implement RAG on millions of documents. This workshop will be recorded and the slides will be sent out afterwards. It is inspired by one of the courses we made with our speaker, on fine-tuning LLMs. Our event partner today is Lamini. Lamini is the all-in-one open LLM stack, fully owned by you; at Lamini they're inventing ways for you to customize intelligence that you can own.

And now I'd love to introduce our speaker, Dr. Sharon Zhou. Sharon is the co-founder and CEO of Lamini. As a former Stanford faculty member, she led a research group and published award-winning papers in generative AI. Sharon teaches some of the most popular courses on Coursera, including fine-tuning LLMs, and has reached a total of over a quarter million professionals. She received her PhD in AI from Stanford, advised by Dr. Andrew Ng, and before her PhD she was an ML product manager at Google. She received her bachelor's degree from Harvard in computer science and the classics. Finally, Sharon has served as an AI advisor in DC and has been featured in MIT Technology Review's 35 Under 35 list. We're so happy to have you here today, Sharon. Are you excited to dive into everything?

So happy to be here, thanks for having me again.

Of course. Excited to have your little llamas, you know, stick out and help everyone out; some of them may have been making cameos throughout certain DeepLearning.AI short courses, so stay tuned.

Amazing. Well, why don't we get started. I'm sure everyone is super curious to learn everything you have to teach today, and for anyone that has questions, we dropped the link in the chat where you'll be able to vote on which questions you most want to hear from Sharon. All right, take it away, Sharon.

Fantastic. Yes, and please ask questions; I'm very open to that, especially if they're a little bit spicy. I want to see some of those questions. Cool. So I'm Sharon, I'm the co-founder and CEO of Lamini, and I'll be talking about prompt engineering for open-source LLMs today. As the title suggests, prompt engineering is not the same for closed versus open LLMs, and I'll go into exactly what that looks like. I know Diana gave a really good intro on me, but this talk was really motivated by the pain that I had both seen and felt myself, and I just don't want other people to go through the same struggles. So this is also to step through some things I've learned throughout this journey.
I've seen a lot of smart people mess up and spend too much time, too many months, on these things, so I believe in bringing people onto the right path, the path to success. Of course, these are my learnings, and there are a few hot takes in there that you'll see very soon, so this should be a fun talk.

My background, I think, was basically covered, but I was also the first generative AI instructor at DeepLearning.AI. I love generative models; I find them really magical. I also love people, and so my mission in life is to make it possible for more people to be able to wield this magic that I fell in love with during my PhD.

So, just to go over the agenda of what we're going to cover. First is how an LLM was fine-tuned, what was going on behind the scenes, behind the API but also before you ever started touching that LLM, because what goes on behind the scenes does impact your prompt, and I'll go over exactly what that looks like. There are different LLMs, and different prompts actually do better on each of them. I see people saving prompts for GPT-4, for example, and thinking they'll generalize to other models; that's not the case. The same goes for different model versions, and people have felt deep pain around that. People ask, where is the backwards compatibility? That's a normal software paradigm, but you just don't see it here with large language models, and I'll go over that a little bit.

Something else I want to hit home is that prompting is not software engineering. I see people really struggle with this because it's called prompt engineering; it was poorly named, from my perspective. I think it's actually closer to running a Google search query, where you're iterating over that query, and I'll talk through the best practices I've seen.

Finally, prompts are just strings. I really do not want you to overcomplicate this. I'm going to be super spicy and slightly snarky about it during this talk, but prompts are just strings. If there's one thing you take home, it's that prompts are just strings, and you can handle a string, I guarantee it. And RAG is a form of prompt engineering. A lot of people get that confused; they think it's prompt engineering versus RAG versus fine-tuning, and I'm like, no, RAG is a form of prompt engineering, and I'll tell you why: because it just impacts the prompt. You're concatenating strings to a string. So again, do not overcomplicate it; I see people overcomplicating way too much. That's the agenda we'll go through.

Who is this for? I posted this a bit on Twitter; this is largely for awesome, really smart software engineers. I hope a lot of you out there are this archetype: you're extremely good at software engineering and you're curious about working with open LLMs. It's a very exciting time to do that, and there are many benefits to working with these models; I'll touch on a few that are relevant to prompt engineering in this course. And who is it not for?
Well, it's not for people basically like me, who have successfully trained thousands of generative models and proposed new architectures, because that process helps you understand why the prompt acts the way it does and why you structure the prompt the way you do. My view is that you shouldn't have to go get a PhD and do all that to understand this; it should be made simpler, and I'm going to try to show how simple it is through this class. I don't want you to spend years on a PhD; I want you to be able to listen to me at 2X, get it done in 30 minutes, and be entertained.

Okay, so let's start with a memorable analogy; this is really starting off spicy. I was thinking about this last week, wondering what the best analogy would be, and I apologize, this is the best one I came up with right before I fell asleep. You go out every day wearing pants. Everyone sees you wearing pants, you see everyone wearing pants, you're told it's right to wear pants, it's the correct decision to wear pants, you are a good person for wearing pants. Every day you act normal because you wear pants and you see other people wear pants; you're told it's right, correct, moral even. But then one day you're not wearing pants. You go out, and you're not wearing pants. The question I'm going to pose right now is: do you still act normal? This is maybe an extreme analogy, but LLMs have the same wear-pants feel. LLMs also wear pants, and that's via their prompt setting, and I'll go into exactly what that is.

And remember that a prompt really is just a string. For a string, a lot actually matters. Sometimes it matters more for the LLM, based on how it was fine-tuned or trained, than it does for us, but even for us, small edit distances within a string can make a big difference. For example, here's Yann LeCun's recent tweet about open-source AI versus OpenAI. I know it's very spicy; I'm just making this very spicy for you. Very different words, different strings, only a little bit of edit distance apart.

So wearing pants is an analogy for a prompt setting, and every LLM, every version of an LLM, has a different prompt setting. Maybe not entirely different, but it's been adjusted to learn a certain notion of wearing pants. Let's go through a real example with Mistral, one of the great open models today, a very exciting model. Let's say I want to put in this prompt: "Respond kindly to the child: I really hate zucchini, why should I eat it?" To be fair, the "I really hate zucchini, why should I eat it" part I actually took from my co-founder, who's awesome, but an adult; I just added "respond kindly to this child." So what do you expect? You expect it to respond kindly to the child, right? Okay, well, let's see what happens without pants. Without pants, it just goes on in the third person: "It's important to remember that everyone has food preferences and it is normal for children to not like certain vegetables." That's just not good; it's slightly off, and we'll look at examples where it's way off, but this is a simple example where it's definitely off.
If you just put the pants on the LLM, it actually responds to the child: "I understand that you don't like zucchini, and it's okay to have food preferences. However, it's important to remember that eating a variety of foods is essential for good health." It's not perfect, but it is better, and there's an immediate shift. As you'll see soon, putting on pants is super easy; just like it is for you, it's easy to do for the LLM as well. Again, strings.

The same thing goes for Llama: "You're a health food nut. I'm drinking green juice." If you put this into ChatGPT, what do you expect? You expect it to give you advice, et cetera. Without pants, unfortunately, Llama 2 just continues the sentence, and from my perspective I'm like, oh shoot, maybe it's not instruction-tuned; I'm feeling a little judgmental about it. It's just not good; it gives this off-the-rails feeling. I know Llama is very expressive, which is kind of hilarious, but people are using LLMs at scale, and this difference, where with pants on it's actually able to respond, makes a huge difference at scale.

So let me share with you exactly what pants mean. Every LLM has a different notion of wearing pants. You'll see soon that you can actually have control over what the pants look like, but at a very basic level, every single LLM has a different notion here. For Mistral it looks like one thing, and for Llama 2 it looks like another. What exactly is happening here? These are called metatags. If you go to a party with OpenAI people and Anthropic people, you might hear them talking about metatags. A metatag is a tag inside of your prompt that indicates something about what the user put in. For open LLMs, this is completely exposed to you, transparent to you, available to you, unless you use a framework, which I'll get to later. But this is something you can just see and use.

So let's dive right into it. I think they can drop a GitHub link to you; I'm going to share my screen, show you some code, and walk through exactly what these pants look like. This is an open-source library, super basic stuff, just 50, maybe 60, lines of code. All it's doing is running a couple of open models. First I'm going to run Mistral, and I have a few prompts here, like the green juice one and the zucchini one we talked about, and maybe a couple of others. These are real ones I grabbed from my own history, so I did not spend much time on this, but the difference between putting pants on or not is pretty stark. All the code does is iterate through the prompts, print each prompt, and let the model generate based on it.
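Before the screen share, here's a minimal sketch of the kind of with-and-without comparison that loop runs. The `generate` function is a hypothetical stand-in for whatever inference client you use; the `[INST]` and `<<SYS>>` formats shown are the commonly documented chat templates for Mistral-Instruct and Llama 2-Chat, and they can change between model versions, so verify against the official release.

```python
# Minimal sketch of the "pants off vs. pants on" comparison.
# `generate` is a hypothetical stand-in for a real inference client.
def generate(prompt: str) -> str:
    return "<model response here>"  # wire up your model server instead

user = "Respond kindly to this child: I really hate zucchini, why should I eat it?"

# Pants off: the raw string, no chat template at all.
no_pants = user

# Pants on, Mistral-Instruct style (commonly documented; verify per version).
mistral_pants = f"<s>[INST] {user} [/INST]"

# Pants on, Llama 2-Chat style, with a system prompt.
system = "You are a helpful assistant."
llama2_pants = f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

for prompt in (no_pants, mistral_pants, llama2_pants):
    print("PROMPT:", prompt)
    print("RESPONSE:", generate(prompt))  # compare the outputs side by side
```

Note that the pants are roughly 30 extra characters of plain f-string, which is the whole point.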
Then I'm adding pants on, printing the prompt with the pants, letting the model run with pants on, and comparing the two. The same thing happens with Llama 2, and there's something kind of fun there with the system prompt and the user prompt. You'll see I use my own personal convention here, and again, these are just strings; all you need are f-strings, and it's at most 30 extra characters. I just don't understand why people need extra frameworks around it.

Okay, let's focus on this one. I ran it up here already, but let's run it live, why not. Let's import this class that runs different models, and I'm going to run Mistral. Which prompt? We looked at those other two, so I'm going to do the code ones, because I thought those were really interesting, since they made a massive difference, and I know people are working with code. So let me say prompt equals this, and then I'm going to run the model; this is how you run it, llm.generate, with no pants on. I don't like looking at newline symbols, so I'm just going to run print. All right: totally off the rails. So, VS Code. This is something I previously wrote to an LLM: how to find code that was checked in a long time ago, e.g. more than six months ago. I was actually trying to figure that out. But here the output is just kind of a disaster. What is going on? "I have a large code base"; it starts to think it's me, and then there's this git log stuff, and it's like, what is happening? All right, so let's put some pants on this guy.

Actually, one note here. I find it really interesting, and people don't know this, but even a single space here matters a lot. A fun historical note: back when GPT-3 was out but there was no ChatGPT, so circa 2020, I was playing with these models, and it was so interesting, because if you ended your whole question with a space, imagine ending your question to ChatGPT with a space, it would go completely off the rails. It would be no-pants-on already. Models are way less sensitive now, including open models, but before, the pants thing used to be more extreme, let's put it that way. You can even run this with and without a certain space; if you want to change what the pants look like and you remove a space, it should give you a slightly different output. So just know that these things are sensitive to the exact strings. They're less sensitive semantically, if you were to embed everything, but you'll still get something a little different. I'm happy to go into the machine learning reasons why, but just know that it is different.
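If you want to see for yourself why a stray space is not a no-op, look at the token level. Here's a quick sketch using the tiktoken tokenizer; any BPE tokenizer shows the same effect, though the exact token IDs are specific to the encoding:

```python
# A leading or trailing space changes the token sequence the model sees,
# so the model is conditioned on a genuinely different input.
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")
for s in ("How do I find old code?",
          "How do I find old code? ",   # trailing space
          " How do I find old code?"):  # leading space
    print(repr(s), "->", enc.encode(s))
# The three token lists differ, which is why the outputs can differ too.
```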
So yeah, here, much better: "In Visual Studio Code you can use the built-in Git integration, you can do all these steps." It's possible this is not totally true, but it is a valid response. My thinking is: yes, I am talking to someone who is wearing pants.

You can run a bunch of these on your own too. This other one, the example JSON file, I was also running on my own; this is literally code I have written and needed help with, and it was a dramatic difference. Maybe I'll scroll up so you can see what the script actually outputs. Here, with pants on, it's actually producing legitimate code, versus without, it doesn't. I find it interesting that people miss this and then think these open models don't work. That's really unfortunate, because they do work, they work very well, and you can get a massive performance bump just from knowing this. So this is what the exact script does in the open-source repo: you can see the prompt, then the prompt with pants on, then the responses from both. You can see things like, why are you outputting a colon, stop (I actually know why, but it's fine). This one is off the rails, this one is better. I'm going to go back to the slides; I think the point was made, but feel free to play with your own prompts. There will definitely be things that are more sensitive and things that are less sensitive, but it's something to know if you're using LLMs this way.

So the learning, if you didn't get it the tenth time, is: LLMs want to wear pants, so give them the pants to wear. You might be wondering, okay, why didn't I know this? Why have I been basically abusing my LLM and not letting it wear pants? Well, it's actually pretty poorly documented. It's not really in a lot of official releases; it took our team a bit of time to find as well. And when it's not official, you wonder: is that the right one? AWS was a partner for Meta's Llama release, so is their blog right? A lot of questions there. I've also worked with folks who, and this is terrible, said, oh, our model is so much better than the base model, and I said, well, I got better base model results than you. It's because they got the pants wrong, so their model was not actually better than the base model. It's important to know that.

This is not something you need for closed LLMs; I'll go over that in a bit. You can argue it's implicit lock-in: they manage it all for you. I think it's also so they can push out new versions, and I know people don't like it when providers push out new versions and update things behind the scenes, because it's not transparent. At the same time, the argument is that if you created your own, you'd probably do the same, because the software expectation is that you don't want a wholesale upgrade every five minutes when these models are constantly learning. So there is a tradeoff here, to be fair.
You also thought your framework could handle it correctly, and a lot of frameworks don't; I've been guilty of that myself. Getting the string representation right isn't always the easiest, but making it transparent can help you get those 30 extra characters in.

On to why this is different from closed models. With ChatGPT, you can just text ChatGPT; you don't need to add pants to it. It chooses to wear pants behind the scenes, or rather, OpenAI puts the pants on behind the scenes. This is what the API looks like: you're putting in a dictionary, mainly this string of content. What's happening behind the scenes? This is my guess, this is not actually it (if it were, that would be kind of weird), but maybe there's a role tag going on here, maybe there's a message tag there. There's some representation within the string that they're building behind the scenes, and if I were them, I would probably make those representations single tokens they've already figured out, kind of like an end-of-sentence token, so it can be represented very minimally in the prompt, to be token-efficient. This is a guess; I don't know exactly what it is. But something like this is going on behind the scenes, and that's why you don't think about it when working with closed LLMs. And if you're shipping your own closed LLM, this is a tradeoff to think about for your user. On the one hand, those tags are kind of gnarly to look at for a non-software-engineer, and you can't wrap them in a beautiful interface; on the other hand, exposing them helps with working with the models, and I think there are real benefits to understanding that the tags exist and what they're used for.

Something I want to hit home: at the end of the day, all you're adding is a string. You might say, oh wow, great, ChatGPT handles it for me. On the one hand, yes, it handles it for you; on the other, it's just a string, roughly 30 extra characters. You could do it. I believe in you; I believe in your ability to concatenate strings. I was trying to look up what level working with strings is; it's actually kindergarten level, this is from Scratch. So no excuses: if you're a software engineer, no excuses for not being able to work with strings.
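Going back to that behind-the-scenes guess for a second, here's roughly what the flattening might look like. To be clear, the tag names below are invented for illustration; the real ones are not public:

```python
# Hypothetical sketch of what a closed chat API *might* do behind the
# scenes: flatten structured messages into one tagged string. The tag
# names are made up; the real ones are not public.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "I really hate zucchini, why should I eat it?"},
]

def flatten(messages: list) -> str:
    # Each role tag would plausibly be a single reserved token, like an
    # end-of-sentence token, to stay token-efficient.
    return "".join(f"<|{m['role']}|>{m['content']}<|end|>" for m in messages)

print(flatten(messages))  # still just a string, a handful of extra characters
```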
That said, 80% of frameworks overcomplicate the string. I said do not overcomplicate the string, and 80% of frameworks get it wrong. It might be a few characters off; it might be completely buried somewhere. This is LangChain, and I know I'm calling them out, but I'm actually guilty of it too, so I shouldn't just be calling out other people. Our team built a kind of type system to make it easy to work with LLMs, where we converted your types into input and output types, using Pydantic, very Pythonic, and a lot of software engineers were like, wow, this is awesome, I get to work with LLMs this way. But to be honest, it was totally overcomplicated, and we spent months slowly ripping it out. So I don't recommend that; that's my learning from the pain I've seen. I don't think it's intentional; I think it's meant to help people work under normal software paradigms, and that's valid, but there are better ways to do it, ways where the prompt doesn't get totally buried. At the end of the day, you're going to maximize performance when that prompt is transparent and you, as the person running it, can verify that all the characters are in the right places. If we went more advanced, I'd tell you ways to veer off of those pants, to change the color of the pants, to follow the analogy, or to wear a skirt. But it's important to keep the prompts transparent, especially for open LLMs.

So the learnings: keep the prompt transparent, and make prompts easy to change, because you're going to change them for every new LLM, every new version, every new update. I don't know if this is still the case, but for a very long time on Hugging Face, and this was just last summer, new versions of models would come up without the creator saying anything, pushed to the same name. So there's no backwards compatibility when someone else owns the model; just know that happens. It's the same thing as OpenAI changing versions without changing the name.

Prompts are the highly variable part of your workflow, and I want to make sure you understand why this is important: because prompts are so highly variable, they need to be transparent and easy for you to access, so you can iterate on them constantly. And it's not a pain; it's a string, you can do it, and it actually gives you a lot more flexibility in working with these models.
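One low-tech way to get that transparency and changeability: keep every template in one flat, printable place, keyed by model, so swapping or diffing them is trivial. A sketch, using the commonly documented templates from earlier; the structure is the point, not any framework:

```python
# Keep prompts where everyone can find, print, and diff them: one flat
# dict of plain format strings per model. No framework needed.
PANTS = {
    "mistral-instruct": "<s>[INST] {user} [/INST]",
    "llama-2-chat": "<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]",
    "base": "{user}",  # no pants, for raw base models
}

def dress(model: str, user: str, system: str = "You are helpful.") -> str:
    return PANTS[model].format(user=user, system=system)

print(dress("llama-2-chat", "Why should I eat zucchini?"))
```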
So we've gone over a few different things, and I feel like what's probably on your mind is: who decided how LLMs should wear pants? Clearly OpenAI decided some things for their models, and Meta and Mistral decided some things for theirs. But who gets to decide? I'm extremely opinionated on this: I think everyone should get to decide at some point, and it depends on your application and on what level you want to make that decision at. Maybe you want to make it at the same meta level, or maybe at a higher level, where you add another prompt setting, another notion of wearing pants, for this LLM. The standard way of doing that is fine-tuning LLMs. I teach that class too, and I plan to teach one that goes into it more in depth for software engineers. Essentially, that's how you decide the prompt settings, because fine-tuning shifts the knowledge and behavior, and you choose what the metatags are. And this is why, with closed LLMs, you feel like your prompts are breaking or aren't backwards compatible: someone else is deciding how to wear the pants, and things shift a little bit with every update.

There are actually infinite prompt settings you could target, infinite, and they're all just represented in strings, so it's not that hard. What we looked at were essentially chat templates. I call them prompt settings because the phrase "prompt template" is so overloaded and used for other things as well. I've seen "chat template"; I think Mistral, together with Hugging Face, released something to help with managing these, and when it's very simple and you can verify it, I think that's okay. You can also produce JSON outputs, you can do function calling, it's all fine; these are all prompt settings you yourself could target and control. I added this one because I thought it was fun: one of our customers was fine-tuning a model toward menu items, because they're extracting menu items from a conversation, if you imagine a drive-through-window kind of thing. So you can do anything, is what I'm trying to say, not just the standard stuff, whatever format you want. You can use fine-tuning to have your model view another setting as wearing pants, or to have it forget a previous setting; you can do whatever you want, basically. It's a powerful tool.

I want to add a little tidbit for those watching who are a bit more advanced: you can actually get a model to think it's wearing pants without fine-tuning. I call this the Emperor's New Clothes, if you know the story. Our team actually built this: we have a custom inference engine that can take any model and, without any fine-tuning, produce guaranteed JSON output. You basically force some pants onto it; they don't fit really well, or you kind of trick the model into them, but you get guaranteed JSON output. And when I say guaranteed, I mean 100%; it's not even probabilistic at that point.
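That engine is Lamini's own and I can't show its internals, but here is a toy sketch of the general principle behind guaranteed structured output: the program owns the JSON skeleton, and the model only ever fills in the value slots, so the result is valid JSON by construction, not by probability. The `generate_value` function is a hypothetical stand-in:

```python
# Toy sketch of the "forced pants" idea behind guaranteed JSON output.
# This is NOT the actual inference engine, just the principle: code owns
# the braces, keys, and quoting; the model only produces the values.
import json

def generate_value(prompt: str) -> str:
    return "example"  # hypothetical stand-in for a constrained model call

def guaranteed_json(task: str, keys: list) -> str:
    filled = {k: generate_value(f"{task}\nFill in the value for '{k}':")
              for k in keys}
    return json.dumps(filled)  # valid JSON 100% of the time, by construction

print(guaranteed_json("Extract the order from the conversation.",
                      ["menu_item", "quantity"]))
```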
So the learning here: maybe lazy you is thinking, I just don't want to deal with this, I need someone else to manage 30 extra characters, okay, whatever, I'll stop ranting about that. And curious you is thinking, I actually want to control this, I want to understand this better, I want to push performance. It's okay to be lazy; I can be lazy too. For some use cases you want to be lazy, like if I'm just randomly asking a question. In other use cases, where you're building your own thing, maybe for your company or your side project, you do care. So it depends, and I usually transition between the two: I start lazy for everything I do, and then I move to curious.

Okay, so prompting. One strategy for prompting I want to make sure we hit home is iteration. For software engineering, how do you be a good software engineer? You create a good design and you implement it, and a lot of your time goes into creating that good design. Some people say a copilot is not going to replace them because most of their job is designing and architecting the system before writing the code. Great. But 80% of the devs I meet are prompting like they're software engineering, and I find that so interesting, because some random person off the street might be better at prompt engineering (I'm not saying they actually are, but they might be). For prompting, there's no such thing as designing the right thing up front. You can't take a class, this class or any class, and say, I'm going to do exactly these exact things they said, because at the end of the day it's a very, very iterative process. You're not designing it perfectly and then implementing it. You should actually start with something bad; you should be lazy, just like when you're Googling, you know how you don't even spell things right sometimes. Be lazy and iterate: oh, Google didn't understand what I typed, it didn't do the "did you mean this," it totally didn't understand the random keys I pressed, great, let me iterate on that and make it better. That's the mentality you have to bring to prompting, and because prompts change so much across models and versions, you want control over them, you want transparency, and you want them to be easy to access.

Next, a good-enough strategy. I see a lot of people plateauing and having trouble realizing that the best strategy might just be to time-box it, and to realize you get diminishing returns. The way you learn that is by prompting iteratively; you'll discover it naturally. So figure out the time, it's probably one hour, invest that time, and accept diminishing returns after maybe a hundred iterations in that hour. Just like Googling, really nothing different; don't spend so much time designing it, or watching all these videos in order to design it. And because the prompt settings can change, it's good to learn this skill and make it very easy on yourself. Make it super easy in your code base to change these prompts, because you can't leave them stale, unless of course you're controlling the models themselves, fine-tuning them, checkpointing them. That's actually one of the big benefits of open models: you control the versions and the upgrades.

So the learning: prompt engineering is neither software engineering nor machine learning. I can guarantee you it's not machine learning, despite all the papers people publish of prompts. Yes, there are some fun things you can learn about prompts, like "think step by step," but it's not machine learning. It's all the stuff you put into the string before you run the machine learning part, before you actually run the model. So again: just string manipulation, kindergarten, Scratch.
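And since that iteration strategy is just string manipulation plus a loop, here's a sketch of what the time-boxed loop can look like. Everything here is hypothetical, with a dummy `generate` standing in for your real client:

```python
# Iterating on a prompt is a loop, not a design document: try variants
# against a tiny test set, eyeball the outputs, tweak, repeat, and stop
# when the time box runs out.
import time

def generate(prompt: str) -> str:
    return "<output for: " + prompt[:40] + ">"  # swap in your real client

variants = [
    "Summarize this: {text}",
    "Summarize this in one sentence: {text}",
    "You are a terse editor. Summarize this in one sentence: {text}",
]
test_inputs = ["first sample document ...", "second sample document ..."]

deadline = time.time() + 3600  # time-box: one hour, then take what you have
for template in variants:
    if time.time() > deadline:
        break
    for text in test_inputs:
        print(template, "->", generate(template.format(text=text)))
```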
Cool, let's move to RAG. This is also something I find people trip up on; they think it's a completely different thing from prompting, but it actually is a subset of prompt engineering, a way to prompt things. RAG is search; it's information retrieval, and it's been around for decades, actually way longer. And speaking of prompting like Googling: Google is the best at this, at search, so there are a lot of best practices you can borrow from decades of research and decades of real production deployments. At a very basic level, and I want to hit this home because people get intimidated, or they think it can do all these magical machine learning things when it really can't, it's just concatenating more strings to your string. So don't overcomplicate it, and at the end of the day, it's good to have transparency here too. I see people running big workloads without any transparency, without seeing their prompt at all after they've RAG'd it and added relevant content to it. They could have been debugging at that level; they could have caught so many bugs, or found better ways of representing the data, like adding a title, a header, some other meta information to every single chunk. So don't overcomplicate it. You can do it; you don't need a PhD in machine learning, for sure, because it's not even machine learning.

So I put together a really simple repo for us to go over. I said 80 lines; I think it might actually be 70 now, kind of proud of that. The goal is to make it look simple, and to actually see how simple it is, and to walk through how a machine learning person would go about doing RAG, because I'm using FAISS. FAISS is an incredible library by Facebook, or Meta; it's been out for a really long time, I should have checked exactly how long, but way before ChatGPT. It's for similarity search; it's their embeddings library, for handling embeddings well. I'm importing some standard libraries, and I'm also importing our embedding API, to hit embeddings really quickly and to run a model on the output of the RAG step: after your whole prompt has been compiled with RAG, you run the model on it.

So what exactly is going on here? I'm going to step through the code so it all looks transparent; again, 72 lines, and part of that is just running it. The loader is just loading data; I'll show you what that looks like. Build index is an important function to understand. Really, it's grabbing different pieces of content inside of your data; maybe you have a .txt file with data in it, and you're loading it up. There's another file doing part of this, so maybe that's cheating for the 80 lines, but all it's doing is chunking and loading: loading the files, putting the chunks in batches to be more efficient, and chunking each file.
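Here's a sketch of that load-and-chunk step, roughly in the shape the walkthrough describes; the chunk size, overlap, and filename metadata are my own choices for illustration, not the repo's:

```python
# Sketch of loading files and chunking them into embedding-sized pieces,
# yielded in batches so the embedding calls stay efficient.
from pathlib import Path

def chunk(text: str, size: int = 512, overlap: int = 64) -> list:
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def load_chunks(data_dir: str, batch_size: int = 32):
    chunks = []
    for path in sorted(Path(data_dir).glob("*.txt")):
        # Prepending the filename is cheap metadata that often helps the
        # embedding (and later the model) know what a chunk belongs to.
        chunks += [f"[{path.name}] {c}" for c in chunk(path.read_text())]
    for i in range(0, len(chunks), batch_size):
        yield chunks[i:i + batch_size]
```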
Then you iterate through those batches, and what you want to do is get embeddings of them. These are vector representations, a.k.a. just numbers representing the semantic understanding of every chunk. Is it perfect? No. It depends on what you give it; maybe you can give it extra information so it can represent more. I know there are a lot of techniques around the best way to do this, and there are a lot of different indices you can use; you're welcome to go look into that. This is a really simple one for our purposes, but indices often trade off things like memory or efficiency versus accuracy. There are a lot of different techniques there that have been around for decades and that people are using today; I'd say the best ones, or certainly the most popular ones, are not being invented today, they were invented a decade ago. So I'm putting the embeddings into the index, adding them there, and that's all. Then here, all I'm doing is extending a list to include the actual content, so I can retrieve it easily. That's it: the embedding gives me the semantic representation of a chunk of data, and the list lets me get back the actual data associated with it.

What's going on in embedding? All it's doing is something similar to calling a model. This is probably going to be your biggest bottleneck in RAG; it's not going to be building the index, I don't think. These models are getting larger and larger, though of course you can get away with running smaller ones, so throughput here matters. Here you get over 50 QPS, meaning around 50 chunks per second can be run through it, and being able to do that quickly and efficiently is how you handle real data volumes. I know people are using vector stores; I don't know how I feel about that long-term. Unless you need more than a terabyte of data, I feel like you may not actually need that store, because at the end of the day you can fit the data into the model; I mean, the model was able to fit the internet inside of it.

Cool, so query: what exactly is going on here? I'm getting the embeddings and representing them as an array to feed into the index, to search it and grab the right chunks. Here it's k equals five; this particular type of index has a parameter k, which says I want the top five, essentially. And there are lots of different methods to trade off here; maybe I want the top five, but I want them to be a diverse five, not all the same, since there's a lot of repetition in my data set. Things like that. We're just scratching the surface, but it doesn't get much more complex than this. Query engine: I'm breaking the code apart following the usual structure of information retrieval, so there's the index that you build, and then there's a query engine.
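Before stepping through the rest, here's a compact sketch of both halves together, in the spirit of that walkthrough. The `faiss` calls are the library's real API; the `embed` and `generate` functions are toy stand-ins you'd replace with a real embedding API and model:

```python
# Mini-RAG sketch: embed chunks, index them with FAISS, retrieve top-k,
# and concatenate the hits into the prompt, with pants on.
import faiss  # pip install faiss-cpu
import numpy as np

def embed(texts: list) -> np.ndarray:
    # Toy stand-in: hash characters into a fixed-size vector so this
    # sketch runs end to end. Replace with a real embedding API.
    out = np.zeros((len(texts), 64), dtype="float32")
    for i, t in enumerate(texts):
        for j, ch in enumerate(t):
            out[i, j % 64] += ord(ch) / 1000.0
    return out

def generate(prompt: str) -> str:
    return "<model answer>"  # stand-in for a real inference call

class QueryEngine:
    def __init__(self, chunks: list):
        vecs = embed(chunks)
        self.index = faiss.IndexFlatL2(vecs.shape[1])  # simple exact index
        self.index.add(vecs)
        self.chunks = chunks  # keep the text alongside the vectors

    def ask(self, question: str, k: int = 5) -> str:
        _, ids = self.index.search(embed([question]), k)  # top-k chunks
        context = "\n".join(self.chunks[i] for i in ids[0] if i != -1)
        # RAG is prompt engineering: concatenate strings, then add pants.
        prompt = (f"<s>[INST] Answer using this context:\n{context}\n\n"
                  f"Question: {question} [/INST]")
        print(prompt)  # keep the final prompt transparent and debuggable
        return generate(prompt)

engine = QueryEngine(["[doc.txt] Zucchini is a summer squash.",
                      "[doc.txt] It is rich in vitamin C."])
print(engine.ask("Why eat zucchini?", k=2))
```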
What the query engine is doing is running a model. First you query the index, and then I'm putting the results through the model. Back in the day, before we used models, it was literally just: let me get my closest documents, and Google surfaced the closest documents. Now you can put some close documents into your model and say, can you English-ify this for my user, and actually reason over it, and that's very exciting. And again, I'm adding pants here, just to hit that home. Finally, you don't necessarily need an extra class here, but to make it easier there's a runner, where you can load the data set, play with that k number, do the "training," quote unquote, which is building the index, and then call it with the query engine. What that looks like is just instantiating that class (sorry, every time I hover over it, it shows extra stuff), saying, hey, grab everything in the data directory. I'm timing it because I'm curious how long it takes. Then do that training, which is building the index; some indices you actually do have to train, just know that, for example when they want to find centroids, or not exactly centroids, but something like that. And then all I'm doing here is making it easy to talk to the model in this format. Oh, and I already ran it, so this is what it looks like when you run it. Ask an investment question; sorry, this is super dummy data. It's grabbing chunks here, the model is able to produce the answer, and you can keep asking questions, whatever you like.

I think what's important here is that it is simple. You should be able to run this on your own, swap in different models, swap different ways to embed or different indices; that should be really, really simple, and I don't see why anyone can't do it, especially with a library like FAISS, which I think makes it really powerful. And then of course some kind of good embedding model; I know people are experimenting a lot there, so that's an area where you can experiment as well. This is open source, so feel free to play with it and do whatever you like with it. Okay, back to the slides.

So you saw that I wrote maybe 80 lines, because that gets you 80% of the way there, and I really do think it does. I'm not always convinced the last 20% is worth it for some applications; for some, maybe you just add a little bow to it and it's done. I think you should be able to max out in about four hours if you know how to actually use this technology. The idea is: don't overcomplicate any part of prompt engineering; you're just working with strings, including in RAG. You want to be able to print and see those chunks. Maybe you see the chunks and realize, actually, that's not what I expected, or maybe I can represent the chunks differently, maybe I can add a little extra metadata. It should be pretty obvious, when you're adding that to the model in the string representation, what the model would do with it, especially if you've been iterating hundreds of times on real prompt engineering, if you've been taking that actual approach.
If you take that iterative approach, you'll start to see very obvious ways you should be adapting your data. I've seen some cool techniques of pre-processing data through other LLMs, like a pipeline of LLMs, to represent it in a different way, so feel free to play with that too, but it should be obvious from iteration what extra data you might want to include. I've seen people realize, oh, I need to put section headers in here, because how else does the model know what Clause 8 means or which document it was part of; or, I need metadata on the document I pulled from, because I'm pulling from multiple documents. That type of stuff should become pretty apparent. You don't need to read ten blog posts or use a framework to get it done. It's a string.

Okay, so test your assumptions. Does RAG actually get it right? It might not. Does Google Search get it right all the time? No; that's why people are using ChatGPT, just kidding. If you put in that much effort, you can keep going and get to Google-level search before the LLM. I personally don't recommend that, because there are so many more things you can push on the LLM side, like getting it to work even when your retrieved content sucks. You can actually make that happen, or just stop retrieving things entirely and put it all into the weights of the model; it learned the internet, why can't it learn a few more documents, or millions of documents? And your biggest bottleneck is that embedding call, so getting good throughput on it is important; here, to help with customer loads, we've been pushing on that dramatically, to make sure people can actually run heavy loads and heavy amounts of data through it.

And frameworks often make this harder. If a framework is making the prompt more opaque, whether that's the prompt with or without RAG, that's not good; it doesn't help you, especially with open models, and especially when moving from a closed-source model you've been playing with to an open model. If you think, wow, all these claims that Mixtral is better than ChatGPT in some ways don't seem right, well, maybe you didn't put pants on, maybe you didn't do RAG, maybe you haven't been iterating correctly. I generally believe it's user error here, and it's something I've observed especially in those who come from a software engineering background, like myself.

Cool, so what are the key takeaways of this talk? One: keep your LLM's pants on, now that you know what that means. That's how you get normal, expected output and performance, because the model was fine-tuned or trained that way; Meta and Mistral actually trained their models to learn those tags in that way, so the models expect to output in that way too. Two: prompt transparency. This is your biggest advantage, and it's easy; it's just strings, don't overcomplicate it. Make sure that when you're organizing your code base, wherever you work with LLMs, you do get prompt transparency,
and that you manage your prompts in a way that everyone can touch them easily, find them easily, and print them easily. That's very important, and it should become very evident if you're iterating over the prompt. All of this goes hand in hand: you will keep the pants on if you keep the prompt transparent and actually iterate over it. And approach it iteratively, like I was saying; it's like Googling something. Start lazy, start bad, make it better, and time-box the time you invest in it. And I think that is it. Oh, "feeling more advanced," just kidding, there's one more: if you're feeling more advanced than strings, yes, you can customize your own prompt settings with fine-tuning and choose which strings are the pants for the LLM. Okay, and that is it. Thank you so much for listening; I'll take questions, or maybe I was supposed to listen to questions over here.

Perfect, well, thank you so much, Sharon. I know our community really learned a lot, and this has been a really informative session. I do have some questions prepared, and I know there are going to be so many from the audience. We'll start with the first one and see how many we can get through. Our most top-voted one was: should the system prompt of an LLM for a non-English app be written in the target language, or written in English and ask for responses in the target language?

Okay, I'm going to go back to my point about iteration: why not try both? You should be able to try both, because at the end of the day, the reason I'm saying it's iterative, the reason I'm saying don't design it up front, is that for a lot of the behavior of these models, we don't fully know how they were fine-tuned behind the scenes. So I can't tell you up front; it's an empirical thing, and I encourage you to try both. I can give you a framework for trying it and determining which one's better, though. Start with maybe 20 or 30 examples, maybe 100 if you're ambitious, of running your task, where you change the system prompt to be English or non-English, and you basically A/B test. That's the framework for doing it. These models are very empirical, and it is on a per-model basis, because different models are trained on different languages; I think Mistral is trained on more languages, for example.
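That A/B framework fits in a few lines. Here's a sketch, with dummy `generate` and `score` stand-ins for the model call and the rating method (human or automated):

```python
# Sketch of the A/B test: the same 20-30 real inputs, two system prompts
# (English vs. target language), and a score per output to compare.
def generate(system: str, user: str) -> str:
    return "<model output>"  # stand-in for your model call

def score(output: str) -> float:
    return 1.0  # stand-in: a human rating or an automated check

test_inputs = ["example question 1", "example question 2"]  # use 20-30 real ones
arms = {
    "A: English system prompt": "You are a helpful assistant. Reply in German.",
    "B: German system prompt": "Du bist ein hilfreicher Assistent.",
}

for name, system in arms.items():
    avg = sum(score(generate(system, u)) for u in test_inputs) / len(test_inputs)
    print(name, avg)  # pick the winner empirically, per model and version
```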
Perfect. Our next question: what should be the approach for understanding prompt engineering for the LLMs that are going to be released in the near future? Examining training sets, et cetera?

Oh, I see. For a lot of prompt engineering, I think it comes down to how much the creators can tell us, and maybe you're a creator, whether you're Meta or not; I think more people are going to be fine-tuning and building LLMs in the future. The more the creators can be transparent about the training data, or at the very bare minimum about what the pants look like, the better. I think they're incentivized to do that; I'm actually confused why they don't release it, because it helps other people see the amazing performance of these models. So the approach for those releasing models should definitely be to release as much information as possible, going beyond just open-source weights, and definitely showing what the pants look like for the model. But the approach for the rest of us is the same: because it's empirical, in terms of what the model was trained on and what was done to it, you have to test it. The way I go about it, the framework, is having different test sets for different use cases. This is why I'm excited about models that are going to be more specialized: you could have real test sets over a specialized use case, versus saying the model has to be general, it has to do everything, even things that contradict each other; it's harder when that's the case. Of course, you can always do fancier techniques, examining the training set and saying, oh, it covered these modes, and modes are just modes in a distribution, it covered these topics; you could do that, for example.

Perfect, perfect. A recently added question, voted very high: can you provide examples of different pants?

Yes. I don't know if you understand... okay, I think people here understand the pants, and I deeply appreciate it; I'll reward you with some llamas on screen. Okay, let me actually read the question, because I'm a little flustered about how great this is: can you provide a few more examples of different pants? I think the most important one, and this might not be pants so much as an extra pant leg (sorry, this analogy breaks down at some point), is multi-turn. What I didn't share with you was multi-turn: everything so far assumes no chat history, and with history it looks a little weirder, honestly a little unintuitive, but the creators chose it. If I can remember it off the top of my head I can try to write it out, and I can add it to the repo if that's helpful. Aside from the ones I shared for those models, I don't know of others, because those are things I found in slightly unofficial but slightly official ways, and this is where I wish things were more transparent; if you saw the training data, you'd definitely see the template. I've tried hacking these models to see if they would leak the prompt to me; I didn't try very hard, but I'm sure there are ways to go about that. So aside from these, there aren't any super well-known ones for these couple of models. That said, with multi-turn, it's also possible they didn't train beyond that, that they didn't have a template beyond that, which is totally valid.
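For reference, the commonly documented Llama 2-Chat multi-turn layout looks roughly like the sketch below: each past exchange is closed with `</s>` and the next turn reopens with `<s>[INST]`, which is exactly the kind of unintuitive-but-chosen format meant here. Verify against the official release for your exact version:

```python
# Commonly documented Llama 2-Chat multi-turn format (verify for your
# exact model version): prior turns get folded back into one long string.
def llama2_chat(system: str, history: list, user: str) -> str:
    prompt = f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n"
    for past_user, past_answer in history:
        prompt += f"{past_user} [/INST] {past_answer} </s><s>[INST] "
    return prompt + f"{user} [/INST]"

print(llama2_chat(
    "You are a friendly nutritionist.",
    [("I really hate zucchini.", "That's okay, lots of kids do!")],
    "What about broccoli?",
))
```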
And maybe, to give one level deeper detail: as a creator of LLMs, what you do is think about what control you want your users to have. The Llama creators probably thought, I want there to be a system prompt, because it seems pretty cool and slightly different to tell the model to take on a different identity, as opposed to just having a conversation with the model. I know there's a default system prompt, but what they did was fine-tune on a lot of different possible system prompts, to get a distribution, so that when you put in a system prompt, the model doesn't only know one kind of system prompt; it can actually adjust toward yours. So when you're fine-tuning in that case, it's important to get a good distribution over what you'd expect your users to do with it. I don't know if that helps.

I'm sure it helps a lot. Last question: how can we address ambiguity, context shift, and the correct semantic nature of the prompt given to an LLM?

Let me see if I understand that: ambiguity, context shift, and the correct semantic nature. I kind of want a specific example from this person. That'll be tough, sorry, people watching. How do I address... oh my god, this question, this prompt, is itself ambiguous to me. But I can make some assumptions and see if I can answer something. My response is: you should be less ambiguous. But I don't know what the person is looking for, quite frankly, so I'm sorry.

That's okay. Well, I think that's perfect to wrap up, then. Thank you so much, Sharon; this has been an amazing session, and we'll definitely have to have you back for more courses, with llamas, you know, popping in from anywhere. Feel free to register for our next event; we have so many things coming up with our courses and many other things at DeepLearning.AI, and we'll see you all next time. Thank you, everyone. Bye.
Info
Channel: DeepLearningAI
Views: 27,230
Id: f32dc5M2Mn0
Length: 60min 18sec (3618 seconds)
Published: Tue Jan 23 2024