NeMo Guardrails - Tame your LLM without Prompt Engineering

Video Statistics and Information

Captions
When building AI applications, particularly chatbots, you don't just want to release them freely onto your users; you want to restrict what they can say, and especially what they cannot say. The holy grail for this has long been, and really still is, prompt engineering: there are prompt-engineering courses costing 500 €, prompts with nearly 1,000 words, and much more. Some pretty absurd stuff is out there, though some of it was also good, because there were no real alternatives. You do have to restrict behavior in some way; a bank's chatbot, for example, shouldn't be telling silly jokes or insulting users. This can of course be handled via prompt engineering, but it would be much better if such messages were never sent to the LLM at all. That means we need a first line of defense between the message and the LLM, and that's where guardrails come in.

With guardrails you define example utterances for certain categories, and these are vectorized by an embedding model. The user's request is then vectorized too, and a similarity search is run over the examples. If the request is similar, a canned response can be given, preventing the request from ever being sent to the LLM. This is much faster and, with OpenAI models, also significantly cheaper, because you send fewer queries to the large language model; additionally, you save money because you don't need such complex prompts.

In this video I'm going to show you how to use NeMo Guardrails: we'll look at the basics of the modeling language Colang, how NeMo Guardrails works, and how to use it together with LangChain. As always, you'll find the link to the code in the video description. First we have to install the package with pip install nemoguardrails, which installs all of our dependencies. We also need an OpenAI API key; I've put mine in a .env file and load it with the python-dotenv package.
You can install python-dotenv with pip install python-dotenv; we first find our .env file with find_dotenv and then load the environment variables from it with load_dotenv.

With the setup done, we can start with the basics of Colang. To create a Colang file you can create a file with a .co extension, but you can also define it in a template string in Python. Our guardrail block is defined by three kinds of sub-blocks: user, bot, and flow. We use the define keyword to create a new block and attach user, bot, or flow to it. In the first block we define a user block; after the keyword comes the actual name of the block, and then we provide some examples of a greeting, like "Hi" and "Hello", which are semantically similar. We then define what the bot should answer when it receives a greeting: "Hello there! Can I help you today?" This is a deterministic answer, in contrast to an answer from an LLM, which is often different every time. Finally we define the flow, which you can read like an if statement: if the user expresses a greeting, the bot expresses the greeting we defined.

We also have to define the YAML content, which includes the model: we use an OpenAI engine with the text-davinci-003 model to create our answers. Important: we only use this model for the LangChain part of the video. For now we just use an embedding model to check whether the user's greeting is similar to the two example phrases. The default embedding model for Guardrails is the MiniLM model, which is downloaded when you install NeMo Guardrails, but you can also use OpenAI embedding models like text-embedding-ada-002; if you want that, add it to the YAML content (I've left it commented out). I use the MiniLM model for the embeddings, but that's up to you.
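As a sketch, the Colang and YAML content can live in plain Python template strings. The utterances and the model entry follow the transcript; the commented-out embeddings entry is my assumption of how the OpenAI embedding model would be configured:

```python
# Colang content as a Python template string: a user block with example
# greetings, a deterministic bot answer, and a flow tying them together.
colang_content = """
define user express greeting
    "Hi"
    "Hello"

define bot express greeting
    "Hello there! Can I help you today?"

define flow greeting
    user express greeting
    bot express greeting
"""

# YAML content: the main model (only used for the LangChain part).
yaml_content = """
models:
  - type: main
    engine: openai
    model: text-davinci-003
"""
# Assumed shape for opting into OpenAI embeddings instead of MiniLM:
#   - type: embeddings
#     engine: openai
#     model: text-embedding-ada-002
```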
Now we import LLMRails and RailsConfig from nemoguardrails. We create a config with the class method from_content, passing in the YAML content and the Colang content, and then pass this config to the LLMRails constructor to get an LLMRails instance. LLMRails has a method called generate_async; in a Jupyter notebook we have to use the async method, so we await the answer. We pass in a prompt and just say hello; "Hello" is exactly one of our examples, so we expect the deterministic answer provided by Guardrails. We run the code, and indeed the answer is "Hello there! Can I help you today?".

Instead of passing a single prompt, we can also pass a complete conversation. A conversation is a list of dictionaries, each with two keys, role and content: the role user marks a request from the actual user, and the content holds the question we previously passed as the prompt. We run generate_async again, passing messages instead of the prompt, and get back a result. Instead of receiving just a string, we now get a dictionary back containing a role and content: the role is assistant (we are the user; the assistant is the AI), and the content is "Hello there! How can I help you today?", exactly the same as before.

Now we can use variables in combination with if/else statements to make our Colang file more dynamic. Again we have the user express greeting block and the bot expressing the greeting, but the bot can also express a personal greeting.
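Putting this together, a hedged sketch of the rails setup and both call styles. It assumes nemoguardrails is installed and an OpenAI key is set, so the generate calls are left commented rather than run:

```python
# Sketch: build rails from inline content and generate answers.
# Assumes nemoguardrails is installed and OPENAI_API_KEY is set.
import asyncio

from nemoguardrails import LLMRails, RailsConfig

colang_content = """
define user express greeting
    "Hi"
    "Hello"

define bot express greeting
    "Hello there! Can I help you today?"

define flow greeting
    user express greeting
    bot express greeting
"""

yaml_content = """
models:
  - type: main
    engine: openai
    model: text-davinci-003
"""

config = RailsConfig.from_content(
    colang_content=colang_content, yaml_content=yaml_content
)
rails = LLMRails(config)

async def demo():
    # Single-prompt form:
    answer = await rails.generate_async(prompt="Hello!")
    print(answer)  # the deterministic greeting from the Colang file

    # Conversation form: a list of role/content dictionaries.
    messages = [{"role": "user", "content": "Hey there!"}]
    result = await rails.generate_async(messages=messages)
    print(result)  # a dict with role "assistant" and the greeting as content

# In a script: asyncio.run(demo()); in a Jupyter notebook: await demo()
```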
Inside the personal greeting we've got a variable, recognizable by the dollar sign: $username. For a personal greeting we want to say, for example, "Hello Marcus, nice to see you again!" The flow now looks different: the user expresses a greeting, and if a username is provided, the bot does a personal greeting; otherwise the bot expresses the normal greeting.

How do we actually use that? We create the config and the rails object again, because we changed the config. If we don't provide a username, nothing new happens; we just get back "Hello there! How can I help you today?". To pass the username, we provide an additional dictionary whose role is neither user nor assistant but context, and whose content is another dictionary holding all of our variables. Here username is the key, matching the variable in our Colang file, and its value is Marcus. When we pass that, we see "Hello Marcus, nice to see you again!".

From there, let's move on to the integration with LangChain. We define our Colang content again, the same as before, but now we add another flow that uses a catch-all pattern: we write user followed by three dots. If no other flow catches the user's message, this flow executes and passes the latest user message as the query to a Q&A chain, so we have to define that Q&A chain. Another new keyword here is execute: with it we declare that we want to call a function, in this case the qa_chain action, passing in the query variable. So first we create our config and the LLMRails instance, and then we build the Q&A chain. We do this by creating a new vector store using Chroma, and we provide a few example documents about a fictional bank for cats.
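A sketch of the variable-driven greeting: the Colang if/else on $username, plus the extra context message used to pass variables in (the greeting wording is approximated from the transcript):

```python
# Colang with a $username variable: if it is set, the bot gives a personal
# greeting, otherwise the plain one. Wording approximated from the video.
colang_content = """
define user express greeting
    "Hi"
    "Hello"

define bot express greeting
    "Hello there! Can I help you today?"

define bot express personal greeting
    "Hello $username, nice to see you again!"

define flow greeting
    user express greeting
    if $username
        bot express personal greeting
    else
        bot express greeting
"""

# Variables travel in a message whose role is "context"; its content is a
# dictionary whose keys match the variable names in the Colang file.
messages = [
    {"role": "context", "content": {"username": "Marcus"}},
    {"role": "user", "content": "Hi!"},
]
```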
This bank offers credit cards for cats, interest rates for cats on a savings account, and so on. We use fictional information because we want to be sure the answer is provided by our vector store and not by knowledge the LLM picked up during training. We then convert the vector store into a retriever and create the prompt template. The template tells the bot that it is a helpful assistant for our bank and should only answer if the information is in the provided context (so we do some prompt engineering here), and otherwise it should answer in a friendly way. In general it's still useful to keep such a catch-all phrase in the prompt itself, because we cannot catch everything with the guardrails. The template contains {context}, the context retrieved from the vector store, and {question}, the input we provide to the prompt. We create a new PromptTemplate and pass it via chain_type_kwargs to the RetrievalQA class to create a retrieval chain. We provide the LLM, and this is very important: we don't create a new LLM with ChatOpenAI here (that line should actually be removed, otherwise it would be used); instead we take the LLM stored in the rails instance via its llm attribute. We have to use that LLM, otherwise the two LLMs are not in sync.

We run the code, and now we have our Q&A chain. Next we have to register an action: an action is a function we want to call, and in this case we pass the qa_chain instance to the rails. We can also pass a name, and this name, qa_chain, has to match the name used in our Colang content. After registering the action, we send a message with role user and content "What's the minimum deposit for the cats' savings account?", and run the code.
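A sketch of the chain wiring under the pre-1.0 LangChain API the video appears to use. The fictional documents, the template wording, and the variable names are approximations, and rails is assumed to be the LLMRails instance created earlier, so this is an illustration rather than a drop-in script:

```python
# Sketch of the Q&A chain (assumes the pre-1.0 LangChain API plus chromadb,
# and that `rails` is the LLMRails instance built earlier).
from langchain.chains import RetrievalQA
from langchain.embeddings import OpenAIEmbeddings
from langchain.prompts import PromptTemplate
from langchain.vectorstores import Chroma

# Fictional Cat Bank facts, so answers must come from the vector store:
texts = [
    "Cat Bank offers credit cards for cats.",
    "The Kitty Savers account has a minimum deposit of $500.",
]
retriever = Chroma.from_texts(texts, OpenAIEmbeddings()).as_retriever()

template = """You are a helpful bot for Cat Bank.
Only answer if the answer is contained in the context below;
otherwise decline in a friendly way.

Context: {context}
Question: {question}
"""
prompt = PromptTemplate(template=template, input_variables=["context", "question"])

qa_chain = RetrievalQA.from_chain_type(
    llm=rails.llm,  # reuse the rails' LLM; don't create a second ChatOpenAI
    retriever=retriever,
    chain_type_kwargs={"prompt": prompt},
)

# Register the chain as an action under the name used in the Colang file:
rails.register_action(qa_chain, name="qa_chain")
```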
The answer: the minimum deposit for Cat Bank's Kitty Savers account is $500, extracted from our vector store, so the combination works. Now let's ask another question: who is the main character of Die Hard? The result is actually quite surprising; I expected it to answer. So let's try something different: "Make a joke about a cow." Now it does make a joke, and we normally don't want a bank's bot to tell that kind of joke.

To prevent this, we create additional guardrails. Alongside the greeting guardrail and its flow, we define ask about dogs with user examples like "Can I get a loan for my dog?" and "Can I get an insurance for my dog?". To answer these we want a no-dogs policy: "Here at Cat Bank we're all about cats; we don't offer services for dogs." We then define the flow dog policy: if the user asks about dogs, the bot gives that answer. Next come silly requests, such as "Can my cat open its own bank account?", again with a standard answer and a flow. As you can see, the pattern is always the same: we provide some examples and then the standard behavior of the bot. Then we define a so-called chitchat category, which is a catch-all: "What do you think about the latest movie?", "Got any weekend plans?", "Can you tell me a joke?". It sits below all the very specific flows, so we go from specific to broad, and answers "Sorry, I only do cat-related finance advice." After the chitchat flow we define the user catch-all flow that executes our Q&A chain; that is the last block of our Colang file.
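Sketched as Colang in a Python string; the wording of examples and answers is approximated from the transcript (the silly-request answer in particular is invented for illustration), and the final catch-all flow follows the NeMo Guardrails LangChain-integration pattern:

```python
# Additional guardrails, ordered from specific to broad: dog policy,
# silly requests, a chitchat catch-all, and finally the Q&A fallback.
# Wording approximated from the video; the silly-request answer is invented.
colang_content = """
define user ask about dogs
    "Can I get a loan for my dog?"
    "Can I get an insurance for my dog?"

define bot no dogs policy
    "Here at Cat Bank we're all about cats; we don't offer services for dogs."

define flow dog policy
    user ask about dogs
    bot no dogs policy

define user silly request
    "Can my cat open its own bank account?"

define bot decline silly request
    "I'm afraid our feline customers need a human to hold the account."

define flow silly requests
    user silly request
    bot decline silly request

define user chitchat
    "What do you think about the latest movie?"
    "Got any weekend plans?"
    "Can you tell me a joke?"

define bot no chitchat
    "Sorry, I only do cat-related finance advice."

define flow chitchat
    user chitchat
    bot no chitchat

define flow
    user ...
    $answer = execute qa_chain(query=$last_user_message)
    bot $answer
"""
```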
Let's create it: we build our rails again, provide the username Marcus in the context, and say hello. We expect a personal greeting, because we defined that for the bot, and indeed: "Hello Marcus, nice to see you again! Hope your cat's doing well." Next we ask "Can you tell me a joke?"; this should not be passed to the LLM either, and we get "Sorry, I only do cat-related finance advice." That works too.

Now I'll show you how to execute arbitrary functions with Colang, using an example with Doctran. We take some text and want to extract relevant properties from it, and Doctran is a nice library for that. Currently its async method is broken, so I wrote a synchronous method, transform_document, which is not normally provided. We pass a document to the class and it extracts the branch mentioned in the text: if a user asks "Where is Cat Bank South?", for example, we want to extract the branch, and that's what Doctran does. Our examples are the Central Cat Bank, Cat Bank South and North, and the Downtown Cat Bank; if one of these appears in the input we get the correct branch, otherwise we return "Branch not found". That's our function.

Now we create a new Colang flow with examples like "Where is the Central Cat Bank?" and "Where is Cat Bank North?". If the message is a question about a Cat Bank location, we execute the get_bank_branch_address function and pass in the branch. That's our new flow, and we can just run it. We now have two functions to register: the qa_chain and the get_bank_branch_address function, the latter registered under the same name as the function itself. Let's register both and ask "Where is the Central Cat Bank?".
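Rather than reproducing the Doctran extraction, here is a simple stand-in for the action itself: a plain dictionary lookup. The branch names follow the video, while the addresses are made up:

```python
# A stand-in "database" for the branch-address action: a plain dict lookup.
# Branch names follow the video; the addresses are hypothetical.
BRANCH_ADDRESSES = {
    "central cat bank": "123 Fine Street, Cat City",
    "cat bank south": "45 Whisker Avenue, Cat City",
    "cat bank north": "67 Purr Road, Cat City",
    "downtown cat bank": "89 Tabby Lane, Cat City",
}

def get_bank_branch_address(branch: str) -> str:
    """Return the address for a known branch, or 'Branch not found'."""
    return BRANCH_ADDRESSES.get(branch.strip().lower(), "Branch not found")

# The action would be registered just like the Q&A chain:
# rails.register_action(get_bank_branch_address, name="get_bank_branch_address")

print(get_bank_branch_address("Central Cat Bank"))  # 123 Fine Street, Cat City
print(get_bank_branch_address("Dog Bank"))          # Branch not found
```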
Running this, it queries our so-called database (really just a dictionary), and as you can see it extracts the correct branch, Central Cat Bank, and passes in our result: the address for the Central Cat Bank is 123 Fine Street, Cat City, which is indeed the entry for the Central Cat Bank.

One last topic: subflows. First the user expresses a greeting, and then we define the bot greetings: an authenticated greeting, where we greet the user by username (the $username variable we can provide via the context), and an anonymous greeting for users who are not authenticated. Then we define a subflow, introduced with the subflow keyword, called greeting with auth. We now have two variables: $user_auth, a boolean, and $username, a string. If $user_auth is true we run the authenticated greeting, otherwise the anonymous one. We can then use the subflow inside another flow: if the user expresses a greeting, we run our subflow. This lets us split more complex logic into separate subflows. We call a subflow with the do keyword followed by its name, here greeting with auth.

Let's create our rails again and run it. We provide $user_auth and $username, but the username should not matter as long as $user_auth is false, and indeed we get back "Welcome to Cat Bank! Please log in or create an account to begin." If we set $user_auth to true, meaning we are logged in, the username is used: "Welcome to Cat Bank, Marcus! How can we assist you and your cat today?" So this works great.
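The subflow sketched as Colang, with the context message carrying both variables (wording and variable names approximated from the transcript):

```python
# Subflow sketch: an authentication-aware greeting. A subflow is defined
# with the subflow keyword and invoked from a flow with the do keyword.
colang_content = """
define user express greeting
    "Hi"
    "Hello"

define bot express authenticated greeting
    "Welcome to Cat Bank, $username! How can we assist you and your cat today?"

define bot express anonymous greeting
    "Welcome to Cat Bank! Please log in or create an account to begin."

define subflow greeting with auth
    if $user_auth
        bot express authenticated greeting
    else
        bot express anonymous greeting

define flow greeting
    user express greeting
    do greeting with auth
"""

# Both variables are passed via the context message:
messages = [
    {"role": "context", "content": {"user_auth": True, "username": "Marcus"}},
    {"role": "user", "content": "Hello!"},
]
```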
I hope you can see how guardrails may change the way we interact with LLMs: instead of creating large and complex prompts, we can write Colang files and prevent many calls to the LLM from ever happening.
Info
Channel: Coding Crashcourses
Views: 3,761
Keywords: langchain, openai, guardrails, nvidia, nemo-guardrails
Id: 3DfV6URqrZA
Length: 18min 12sec (1092 seconds)
Published: Wed Nov 15 2023