LLaMA2 with LangChain - Basics | LangChain TUTORIAL

Captions
Okay, in this video I'm going to look at using LLaMA-2 with LangChain. Specifically, I'm just going to use the small model here. I'll do a number of videos going through more advanced stuff; what I'm trying to do here is show you the basics of getting something going, and also how you can run it locally. In the future we'll look at running the 70 billion model in the cloud, where you can use it like an API. But in this one I basically just want to load the whole thing in a notebook, run it with pretty good response times, and use it that way.

So you'll notice, just to set up, we're bringing in the standard stuff: transformers, and LangChain here. Because the LLaMA model requires you to get permission, as I've talked about in previous videos, you will need to put in your Hugging Face token. So when you see this pop up, you can basically just click it, and it will take you to Hugging Face where your token is. You can either create a new token or bring a token across; you just need a read token for this.

Once you've got that in, you can then download the model. The 7 billion model is not that big, so you'll find that you can probably load it. You can see that when I'm running this through, it's using under 15 GB of memory, so you can probably load it on a T4 GPU as well.

Next, we need to set up some of the things that we did before. Remember, I talked in the previous video about the different sorts of prompts; this is setting that up. I'm using the same sort of system prompt, just altered a little bit, because we're going to be using it in LangChain. One of the challenges we have with LangChain for this kind of thing is that in some ways this model is a chat model: Meta's actual API for serving it runs it as a chat model, just like you would use GPT-4, the GPT-3.5 Turbo model, or Claude. But when we're using it here, we're using it just as a completion model, so we need to go through and make this customization.

So here you can see what I've done: I've got this get_prompt function, which we can pass an instruction to. Now, if we pass in just the instruction, we will basically get back the default system template. You can see from here through to here is the default system template, then we've got the instruction after it, and they're all going to be wrapped in the instruction tags. If we pass in our own system template, then we will get our new system template followed by the instruction, in the same format that LLaMA-2 wants to see. And this is key for playing around with the prompts and trying different things out.

Now, I will just preface this by saying that this small 7 billion model is, I think, a very good model, but there are certain things it's not great at. For the logic stuff you want a bigger model, of course. It's also not great at returning things as JSON or returning things in a structured-output way. My guess is we will see some fine-tunes coming that will improve that over time. But for now, what it is good at is that we can use it just like a normal language model to do a variety of different tasks: summarization, question answering, all that kind of thing. The key to this, though, is you really want to play around with both the system template and the instruction.
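To make the setup concrete, here is a minimal sketch of the loading step and the get_prompt helper described above. The model ID follows the standard Hugging Face naming, but the default system prompt wording and variable names are assumptions based on the walkthrough, not the video's verbatim code:

```python
import torch
from huggingface_hub import notebook_login
from transformers import AutoTokenizer, AutoModelForCausalLM

notebook_login()  # paste a Hugging Face *read* token when prompted

model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # under ~15 GB, so it should fit on a T4
    device_map="auto",
)

# LLaMA-2 chat models expect [INST] ... [/INST] around the request,
# with the system prompt wrapped in <<SYS>> ... <</SYS>> inside it.
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

DEFAULT_SYSTEM_PROMPT = (
    "You are a helpful, respectful and honest assistant. "
    "Always answer as helpfully as possible."  # placeholder wording
)

def get_prompt(instruction, system_prompt=DEFAULT_SYSTEM_PROMPT):
    """Wrap an instruction (and an optional custom system prompt)
    in the chat format the LLaMA-2 model was trained on."""
    return f"{B_INST} {B_SYS}{system_prompt}{E_SYS}{instruction} {E_INST}"
```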
So don't be afraid to go and change the system templates that I've put in here. I've put in some that I've played around with a bit, but I'm not going to say these are the perfect ones; you could probably get a lot better by experimenting.

Now, we set up the model just quickly up here as a transformers pipeline. This is where you would make changes if you want to make the outputs longer, or anything else you want to change. Then, coming down to use this in LangChain, we're just basically using the HuggingFacePipeline wrapper, where we're bringing in that pipeline; you can see I'm setting the temperature to zero.

Once you've got your LLM set up with the HuggingFacePipeline, you then want to make an LLMChain, which is going to require a prompt. But we've actually got multiple prompts, right? We've got the system prompt and the instruction prompt. So this is where our get_prompt helper function is going to be used. You can see here I'm passing in the instruction and the system prompt, and it's going to format them out like this: we've got "You are an advanced assistant that excels at translation" in the system prompt part, and "Convert the following text from English to French" in the instruction part, and then we've got this text variable that we're still going to pass in. So you can see that when we define our prompt template, the input variable is going to be text, which matches up with what we've got in there. Then we're just doing our LLMChain, passing in the LLM and the prompt, and then we can basically run it.

And you can see here, we can ask it, okay, the text is "How are you today?", and that's going to be translated from English to French. You can see the output it gives us, and if we check it against Google Translate, we can see that it seems to be translating quite well: the French translates back to the English we started with, which is what we wanted. So even though this model is not built for translation, it's had enough data that it can actually do that task.

So let's look at another task: summarization. Here again I've got my instruction. My system template is going to be "You are an expert at summarization, expressing key ideas succinctly"; the instruction is "Summarize the following article for me", and then we pass in the text. You can see that this is going to put it into the right format, and we've still got the text input for what we're going to be putting in. So here is basically an article from TechCrunch, all about some of the changes at Twitter over the past few days. If we count the words, just splitting on spaces, it's 940 words. And if we run that text through, we get "Sure, here's a summary of the article in 400 words or less" (it's actually a lot less than 400 words), and it gives us a decent summary.

Now, if you wanted bullet points, you would just play with this instruction here to say "summarize the following in key bullet points", et cetera. So again, this is making use of merging the two parts, the instruction prompt and the system prompt, to create this template, and then passing that into the prompt template with the input variables we're going to use. You could do a variety of different tasks with this pattern.
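For reference, here's a hedged sketch of how the pipeline, the HuggingFacePipeline wrapper, and the two chains described above might be wired together. It builds on the model, tokenizer, and get_prompt from the earlier sketch; the generation parameters are assumptions, and the imports follow the classic LangChain API that was current when this video was made:

```python
from transformers import pipeline
from langchain.llms import HuggingFacePipeline
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# Wrap the loaded model/tokenizer in a text-generation pipeline.
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,  # raise this if you want longer outputs
)

llm = HuggingFacePipeline(pipeline=pipe, model_kwargs={"temperature": 0})

# Translation: merge a custom system prompt with the instruction.
template = get_prompt(
    "Convert the following text from English to French:\n\n{text}",
    "You are an advanced assistant that excels at translation.",
)
translate_chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate(template=template, input_variables=["text"]),
)
print(translate_chain.run("How are you today?"))

# Summarization reuses exactly the same pattern with different prompts.
summary_template = get_prompt(
    "Summarize the following article for me:\n\n{text}",
    "You are an expert at summarization, expressing key ideas succinctly.",
)
summary_chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate(template=summary_template, input_variables=["text"]),
)
print(summary_chain.run(article_text))  # article_text: your own article string
```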
Anything where you want to transform some kind of text from one thing to another, you would use this kind of task for it.

If we wanted to do a simple chatbot, we can certainly do that here, and this is going to be just a simple chatbot with memory. We're not using any tools here; in the future, I'll look at tool use and ways you can do that with the LLaMA-2 model as well.

One of the key things here is that we're going to have a system prompt, and I'm going to override it to say: "You are a helpful assistant, you always only answer for the assistant." This is key because, if you don't have something like that, you'll often find it will try to generate lots of answers for both sides of the conversation. We also tell it to read the chat history to get the context of what's going on. So here you can see I'm passing in the instruction, the chat history (which is one of the things we'll be passing in), and then the user input. We're wrapping the whole thing in one instruction. Now, this is a little bit different from how Meta does it, where they wrap each interaction as a separate instruction; I found that wasn't actually necessary if you set the prompt up like this.

Playing around with this prompt, I found that you really need to tell it where the chat history is; it won't just infer that like perhaps a bigger model would. By making it really clear that below here is the chat history and then the user input comes here, it will be able to operate on the history and use it like a memory.

So we've got our prompt template set up. This time we pass in both chat_history and user_input. We've got our ConversationBufferMemory, which holds the chat history we're going to pass in, and then we've got our LLMChain. You can see here we're passing in the LLM and the prompt, but also the memory.

Okay, let's look at the conversation. If I start out and just say, "Hi, my name is Sam", it's getting the full prompt going in, and there's no chat history at the start. So it says, "Hello Sam, it's nice to meet you. How can I assist you today?" Then I ask it, "Okay, can you tell me about yourself?", and it comes back, and notice that now it's got the chat history from before in there. So it answers: "Of course, I'm just an AI designed to assist and provide helpful responses. I'm here to help with any questions or tasks you may have. How can I assist you today?"

Now, to show off the memory, I wanted to play around with a few things, and for testing the memory these are the kinds of things you want to try. So here I'm saying, "Okay, today is Friday. What number day of the week is that?" It goes through and gives me an answer: "Ah, great question, Friday is the fifth day of the week." Now, in different calendars people count the days differently; that's not what I'm interested in. What I'm more interested in is the next question. When I ask, "What is the day today?", without that chat history you'll find it will just make up a day; it will just generate something random. But here it's got the chat history, so it can see that the human said today is Friday, and it knows that the answer is "Today is Friday." Now, actually, this "AI:" prefix, I could have put that in the prompt so that it doesn't fill that bit out itself.
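A sketch of the chat setup just described; the system prompt and instruction wording are paraphrased from the walkthrough, and it builds on the llm and get_prompt defined in the earlier sketches:

```python
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate

# One instruction wraps the whole exchange: the history, then the new input.
instruction = "Chat History:\n\n{chat_history}\n\nUser: {user_input}"
system_prompt = (
    "You are a helpful assistant, you always only answer for the "
    "assistant, then you stop. Read the chat history to get context."
)

prompt = PromptTemplate(
    input_variables=["chat_history", "user_input"],
    template=get_prompt(instruction, system_prompt),
)

# ConversationBufferMemory stores the running transcript under the
# same key the prompt template expects ("chat_history").
memory = ConversationBufferMemory(memory_key="chat_history")

chat_chain = LLMChain(llm=llm, prompt=prompt, memory=memory)

print(chat_chain.predict(user_input="Hi, my name is Sam"))
print(chat_chain.predict(user_input="What is my name?"))  # memory test
```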
It just gives us this as it comes back. And again, another thing I wanted to try was, okay, "What is my name?" Remember, way back at the start it was told my name, so sure enough, it's able to say, "Your name is Sam." You will find differences with the different-size models, for example. This is an example from the 13 billion model, and in that one it says, "Sure thing, Sam, as a helpful assistant, I can tell you that your name is Sam." It's got some more sassiness with the bigger models too. But back in the 7B, we can see that it's gotten that.

If I now ask a completely different question, "Can you tell me about the Olympics?", it goes on to give me a bunch of information about the Olympics. And then, as a final question, I ask, "Okay, what have we talked about in this chat?", and you can see that it's able to do a summary: "Of course, here's what we've discussed in the chat." (I'm actually not printing the newlines out here, but if we were, each item would be on its own line.) The assistant introduces themselves; the user asks them to tell them about themselves; it's got the conversation of what we've gone through and talked about. So it shows that the memory is working, which is a good sign that even the small model works with the memory and allows us to do this kind of thing. If we want to incorporate tools, we'll look at that in a future video.

Anyway, this gives you the quick basics of using LangChain for doing a variety of different tasks with LLaMA-2. You'll be able to do the same thing with a 4-bit version of the model if you're running this locally (a rough sketch of that loading step is below); perhaps I'll look at that in a future video. And if you're actually pinging an API where it's being served in the cloud, you'd be able to do that as well. Anyway, as always, if you've got questions, please put them in the comments below. If you're interested in these kinds of videos, please click like and subscribe. And I will talk to you in the next video. Bye for now.
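The 4-bit local option mentioned at the end isn't shown in the video; as a rough guide, loading the same checkpoint quantized with bitsandbytes might look like this (an assumption on my part, not the video's code):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # quantize weights to 4-bit
    bnb_4bit_compute_dtype=torch.float16,  # do the matmuls in fp16
)

model_4bit = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
# The rest of the LangChain wiring above stays the same; just build
# the transformers pipeline from model_4bit instead of model.
```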
Info
Channel: Sam Witteveen
Views: 14,405
Keywords: gpt-3, LLM, large language models, Meta LLM, llama-2 model, Llama-2 7B, Llama-2 13B, Llama-2 70B model, RLHF llama 2 meta, llama 2 ai, llama 2, llama llm, llama meta tutorial, llm, meta llm, llama-2 30b, llama-2 7b, llama-2 13b, llama-2 70b model, rlhf, machine learning, ai, meta ai, facebook ai, llama, v2, llama 2 demo, langchain, chatbot, langchain summarization, langchain llama2
Id: cIRzwSXB4Rc
Length: 12min 14sec (734 seconds)
Published: Tue Jul 25 2023