How To Use Llama LLM in Python Locally

Captions
All right, welcome back again. My name is Jesse, and in this tutorial we're going to see how to use large language models locally. In case you have some LLMs and want to run them locally, how do you do that? Some time ago we saw how to use GPT4All, which is another alternative in this direction; this time we'll see how to use Llama.

The basic idea is that you first need to get the model. If I go to GPT4All, Nomic publishes several models that you can download. You can also go to Hugging Face, which hosts a lot of models that you can check out and download.

So how do you use one of them? We are using Llama, and to use it you just have to install the tooling on your system. One option is pip install llama-index; the other is pip install llama-cpp-python, the Python bindings for the C++ package. So you can use either LlamaIndex or llama-cpp-python. This is the model that I'm using, which I've already downloaded and stored inside my llama_index folder. Very important; now let's start.

Method number one uses llama-cpp-python. From llama_cpp we import the Llama class, and from there we can create our LLM: I call it my_llm and pass in Llama. If you inspect this class with a question mark, you'll see it takes the model path, the context window (512 tokens by default, which you can expand), and some other parameters you can also pass. The model path is the path to the file we downloaded, so I go back and specify model_path, passing the path to the model we had above; in case you don't have it yet, consider just downloading it first. It initializes, and now I can chat with it: I call the llm and pass in a query, say "What are large language models and how do we use them?". When I run it, instead of talking to a model over the internet like OpenAI's ChatGPT, it talks to my local model and returns the result. It's similar to using GPT4All, which we saw earlier on; in case you want to see how to use GPT4All locally, check the links below. How long it takes to return the result depends on how fast your system is.
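Here is a minimal sketch of method one, matching the steps above. It assumes llama-cpp-python is installed (pip install llama-cpp-python), and the model file name is only a placeholder for whichever Llama model file you downloaded.

    # Method 1: query a local Llama model via llama-cpp-python.
    from llama_cpp import Llama

    # model_path points at your downloaded model file (placeholder name here);
    # n_ctx is the context window, 512 tokens by default.
    llm = Llama(model_path="./models/llama-2-7b-chat.ggmlv3.q4_0.bin", n_ctx=512)

    # Calling the object runs a completion against the local model.
    output = llm("What are large language models and how do we use them?")

    # The result is a dict with id, object, model, and choices fields.
    print(output["choices"][0]["text"])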
The advantage of OpenAI over this is that OpenAI is fast, because it's a service running on much more powerful systems, while this runs locally. Now it has returned my result: you can see the id, the object, the model being used, and the choices, which hold the generated text. It gives us all of this out of the box. So that is one way to use your large language models locally, especially if it's a Llama model.

Let's see the other alternative, method number two, which uses LlamaIndex rather than llama-cpp-python directly. From llama_index I go to its llms module, because LlamaIndex ships a lot of LLM wrappers, and import the LlamaCPP class. I call this one my_llm2 and pass in LlamaCPP. This one takes the same thing: it accepts a model_path and also a model_url. If I go back and explore the class with a question mark, you can see that it takes a model_url with a default in case you don't specify one, pointing to the same place we got the model from on Hugging Face; and you can also specify the path. So you either specify the URL or the path and it will work, and there are some other parameters you can set too. Behind the scenes it's actually built on top of llama-cpp-python; it's a wrapper around it, as you can see from the llama_cpp references in its source. So I pass in model_path equal to my model path; in case you don't have a local file, you can just specify the URL and it will download the model for you.

Once it finishes loading, I can chat with it. To chat, you call the chat method and pass in a sequence of chat messages. LlamaIndex provides some very nice classes for this; the one we need is ChatMessage, which I think lives inside the base module. From there I create my message as a ChatMessage, which takes a role, whatever role you want it to be, let's say "user" (the reply will come back as "assistant"), and the content, which is what I want to ask: "How useful are indices in database search and LLM search?". Now, to send the message, I pass it here as a list into chat, and behind the scenes it sends this information to my model and returns the result.
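For comparison, here is a minimal sketch of method two. The import paths match the llama-index releases from around the time of this video (ChatMessage is re-exported from the llms module); newer versions have since reorganized these modules, and the model path is again a placeholder.

    # Method 2: the same local model through LlamaIndex's LlamaCPP wrapper.
    from llama_index.llms import LlamaCPP, ChatMessage

    # Pass model_path for a local file, or model_url to have it downloaded.
    llm2 = LlamaCPP(model_path="./models/llama-2-7b-chat.ggmlv3.q4_0.bin")

    # chat() takes a sequence of ChatMessage objects; each has a role
    # ("user", "system", "assistant") and the content to send.
    message = ChatMessage(
        role="user",
        content="How useful are indices in database search and LLM search?",
    )
    response = llm2.chat([message])

    # The ChatResponse wraps an assistant ChatMessage holding the reply.
    print(response.message.content)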
So those are some of the ways you can use your Llama models locally. We have seen the first method using llama-cpp-python, and the second using LlamaIndex. You can also use LangChain to do the same thing, and you can use GPT4All, which we covered some time ago. Let's go back and check; it's taking some time to return the results.

In case you have any question or contribution, you can post it in the comments below, and let me know if you liked this video. GPT4All has some very nice models, and my concern is that these models keep getting bigger. I think one thing that would benefit everybody is a compression algorithm that compresses these big models into smaller ones that are more effective and faster; that would be a very good breakthrough, because right now you need a lot of disk space and internet bandwidth to download these huge models.

It just finished: as you can see, it gave us the result as a ChatResponse. You can see the chat message, that it came back with the assistant role, and the content: indices can be useful in both database search and LLM search. Very useful information. So that is how to use Llama locally and interact with your models from Python. Thank you for watching this tutorial, see you another time, stay blessed, bye.
Info
Channel: JCharisTech
Views: 13,705
Keywords: llama, llama2, meta, palm, gpt-3, how to use llama python locally, llama-cpp, chatgpt, python data analysis, analysing data with chatgpt, openai, stable diffusion, chatbot, google vs chatgpt, jcharistech, jesse jcharis, Jesus Saves, llm, llm python package, llm datasette, large language models
Id: C3-SkB1s9bs
Length: 11min 6sec (666 seconds)
Published: Tue Aug 22 2023