Run Llama 2 on Your Local Machine | Step-by-Step Guide

Video Statistics and Information

Captions
Hello everyone, my name is Aarohi and welcome to my channel. In today's video I'll show you how to use Llama 2 on your local machine. Llama 2 is a large language model released by Meta, and it is available for free for both research and commercial use. Llama 2 comes in two flavors: Llama 2 and Llama 2 Chat, where Llama 2 Chat is fine-tuned for two-way conversation. Meta's researchers have released variants of both Llama 2 and Llama 2 Chat in different parameter sizes, including 7 billion, 13 billion, and 70 billion parameters. Today I'll show you how to download one of these chat models and run it on your local machine.

For that, you have to download LM Studio. Simply visit lmstudio.ai, where you will see two download buttons. If you're using a Mac, use the Mac download; if you're working on Windows, download LM Studio for Windows. Once the download finishes, you will get a setup file in your Downloads folder; run it to install the app. I have already installed it, so I can open it directly. This is how the app looks: right now we are on the Home tab, and there is also a Chat tab.

Let's start from the very first step. First you need to search for a Llama 2 model. Type "llama 2" into the search box, and LM Studio will search for Llama 2 models on Hugging Face. You will see many models under that name with different parameter counts: models with 7 billion parameters, models with 13 billion
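The chat flavor matters because Llama 2 Chat was fine-tuned on conversations wrapped in a specific instruction template, which chat frontends like LM Studio apply for you behind the scenes. As a minimal sketch, assuming a single-turn exchange, the prompt the model actually sees looks roughly like this:

```python
def build_llama2_chat_prompt(system_message: str, user_message: str) -> str:
    """Wrap a system and user message in Llama 2 Chat's instruction format."""
    return (
        f"<s>[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )

prompt = build_llama2_chat_prompt(
    "You are a helpful assistant.",
    "What is AI?",
)
print(prompt)
```

The model's reply is whatever it generates after the closing `[/INST]`; multi-turn chats repeat the `[INST] ... [/INST]` blocks with prior answers in between.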
parameters, and so on. You'll see lots of models here, but which one do we need? We want the Llama 2 7B Chat model. What we want to try today is two-way conversation: the user types a message and the AI assistant replies, so we will use the chat model. The "GGML" in the model name means the model is meant to run on a CPU; if "GPTQ" is written instead, the model is meant to run on a GPU. Today I'll show you how to use this LLM on a CPU, so we search for the GGML model, click Go, and find it.

On the right-hand side you will see all the quantized versions of the model; we will use the Q4 quantized one. Why quantization? Quantization is a technique that reduces the memory and computation requirements of a neural network while preserving its performance to some extent, which is why we use a quantized model. Next to each version there is a Download button; in my case it says "Downloaded" because I have already downloaded this model. Click Download and the model will start downloading. It will take some time, because these are large language models, and the speed depends on your internet connection. Once the model is downloaded, you will see it in the models folder (click the folder icon to check). Now that we have the model, let's see how to use it for chat.
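To make the quantization idea concrete, here is a toy sketch of symmetric 4-bit quantization in plain Python. Real GGML Q4 formats are more elaborate (they quantize weights in blocks, each block with its own scale), but the trade-off is the same: far less memory per weight, at the cost of a small reconstruction error.

```python
def quantize_4bit(weights):
    """Symmetric 4-bit quantization: map each float to an integer in [-7, 7]."""
    scale = max(abs(w) for w in weights) / 7.0  # largest weight maps to +/-7
    q = [max(-7, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map the 4-bit integers back to approximate float weights."""
    return [v * scale for v in q]

weights = [0.8, -0.4, 0.05, 0.7]
q, scale = quantize_4bit(weights)
restored = dequantize(q, scale)
# Each 32-bit float now fits in a 4-bit integer (8x less weight storage),
# and the round-trip error is at most half of one scale step.
print(q, restored)
```

This is why a Q4 model runs comfortably on a laptop CPU where the full-precision version would not fit in RAM.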
Go to the Chat tab and first select a model: choose the one we just downloaded. Once the model is loaded, you can type a message and ask it any question. Let's ask "What is AI?" and you can see we get an answer. You can stop the generation partway through, or regenerate the answer. Let's try a different question: "Write a Python function" (there's a spelling mistake in my prompt, but it still works). You can see it created a Python function and shows how to use it. In the same way, you can ask it as many different questions as you like.

Once you are done, you can eject the model. Before ejecting, notice how much RAM is being used while the model is loaded; after you eject it, both the RAM usage and the CPU usage drop back to zero. If you are trying different models and running into memory issues, you can also delete previously downloaded models from here, which frees up that much disk space.

So this is how you can use LM Studio to load and try different Llama 2 models. In my next video I'll show you a different way to load a Llama 2 model on your local machine. I hope this video was helpful. Thank you for watching!
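As a rough rule of thumb for the RAM figures LM Studio shows while a model is loaded: weight storage is approximately parameter count times bits per parameter. This back-of-the-envelope sketch ignores the KV cache and activations, so actual usage will be somewhat higher, but it explains why 4-bit quantization makes the 7B model practical on an ordinary machine:

```python
def model_size_gb(n_params: float, bits_per_param: float) -> float:
    """Rough weight-storage size in gibibytes (1 GiB = 2**30 bytes)."""
    return n_params * bits_per_param / 8 / 2**30

# Compare half-precision (16-bit) weights with Q4 (4-bit) weights.
for name, n in [("7B", 7e9), ("13B", 13e9), ("70B", 70e9)]:
    print(f"{name}: fp16 ~ {model_size_gb(n, 16):.1f} GiB, "
          f"Q4 ~ {model_size_gb(n, 4):.1f} GiB")
```

For the 7B chat model this works out to roughly 13 GiB at fp16 versus about 3.3 GiB at Q4, which matches the order of magnitude of the RAM usage seen in the app.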
Info
Channel: Code With Aarohi
Views: 16,859
Keywords: llama2, llama, meta
Id: YlGxaqyQ_r4
Length: 7min 1sec (421 seconds)
Published: Wed Sep 06 2023