Run Llama-2 Locally without GPU | Llama 2 Install on Local Machine | How to Use Llama 2 Tutorial

Video Statistics and Information

Captions
In this video I will share how you can use large language models like Llama 2 on your local machine without having to worry about the GPU. That means you can run these models on your CPU, without apps like the Text Generation Web UI. I am going to show you an app that you can use on your local machine to access these models totally offline, so stick around till the end of this video.

Llama 2 comes in three model sizes, ranging from 7 billion to 70 billion parameters. Based on their performance, the 7 billion and 13 billion parameter models are state of the art in their own size categories, and the 70 billion parameter model is state of the art among models with 65 billion or more parameters. These models are open source, and you can use them for commercial purposes as well. If you want to use the official models, you need to fill out a form and Meta will grant you access; in my case I got access right away. But you don't really need to do that: there is a well-known account on Hugging Face, TheBloke, that hosts quantized versions of these models. I opened the Llama 2 7B repo, and you can see it is in GGML format, which means we can use these models in different apps, like the one we are going to use in this tutorial. On the model page you can see there are different RAM requirements for the different quantization methods. If you are running the models on CPU, the model will be loaded into RAM; if a GPU is available, the model layers can be offloaded to VRAM instead. So you should select your model according to its RAM requirements and the RAM available in your system.

For this video I am going to use LM Studio. It is a fully featured local GUI that supports Windows and macOS, with or without GPU acceleration. All you need to do is go to their website, lmstudio.ai, and you will see this page. They have developed this app particularly for LLMs
available on Hugging Face; as you can see, this app can run GGML models hosted there. Let's download LM Studio for Windows by clicking on this download button, and wait for it to finish. Okay, the app is downloaded; now double-click on it to install. If you happen to see the Windows Defender SmartScreen, just click on "More info" and then "Run anyway". The installation will take only a few minutes, and after that you will see an app UI like this.

In the search bar you can search by keywords, or you can also paste a Hugging Face repo URL here. I will try searching for TheBloke/llama2 to see how many models it returns. They have a whole collection from Hugging Face. Let's find our desired Llama 2 model. I would prefer the 7B model, as it will load easily into my RAM; you should also check the model's RAM requirements, as I mentioned previously, on the Hugging Face repo. I am selecting the 7B chat GGML model and downloading the Q4 quantization, so this is the same model we were looking at on the Hugging Face repo earlier in this video. All right, download this one; it will take some time depending on your internet speed. The model is downloading now, as you can see here.

Let's have a chat. Go to the AI Chat section from the left-side menu; at the top there is a "load model" button. Click on that and choose the model you downloaded earlier, and give it a second to load into RAM. All right, as you can see, the model is loaded into RAM, and the app is showing the RAM usage and the CPU usage as well, so this is running without a GPU. In the chat you can switch the input between user and assistant from this button. Let's test it out. I'm giving it a prompt to write a poem about AI, and it is generating the response now; let's wait for it to complete. Right, that's a pretty good result. Let's ask about the capital of Turkey, and it's going to take some time. Right, I'm
not sure why it is taking so long. All right, it took a while to respond, but the answer is correct. Regarding the coding aspect, let me ask it to write Python code to find the prime numbers in a series. It is generating the code; let me have a look. The code and the answer are correct. The results are promising for a 7 billion parameter model; the coding capabilities are amazing.

All right, in the chat section you can also start a new chat from the button located here. You can also set up a local server from here, but I'm not going to do that right now, just to keep this guide simple. In the My Models section you can see all of the downloaded models; you can also delete them to free up some space on your computer. In the end, you can also eject the model by going back to the AI Chat section and clicking on "Eject Model". As you can see, the RAM and CPU are now free to load other models.

That's the tutorial to run the Llama 2 7 billion parameter model on your local machine without having to worry about the GPU. If you learned something from this tutorial, give it a thumbs up, and don't forget to subscribe to AI Brainbox. Until next time!
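The RAM guidance mentioned for the quantized models can be approximated with a quick back-of-the-envelope calculation: a quantized model needs roughly (parameter count × effective bits per weight ÷ 8) bytes, plus some runtime overhead. This is a rough sketch, not the exact figures from TheBloke's model pages; the `estimate_ram_gb` helper, its overhead factor, and the ~4.5 effective bits for Q4 are my own assumptions.

```python
def estimate_ram_gb(n_params_billion: float, bits_per_weight: float,
                    overhead: float = 1.2) -> float:
    """Rough RAM estimate for a model loaded fully on CPU.

    n_params_billion: model size in billions of parameters (e.g. 7 for 7B)
    bits_per_weight:  effective bits per weight for the quantization
                      (Q4 methods land around 4.5 bits once scales are counted)
    overhead:         multiplier for KV cache and runtime buffers (assumed)
    """
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total * overhead / 1e9  # decimal gigabytes

# A 7B model at ~4.5 bits per weight fits comfortably in 8 GB of RAM,
# while the same model in fp16 (16 bits per weight) needs far more.
print(round(estimate_ram_gb(7, 4.5), 1))   # ~4.7
print(round(estimate_ram_gb(7, 16.0), 1))  # ~16.8
```

This is why the video picks the 7B Q4 model: the larger 13B and 70B variants scale those numbers up proportionally.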
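For reference, the prompt used in the coding demo ("write Python code to find the prime numbers in a series") typically yields something like the following. This is my own sketch of that kind of answer, not the model's actual output from the video.

```python
def is_prime(n: int) -> bool:
    """Return True if n is a prime number."""
    if n < 2:
        return False
    # Only need to test divisors up to the square root of n.
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def primes_in_series(series):
    """Filter a series of integers down to its prime members."""
    return [n for n in series if is_prime(n)]

print(primes_in_series(range(1, 20)))  # [2, 3, 5, 7, 11, 13, 17, 19]
```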
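The local-server feature the video skips over exposes an OpenAI-compatible HTTP API (by default at http://localhost:1234/v1 in LM Studio). A minimal sketch of building a chat request for it, assuming that default port and the standard OpenAI chat-completions request shape; the helper name is my own:

```python
import json

def build_chat_payload(prompt: str, temperature: float = 0.7) -> dict:
    """Build an OpenAI-style chat completion request body for a local server."""
    return {
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        "temperature": temperature,
    }

payload = build_chat_payload("Write a poem about AI.")
print(json.dumps(payload, indent=2))

# With the LM Studio server running, you would POST this JSON to the local
# endpoint (by default http://localhost:1234/v1/chat/completions), e.g. with
# urllib.request or the requests library.
```

Because the API mirrors OpenAI's, most existing OpenAI client code can be pointed at the local server just by changing the base URL.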
Info
Channel: AI Brainbox
Views: 13,457
Keywords: llama-2 model, Llama-2 7B, Llama-2 13B, llama 2 installation, llama-2 with local gpt, llama-2-7b-chat, llama 2 api, llama-2, llama-2 on cpu, llama 2 download, llama 2 tutorial, how to install llama 2, how to install llama 2 locally, llama 2 meta, artificial intelligence, llama 2 demo, llama 2, how to use llama 2, meta llama 2, fine tune llama 2, llama 2 fine tuning, llama 2 install, how to download llama 2, what is llama 2, llama 2 local, llama 2 langchain
Id: ua6zKtig-rw
Length: 5min 38sec (338 seconds)
Published: Fri Aug 11 2023