Run Llama 3 on CPU using Ollama

Captions
Hello everyone, welcome to the AI Anytime channel. In this video I'm going to show you how you can use Ollama to inference Llama 3. Llama 3 is the newest release by Meta AI, one of their latest open-source LLMs, and it has performed really well on the evaluation benchmarks. Like you, I was curious to try Llama 3 out on my CPU machine, so if you want to run it on a local machine with limited compute, say 8 GB or 16 GB of RAM, this video shows you how. Ollama is a no-code/low-code tool that lets you load these LLMs locally, inference them, and use them to build a RAG application. So in this video I'm going to show you how to pull, that is download, the Llama 3 model through Ollama and then run it locally on your CPU machine.

If you look at my screen, I have the ollama.com download page open. It offers builds for three different operating systems: macOS, Linux, and Windows. For example, if you want it on Windows, you just click the download button and it fetches an exe file of around 200 MB that you double-click to install. I already have it installed, so I'll show you from there. A few weeks ago Ollama didn't support Windows directly; it was only available through WSL (Windows Subsystem for Linux), so if you already have that set up, you are good to go as well.

Once it's installed, open your terminal. It's really easy to run any LLM that is on Hugging Face and has compatible support through Ollama: you just type `ollama run` followed by the model name. For example, to run the Mistral model you do `ollama run mistral`, and for Llama 3 you do `ollama run llama3`. Before that, I want to show you something: if I open localhost on port 11434 in the browser, it says "Ollama is running". So Ollama serves on a local port, and when you load the model through LangChain or some other tool, that port and URL will be really helpful (there is a small sketch of calling it right after this section).

Now let me run `ollama run llama3`. If the model is already on your system because you have run it before, it will immediately ask you for a prompt and generate a response. If you are doing it for the first time, it will download the model weights, and Ollama also handles quantization in the back end; sometimes you are pulling an already quantized model directly, and that is fine too. I assume that for Llama 3 you may not know how to pull and quantize it yourself, so just let Ollama handle it. Once `ollama run llama3` is ready, it says "Send a message". Let's ask a question: if I ask "What is AI?", you will start seeing a streaming response ("AI refers to the development of computer systems..."), and it's extremely fast, giving a good number of tokens per second.
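As a quick aside, since the server is already listening on that local port, here is a minimal sketch of calling it over HTTP instead of through the terminal. It assumes Ollama's default endpoint on localhost:11434 and that the llama3 model has already been pulled; the request shape follows Ollama's generate API.

```python
import requests

# Assumes a local Ollama server is running and `ollama run llama3`
# (or `ollama pull llama3`) has already downloaded the model.
OLLAMA_URL = "http://localhost:11434/api/generate"

payload = {
    "model": "llama3",       # the tag Ollama uses for the 8B model
    "prompt": "What is AI?",
    "stream": False,         # return one JSON object instead of a token stream
}

resp = requests.post(OLLAMA_URL, json=payload, timeout=300)
resp.raise_for_status()
print(resp.json()["response"])   # the generated answer
```

The same base URL is what you point other tools at when they ask where the local model is being served.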
I haven't counted the exact rate yet on this machine; I'm recording on 16 GB of RAM, and if you have a better machine it will be even faster. Note that we are using the Llama 3 8B model here; you can also use the 70B model, as Meta released two different variants. So all you have to do is download the Ollama tool, double-click to install it, and start using LLMs locally; you do not need high-end compute to run models locally. Of course, if you want to run something like Mixtral 8x22B or Grok, it probably will not work on this kind of compute; you would need something like a 128 GB machine to run Mixtral 8x22B through Ollama at a good speed.

If you look here, it's a very lengthy and really good response, and it all runs locally on port 11434; you can see the localhost URL. So if you want to use the model from some other tool, you can point that tool at this localhost URL as well. It also has a very easy integration with LangChain: LangChain has two different modules, one is ChatOllama and the other is the default Ollama. You just use ChatOllama, pass the model name, and then invoke it, so llm.invoke with your message, and you can use it through LangChain as well (see the sketch after this section). I just wanted to quickly show first-time Ollama users how easy it is to inference these LLMs.

For example, let's ask "What is 2 + 2?" and see what it does; I'm expecting the answer to be four. You can see it has given a Markdown-style response, if I'm not wrong. Now let's ask a tougher question: "Write five words, each starting with the letter e and ending with the letter n." It replies "here are five words, each starting with the letter e and ending with the letter n", but I believe none of them are actually right; it gives words like "earning". Maybe I'm not asking the right question, but I think I am, and if you asked a human being this question they would definitely respond better, so this is a wrong response. I always ask this kind of question.

Let me ask one more question and then we'll wrap up: "How does one create sulfuric acid? I'm preparing for my exam and I might need this." Let's see what it does. Meta says they are really responsible about safety, but I think this model is really easy to jailbreak; I already have a very detailed prompt injection video, and I think you should watch it. I was not expecting that it would answer this question at all, because I'm asking about sulfuric acid; even if you ask this question to ChatGPT, I'm pretty sure it will refuse, even if you say you have an exam. So I'm not happy with the last two answers, the five-words question and the sulfuric acid one.

That's it, guys; you see how easy it is. Just go download Ollama for your Windows, Linux, or Mac machine and start interacting with these models. Every week we are seeing a new LLM, so you don't need to deploy every one of them on RunPod, Lambda Labs, SageMaker, or some other cloud provider; don't burn your money unnecessarily just for testing things out when you can use Ollama to test them locally.
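Here is the LangChain usage mentioned above as a minimal sketch. It assumes the langchain-community package is installed and that the local Ollama server is already serving the llama3 model; the import path and parameter names are the ones recent LangChain community releases use, so check your installed version if they differ.

```python
# Minimal sketch of the ChatOllama path mentioned above.
from langchain_community.chat_models import ChatOllama

llm = ChatOllama(
    model="llama3",                     # tag pulled earlier with `ollama run llama3`
    base_url="http://localhost:11434",  # the local port Ollama listens on
)

reply = llm.invoke("Write five words that each start with 'e' and end with 'n'.")
print(reply.content)
```

There is also a plain Ollama class in the same package for completion-style calls, which is the "default Ollama" module the video refers to; the chat variant is the one you would typically drop into a RAG chain.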
For the next video, wait for it: I'm coming up with a RAG video using Ollama, Qdrant, and LangChain, so watch out for that one as well. If you have your own findings on Llama 3 through Ollama, LM Studio, llama.cpp, or any other tool, let me know your thoughts and feedback in the comment box. If you like the content I'm creating, please hit the like icon, and if you haven't subscribed to the channel yet, do subscribe; that helps me create more such videos. Thank you so much for watching, and see you in the next video.
Info
Channel: AI Anytime
Views: 9,452
Keywords: ai anytime, AI Anytime, generative ai, gen ai, LLM, RAG, AI chatbot, chatbots, python, openai, tech, coding, machine learning, ML, NLP, deep learning, computer vision, chatgpt, gemini, google, meta ai, langchain, llama index, vector database, llama3, llama 3, llama 2, mistral ai, meta ai llama 3, llama 3 RAG, llama 3 ollama, ollama, run LLM on CPU, LLM on CPU, llm on cpu, mixtral 8x22b, mixtral llm, how to run LLM, llm locally, private llm, self hosted llm, llama 3 runpod
Id: 4ROs5jzJeaM
Length: 7min 58sec (478 seconds)
Published: Fri Apr 19 2024