Ollama: The Easiest Way to RUN LLMs Locally

Video Statistics and Information

Captions
Today I'll show you one of the easiest ways to run open-source large language models locally. The tool we're going to be using is called Ollama. It currently supports macOS and Linux, with Windows support coming soon, and it works with a wide range of large language models, including Llama 2, Llama 2 Uncensored, and a whole bunch of others, including the newly released Mistral 7B. I'll walk you through the installation step by step, and then we'll chat with the model.

I'm running this on macOS, so I simply click the download link; as I said, it currently supports macOS and Linux, with Windows support coming soon, so we'll click Download for macOS. Once the file is downloaded, double-click it. macOS asks whether you really want to run something downloaded from the internet, so say yes, and then it asks to move the app to the Applications folder, so click yes. To complete the rest of the installation, click Next; to install the command-line tool, click Install and provide your password when prompted. To run Ollama, we just need the command it shows at the end, so I copy it and click Finish.

By default it gives you the command for Llama 2; if you look at the command we're using, it's "ollama run" followed by the model name. But say you want to use another model, for example Mistral 7B. On the website they have a pretty nice model card with different information about the model, and you can copy the command for running Mistral 7B from there. So I open another terminal, paste the command we just copied, and hit Enter. This starts downloading the model itself; the model we're using is around 4 GB, so this might take a while. It appears to be a quantized version of the Mistral 7B model. You can also go to their GitHub page, where they list the supported models, the model sizes, and the rough RAM requirements for running them: for example, you'll need around 8 GB of RAM to run the 3B models and around 16 GB of RAM to run the 7B models.

Once the model is downloaded, we can start experimenting with it. Let me ask it "What is your name?" and see what it comes up with; you can see the response time was pretty good. Before we play around with the model further, let's turn on verbose mode, which gives us extra information such as the number of tokens. Now let's ask another question, "How to kill a Linux process?", and it gives a pretty detailed response. With verbose mode on we can see the total number of tokens and how many tokens per second we're getting: the prompt eval rate went up to about 82 tokens per second, which is pretty great, and the generation rate is around 60 tokens per second, a pretty good speed on my M2. If you type "/?" it lists the other options available: we just looked at verbose mode, but you can also enable quiet mode and show things like the parameters, the system prompt, the templates that are in there, and so on.
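To illustrate the verbose-mode numbers outside the interactive prompt, here is a minimal Python sketch that asks the locally running Ollama server a similar question over its REST API and derives the prompt-eval and generation rates from the timing fields in the response. It assumes Ollama is serving on its default port (11434), that the mistral model has already been pulled, and that the requests package is installed; the prompt text is just an example.

```python
import requests

# Assumes a local Ollama server on the default port with the "mistral"
# model already pulled; the prompt is just an example question.
OLLAMA_URL = "http://localhost:11434/api/generate"

payload = {
    "model": "mistral",
    "prompt": "How do I kill a Linux process?",
    "stream": False,  # return one JSON object instead of a token stream
}

data = requests.post(OLLAMA_URL, json=payload, timeout=300).json()
print(data["response"])

# Durations are reported in nanoseconds; convert them to tokens per second,
# mirroring the "prompt eval rate" and "eval rate" shown in verbose mode.
prompt_rate = data["prompt_eval_count"] / (data["prompt_eval_duration"] / 1e9)
gen_rate = data["eval_count"] / (data["eval_duration"] / 1e9)
print(f"prompt eval rate: {prompt_rate:.1f} tokens/s")
print(f"eval rate:        {gen_rate:.1f} tokens/s")
```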
Another great feature of this package is that the models you install can be served through an API. We simply copy the example command, open yet another terminal, and paste it; all we need to do is change the model name, and since we currently have Mistral 7B installed, I just change it to mistral. When we use this, the model is served on a specific port and waits for API calls. If we send a request, we should see a streaming response from the model: our prompt was "Why is the sky blue?" and here's the streaming response from the API. It says the sky appears blue due to a phenomenon known as Rayleigh scattering. So this is a streaming response from the model, which means you can serve your models through Ollama. The platform offers quite a few more features, and I would recommend everybody check out their blog. They also have quite a few integrations with other platforms, for example LangChain, LlamaIndex, even LiteLLM, and a whole bunch of others. It's one of the easiest ways to run open large language models on your local machine without a lot of setup. I hope you found this video useful; if you did, consider liking the video and subscribing to the channel. Thanks for watching, and as always, see you in the next one.
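As a companion to the API demo in the captions above, here is a minimal Python sketch of a streaming client for the local Ollama server, not the exact command shown in the video. It assumes the server is running on the default port (11434), that a model named mistral is available, and that the requests package is installed; Ollama streams newline-delimited JSON objects, each carrying a piece of the response.

```python
import json
import requests

# Assumes a local Ollama server on the default port with the "mistral"
# model pulled; the prompt matches the example used in the video.
OLLAMA_URL = "http://localhost:11434/api/generate"
payload = {"model": "mistral", "prompt": "Why is the sky blue?"}

# /api/generate streams newline-delimited JSON by default; print each
# partial "response" chunk as it arrives, stopping at the final object.
with requests.post(OLLAMA_URL, json=payload, stream=True, timeout=300) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk.get("response", ""), end="", flush=True)
        if chunk.get("done"):
            print()
            break
```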
Info
Channel: Prompt Engineering
Views: 34,704
Keywords: prompt engineering, Prompt Engineer, natural language processing, GPT-4, chatgpt for pdf files, ChatGPT for PDF, langchain openai, langchain in python, embeddings stable diffusion, Text Embeddings, langchain demo, long chain tutorial, langchain, langchain javascript, gpt-3, openai, vectorstorage, chroma, train gpt on documents, train gpt on your data, train openai, train openai model, train openai with own data, langchain tutorial, how to train gpt-3, embeddings, langchain ai
Id: MGr1V4LyGFA
Length: 6min 2sec (362 seconds)
Published: Fri Oct 06 2023