Ollama: Run LLMs Locally On Your Computer (Fast and Easy)

Captions
Hey everyone, in this video we're going to take a look at how to run a generative AI model on your own local computer. We're going to use Ollama, which is an open-source platform that helps you manage and use different open-source LLMs, ranging from two gigabytes to over 40 gigabytes in size. You'll be able to run inference on them locally on your machine, for free and without an internet connection. And if you want to build an AI application with it, you can also access it via a local REST API, or call it using a client library like Langchain.

In this video, we'll look at how to install Ollama, how to download and switch between different LLMs, and finally, how to connect to it from a Python application. I'll be using a 2GB Llama2 model on my 2021 MacBook Pro, but I also expect it to work on any recent machine. There are also a bunch of other models available to try. Running generative AI locally has been one of the most highly requested topics on this channel, based on the feedback and comments I've received, so I'm excited to finally get to explore this. Let's get started.

First, we'll need to install Ollama. Once again, this is not a service or an LLM model itself. It's just an open-source platform that lets us download and use a bunch of different open-source models through one interface. Follow this link to go to the official Ollama repo on GitHub, and right at the top you'll find the download links for Mac and Windows and the installation instructions for Linux. Once you've installed it, you can run this command in your terminal to try it out (there's a sketch of the command after this section). If you're running this for the first time, it will first have to download the model, which is about four gigabytes. Once it's ready, you get this interface in your terminal where you can chat with the LLM locally.

Since you're watching a video on how to use local open-source LLMs, chances are you value the ability to be model-agnostic, or in other words, to swap in different AI models easily with minimal changes to your application code. With Ollama, you can do this very easily. All the major open-source models are already available on it, and you can switch to them just by changing the name of the model. Here are some examples of popular LLMs: there's Llama 2, which we just used, Mistral, and Gemma (see the commands sketched below). If you want to see a full list of all the models available, you can find them under the model library on the official GitHub repo, or you can find even more information on the library page of the official Ollama website. There are a lot of options for customizing the prompt or the model as well. You can even train or fine-tune your own model, but that's a little out of scope for this video, so let me know in the comments if that's something you're interested in seeing.

So now that you've got your LLMs working locally on your machine, you're probably wondering how to build an app with it. The easiest way is to run Ollama as a local server on your machine and just interact with it via a REST API, as if it were an online service. Of course, this is still all happening locally on your machine. To start up Ollama as a server, just use this command (also sketched below). If it's already running in the background, it might complain that the port is already in use, but that's okay as well. If you're on a Mac, you can check whether or not Ollama is running by looking at the icon bar at the top, and if you see this icon, you can force it to quit by pressing this button.
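The exact command isn't captured in the captions, but it is most likely the standard ollama run invocation from the Ollama CLI; a minimal sketch, assuming the llama2 model tag used in the video (the prompt text is just an illustration):

    # Pull Llama 2 (about 4 GB on first run) and open an interactive chat in the terminal
    ollama run llama2

    # Once the prompt appears, type a message and press Enter, e.g.:
    # >>> Why is the sky blue?
    # Type /bye to exit the chat.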
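Switching models follows the same pattern: change the model name and Ollama downloads it the first time you run it. A hedged example, assuming the mistral and gemma tags from the Ollama model library:

    # Swap to a different open-source model just by changing the name
    ollama run mistral
    ollama run gemma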
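The server command referenced above is, as far as I know, ollama serve; a minimal sketch (if the desktop app is already running in the background, you'll see the "port already in use" complaint mentioned in the video):

    # Start the Ollama server (listens on http://localhost:11434 by default)
    ollama serve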
Now if I run it again, I should be able to start the Ollama server from my terminal, and if you see something like this, it means it's probably running. You could talk to it with a curl request directly in your terminal, just like in this code snippet here (a hedged version is sketched below). And here's an example of the response you get back from that request. But as you'll notice, there's quite a lot of metadata, and it's quite a complex object to work with. If you want an easier way to work with just the response text itself, we're going to cover that too in a later chapter.

If you want to call the endpoint directly from your code, you can also use one of the available clients. There's an official library for Python and JavaScript. This makes it slightly easier to interact with Ollama from your application code without having to craft the API requests yourself (see the Python sketch below). Or if you want to use a different language, there are also a bunch of community-maintained client libraries available on the Ollama GitHub.

So now you know how to start a local LLM server and how to access it programmatically with an API. But if you're developing an LLM app in Python, for example, you'll probably be using a higher-level library like Langchain. This is easier than calling Ollama directly because it gives you a better abstraction along with a wider set of tools. Once you have Langchain installed and the Ollama server running, you can make a call to your local LLM with just a few lines of code (sketched after this section).

So now I'm in my VSCode editor, and I have a main Python file here with just the line of code you saw earlier. I've got my LLM server running as well in a different terminal window. Here I can just run the script, and I should get the response stored in this result variable. And I just get the text content of the response, which I can use directly, so I don't have to deal with all the metadata we saw earlier when using the API directly. I think this makes the code a lot easier to read and the response easier to work with.

If you have a Langchain app that you're already working on, you can make it run completely locally by replacing any of your existing service calls to things like OpenAI or AWS with Ollama instead. And it's going to be free, because you don't have to pay for tokens anymore. You'll also have more control over the AI model itself, since it's on your machine and it's open source.

If you're looking for project ideas or tutorials for generative AI, then I also recommend checking out my previous video on retrieval-augmented generation. It will let you customize an LLM agent with your own knowledge source and documents. I didn't use a local LLM in that video, but now that you've seen this one, you might be able to figure out how to modify it if that's what you want. If you have any other video suggestions or project ideas on applying generative AI, please let me know in the comments. Otherwise, I'll see you in the next one.
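The curl snippet itself isn't reproduced in the captions; here's a minimal sketch against Ollama's generate endpoint, assuming the default port 11434 and the llama2 model (the prompt is just an example):

    # Send a prompt to the local Ollama server and get the full response in one JSON object
    curl http://localhost:11434/api/generate -d '{
      "model": "llama2",
      "prompt": "Why is the sky blue?",
      "stream": false
    }'

With "stream": false the server returns a single JSON object; the generated text sits in its "response" field, alongside the timing and token metadata mentioned above.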
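For the official Python client, a hedged example (the exact code shown in the video isn't in the captions; the prompt and model tag here are illustrative):

    # pip install ollama
    import ollama

    # Chat with the local model through the official Python client
    response = ollama.chat(
        model="llama2",
        messages=[{"role": "user", "content": "Why is the sky blue?"}],
    )

    # The reply text is nested inside the message content
    print(response["message"]["content"])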
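The Langchain call demonstrated in VSCode isn't reproduced in the captions either; a minimal sketch, assuming the langchain-community integration and the llama2 model (package names and import paths may differ depending on your Langchain version):

    # pip install langchain langchain-community
    from langchain_community.llms import Ollama

    # Point Langchain at the local Ollama server (default: http://localhost:11434)
    llm = Ollama(model="llama2")

    # invoke() returns the response text as a plain string, with no metadata to unpack
    result = llm.invoke("Tell me a joke about programming.")
    print(result)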
Info
Channel: pixegami
Views: 5,594
Keywords: ollama, llms, run llm locally, local llm, python llm local, ai models, llm local, langchain llm, local langchain, python ai local, generative ai local, llama 2, mistral, ollama tutorial, local ollama, python ollama, what is ollama, ollama python api, local llm install, local llm langchain, ollama server, ollama api, ollama python, ollama langchain, ollama llm, ollama ai, local ai
Id: SZNRkGqaYTI
Length: 6min 5sec (365 seconds)
Published: Mon Apr 08 2024