Ollama on Windows: How to install and use it with OpenWebUI

Captions
This week we had quite a few announcements in the AI world. First, Google kicked things off with Gemini 1.5. Then OpenAI released their Sora model, an AI that can create realistic scenes from text. Then there was Meta with V-JEPA, and we had a little release from Ollama, which made a preview for Windows available. I think that's amazing, and it can be used right now. Ollama has always been one of the easiest ways to get started with local large language models [on the Mac]. It packages llama.cpp, which can use a CPU or GPUs under the hood, and provides a local OpenAI-compatible API that you can use from clients such as OpenWebUI. With Ollama on Windows you can use NVIDIA GPUs, and you can decide how many layers of the model should run on the GPU and how many on the CPU. In addition, llama.cpp uses quantized models, which is why you can run quite large models on your PC with llama.cpp and, of course, Ollama. We will run this on my PC, whose GPU is quite old: an NVIDIA RTX 2070 Super with eight gigabytes of VRAM. So let's get started.

In this blog article you can see how to install it. Basically, you just download Ollama for Windows, double-click the installer, and run it. When you want to use the GPU, you need the NVIDIA drivers beforehand, but I already have them installed. I think they distribute the CUDA toolkit with it, so it is enough to run it on top of the NVIDIA drivers alone. I think it's installed now, hopefully. There it is already. So I can now view logs and quit Ollama. Nothing really happens visibly, because it's an API. You can check whether Ollama is installed by running "ollama", and you will see the commands available to you: serve, create, show, run, and so on. I think the server is already running, but first we need to run a model: "ollama run llama2". So let's try this model. To show the tokens per second, you start the run command with the verbose option: "ollama run llama2 --verbose". So, okay, 24 tokens per second. Not too bad. To end this command-line session, you just enter "/bye", and that's it.

This is the easy way to get started using a large language model on your Windows PC, but there is a nicer way, which is using a web chat UI, like you may know from the OpenAI ChatGPT web client. There's a project called OpenWebUI, which gives you a web UI on your PC that can also use Ollama under the hood. To install it, you need Docker on your system, so either Docker Desktop or maybe Rancher Desktop, for example, and then you can just run the command given on its readme page, which is here ... on your computer.

You can also use more than one model. When you run "ollama list", you can see which models are available right now. As we have seen so much visual stuff in the news this week, I want to have at least one vision model in my video here, so I run the llava model. That's it. Now I also want to run a model that's too large for my GPU. You can find the models in the models area of the Ollama website, and we can maybe use llama2, but not the 7b model, which is the default tag, but the 13b model. You can see that when we run "ollama run llama2", it will run the tag "latest", which is technically the same as the 7b tag. When we want to run a larger model like this one, or maybe the 13b chat variant, we can run "ollama run llama2:13b-chat". Let's start this one.
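A quick aside before we continue with the demo: besides the "ollama" command line, Ollama also exposes the same operations over a local REST API, by default on port 11434. The following is a minimal Python sketch (standard library only) that lists the locally available models, roughly what "ollama list" prints, and then sends one prompt while suggesting how many layers to offload to the GPU via the "num_gpu" option. The endpoint paths and option name reflect the Ollama API documentation as I understand it, so treat them as assumptions and check the current reference before relying on them.

    import json
    import urllib.request

    OLLAMA = "http://localhost:11434"  # Ollama's default local port; adjust if you changed it

    # List locally available models (roughly what "ollama list" shows).
    with urllib.request.urlopen(f"{OLLAMA}/api/tags") as resp:
        tags = json.load(resp)
    for model in tags.get("models", []):
        print(model.get("name"))

    # Send a single prompt to llama2 and ask Ollama to offload only some
    # layers to the GPU ("num_gpu" maps to llama.cpp's GPU-layer setting).
    payload = {
        "model": "llama2",
        "prompt": "Why is the sky blue?",
        "stream": False,             # return one JSON object instead of a token stream
        "options": {"num_gpu": 20},  # assumption: tune this to your VRAM
    }
    req = urllib.request.Request(
        f"{OLLAMA}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        answer = json.load(resp)
    print(answer.get("response"))

The model named in the payload must already have been pulled, for example with "ollama run llama2" as shown above.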
So let's test this again. Okay, it works. Yeah, now we should be able to just run OpenWebUI. Where is it? On localhost:3000, sign up. So that's it. Got it. And I can also type in 42. Model not selected. Ah, that's correct. Let's check the small one first, 42. Okay. Okay. That is enough. And then let's check another model. New chat, select model. Now I'll use the image model. That's normal. Yes, yes. Okay. Okay, it at least sees that it is a llama. As you can see, Ollama is a nice and easy way to get started using large language models locally. It can use NVIDIA GPUs. Unfortunately, at the moment it cannot use AMD GPUs, but that may change in the future. Of course, you cannot only use it from this web chat UI, but also from Python or any other client. The prerequisite is that the client supports the OpenAI API, and then you can just use it. That's it. See you next time.
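Since the video only mentions that any OpenAI-compatible client works, here is a minimal Python sketch of that idea. It assumes the openai Python package (v1.x) is installed and that Ollama's OpenAI-compatible endpoint is reachable at its default address http://localhost:11434/v1; the api_key value is a placeholder, since Ollama does not check it.

    from openai import OpenAI

    # Point the standard OpenAI client at the local Ollama server.
    client = OpenAI(
        base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint (assumed default)
        api_key="ollama",                      # required by the client, ignored by Ollama
    )

    response = client.chat.completions.create(
        model="llama2",  # any model you have already pulled with "ollama run" or "ollama pull"
        messages=[{"role": "user", "content": "Give me one fun fact about llamas."}],
    )
    print(response.choices[0].message.content)

The same pattern works for other OpenAI-compatible clients: swap the base URL to the local Ollama server and pick a model name you have pulled.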
Info
Channel: Mitja Martini
Views: 13,689
Id: z8xi44O3hvY
Length: 7min 7sec (427 seconds)
Published: Sun Feb 18 2024