Getting Started on Ollama

Captions
In this video you'll get from zero to hero on Ollama, using AI on your local machine whether you're on Mac, Windows, or Linux. I'm Matt Williams, and I was a founding member of the Ollama team until earlier this year. Now I am 100% focused on building great content that gets you up to speed on how to use everything in the project. I may go pretty quickly in this video to fit everything into a reasonable amount of time, but hey, it's YouTube, so you can pause, go back, or skip ahead at any point. And if you have any questions, you can either comment below or join us on the Discord at discord.gg/ollama.

Okay, so before we install Ollama, let's make sure you have the hardware required. Ollama needs either macOS on Apple Silicon, a Linux distro based on systemd (such as Ubuntu), or Microsoft Windows. You may be able to get it working on macOS on Intel or on a non-systemd distro, but it won't be supported by the Ollama team; you can get help on the Discord, but you're a bit on your own there. To have the best experience on Linux or Windows, you need a recent GPU from Nvidia or AMD. Apple Silicon includes the GPU, so you don't have to worry about that there. You may be tempted by some of the cheap Kepler Nvidia cards on eBay, but they're not going to work with Ollama; they're just too slow. Basically, your card needs a compute capability of 5 or higher, and there are similar requirements for AMD cards. Here is a list of Nvidia cards currently supported, and here are the AMD cards currently supported. Ollama can also run CPU-only if you don't have a GPU, but performance is going to be annoyingly slow. Now make sure you have the drivers for your GPU installed: for Nvidia that means CUDA, and for AMD that means ROCm.

So with that out of the way, let's get it installed. First, go to ollama.com in your web browser, click the download button, and choose your OS. Mac and Windows have an installer, and Linux has an install script to run. Run the install, and at the other end it's going to be roughly the same on all platforms. I have another video just on the installation process on all the platforms that you can watch here. The process is pretty transparent, and there isn't a whole lot going on other than copying a file and setting up a service. So now there is a service running in the background; that service is what actually runs all the processes for Ollama. Then there's a client, and that client is a command-line client. For some this might feel a little scary, but it doesn't have to be. There are plenty of UIs out there as well, but the process for using them is the same as the CLI: you type in the text you want to send to the model, you press enter, and then you see the results show up as text. That's the same as what you do in the CLI, so try it out; once you get used to it, you may never want to go back to any of the slower GUIs.

The first step with Ollama is to get a model, and there are a lot of models out there. You could go with the original that came out just as Ollama started: that's Llama 2 from Meta. A more recent one is Gemma from Google, and Mistral and Mixtral are also pretty popular. For this one, let's start with Mistral. You can download the model using the command ollama pull mistral, which pulls the 7-billion-parameter version of Mistral from the Mistral AI team; it actually pulls the files from the library at ollama.com. That may take a few minutes, so while we wait, visit ollama.com/library in your browser. This is the list of models available. They're listed by featured, but you can choose to sort by most popular or by most recent as well.
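For reference, the pull and a quick sanity check look like this in a terminal (mistral is just the model used in this video; any name from the library works the same way):

    # Pull the latest tag of Mistral from the ollama.com library
    ollama pull mistral

    # Confirm the model is now available locally
    ollama list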
Click on mistral; that's the one with the S, not the one with an X. Every model shows a short description and then some info about the latest tag. Latest is an unfortunate name here, since it's not actually the latest but rather the most common variant. We can see the model family, the number of parameters, and the quantization, and depending on the model we may see parameters, a template, a system prompt, a license, and maybe more. Each of these things is a layer of the model. Ollama is a little different from lots of other tools because it considers a model to be everything you need to start using it; most other tools think a model is just the weights file, which is the really big file. If you don't want to go all in on Ollama right now and want to be able to use some of the Ollama models in other tools, take a look at this video, which shows how to sync the model weights with those other tools. That tool takes all the files with strange names in the Ollama models blobs directory and attaches sensible names to them. To learn more about those crazy names and why Ollama does what it does, check out this video.

Above the table on this page is a number next to the word tags; click on that. These are all the tags that represent variants of the model. They're all Mistral in this case, but they're different sizes, fine-tuned different ways, etc. Notice under each one is a hash value. For latest, you can see that it is the same hash as under v0.2, 7b, instruct, and 7b-instruct-v0.2-q4_0. That means all of these are aliases for the same file: the 7-billion-parameter model, which is the instruct variant, is version 0.2, and is quantized to 4 bits. There's a lot packed into that name. Models tend to get smarter and also slower as the number of parameters goes up. 7 billion parameters is pretty good, but 7 billion 32-bit numbers would take 28 GB of RAM, and most of us don't have cards with that much. So quantization is this seemingly magic process that reduces the precision of the numbers, to 4 bits in this case, which is what the q4_0 at the end means: quantized to 4 bits. That means the weights fit in about 3.5 GB of VRAM (7 billion parameters at half a byte each). Your OS and other software need some memory as well, so we usually say a 7B model takes about 7 GB of VRAM, but that's super rough guidance. 4-bit quantization is the size most people default to; it tends to perform best in terms of speed versus every other size, and it does a really good job. It's actually pretty hard to see much difference between it and the original model. The instruct in the middle of the name means it's been fine-tuned to respond well in a chat, versus a text or base model that simply completes whatever you're saying.

Now the model should be downloaded, so you can run it with ollama run mistral. This drops you into the prompt, or REPL. REPL is a term usually associated with programming tools; it stands for read-evaluate-print loop, and it's an interactive place to play with commands. Now that you're in the Ollama REPL, ask the model a question: why is the sky blue? Pretty quickly we get an answer. Notice how it's streaming out one word at a time. This is how these models work: they figure out what is most probably going to be the next word. When they start, they don't know what the end is going to be until they reach it, which is a lot like how we think and how we speak.
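Concretely, a first session might look something like this (the answer text is illustrative; the wording will differ on every run):

    ollama run mistral
    >>> why is the sky blue?
    The sky appears blue because molecules in the atmosphere scatter the
    shorter blue wavelengths of sunlight more strongly than the longer
    red ones...
    >>> /bye

Typing /bye (or pressing Ctrl+D) exits the REPL and returns you to your shell.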
Let's say you want a model that always explains complex topics like you're 5 years old. An easy way to do this is to create a new model; a model, as we saw before, is a combination of the weights file with parameters, a template, and maybe a system prompt. So let's set a new system prompt. In the REPL, type /set system "The user will provide a concept. Explain the concept in an easy to understand manner so that even a 5-year-old child can understand it." Now type /save likeim5, then type /bye, or you can press Ctrl+D. To launch our new model, type ollama run likeim5, then type any complicated concept. Let's try quantum physics, and we get an answer that maybe some 5-year-olds might understand. Press the up arrow to go to the previous entry and press enter again; notice that the answer is a little different. Try that again: it's different again. Models are always trying to figure out the most probable next word, and that often ends up producing different sentences to explain a topic. If you need something that responds the same way every single time, large language models are not the way to go. So now you've created a new model and learned a little more about how models work.

At this point you can find new models to try and figure out the ones right for you. But watch out: some of them can be really, really big. If you have a slow connection, make sure to use the OLLAMA_NOPRUNE environment variable; otherwise Ollama prunes all the disconnected and half-downloaded files each time the service is restarted, such as when your machine reboots. I have a few videos on my channel about setting environment variables, and there's a good section in the FAQ about doing it right. If you have a model you want to remove, just use the command ollama rm and then the model name.

Now you should know everything you need to know about how Ollama works, or at least enough to get started. If you have any questions, leave them in the comments below or join us on the Discord at discord.gg/ollama. If you really want to try out some of the GUIs for Ollama, go to ollama.com, then click on the link at the top for GitHub, and scroll all the way down to the bottom to find the web and desktop community integrations. There are so many to choose from, and there are even more you might find elsewhere on the web. Thanks so much for being here. Goodbye.
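If you'd rather script this than type it into the REPL, the same custom model can be built from a Modelfile (this is a minimal sketch; the model name likeim5 just mirrors the example above):

    # Modelfile: a Mistral variant with a fixed system prompt
    FROM mistral
    SYSTEM """The user will provide a concept. Explain the concept in an easy
    to understand manner so that even a 5-year-old child can understand it."""

Then create and run it from your shell, and remove it again when you're done:

    ollama create likeim5 -f Modelfile
    ollama run likeim5
    ollama rm likeim5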
Info
Channel: Matt Williams
Views: 41,344
Keywords: llama 2, how to install ollama, ollama on macos, llms locally, llama2 locally, large language models, mistral ai, ollama integration, ollama api, installing ollama, llama2 on macos, how to install llama 2 on mac, installing llama on windows, installing llama on linux, installing llama 2 locally
Id: 90ozfdsQOKo
Length: 11min 25sec (685 seconds)
Published: Tue Mar 26 2024