Hugging Face GGUF Models locally with Ollama

Captions
GGUF is a file format used for storing large language models, and in this video we're going to learn how to run Hugging Face models in GGUF format on our machine using Ollama.

Let's head over to the Hugging Face website, click in the search bar at the top, and type in "gguf". It comes back with more than 1,000 models that mention the format. I find that for running models on your own machine it's generally better to get the 7-billion-parameter ones, so let's add "7b" to the search. Now we're down to just under 400 models, which is still quite a few to choose from. On the search page, let's sort by recently updated so we get the latest ones, and scroll down to find one we can use. TheBloke has been creating loads of these, so let's pick one he's been working on: MistralLite. Clicking on it brings up a page with lots of information about the model, and if we go all the way down we can see that TheBloke has actually created lots of different versions of it. The ones at the top tend to have been quantized more aggressively, so they're smaller and run quicker, but generally with lower quality; as you go down the list, the quality goes up, but so do the file size and the time it takes to run. There's one in the middle, Q4_K_M, that looks like a good choice: it's labelled as medium, balanced quality, and recommended, so let's pick that one. Clicking on it takes us through to a page describing that particular file.

We can download these models using the Hugging Face Hub CLI. If we come over to the terminal and look at my Poetry pyproject.toml file, you can see in the middle that I've got the huggingface-hub dependency. Let's start writing the command: poetry run huggingface-cli download, followed by the name of the repository, which we'll copy from the model page and paste in. The next thing we need is the name of the file itself, so let's go back to the page, grab that as well, and paste it in. If you don't pass in a file name, it will try to download all of the files, which will take up a lot of space, so you probably want to make sure you download just the one. Once we've done that, we tell it where to download to: we'll set the local directory to "downloads" and say not to use any symlinks, and then we'll run it. It comes up with a message telling you to use hf_transfer for faster downloads, but I've had a look at that, and I find the method we're using here is perfectly fine, so I wouldn't necessarily push you towards it. Let's speed this up a bit, because we've all got things to do; the download takes about a minute (I've got this machine connected over Ethernet). Afterwards we can look in our downloads folder, and there's our MistralLite GGUF file, weighing in at 4.1 GB.
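For reference, here's a minimal sketch of that download step. The repository and file names below are assumptions based on TheBloke's MistralLite page as shown in the video; copy the exact names from the model page, since they may have changed.

    # Add the Hugging Face Hub CLI as a project dependency (if it isn't already)
    poetry add huggingface-hub

    # Download a single GGUF file into ./downloads, without symlinks
    # (repository and file names assumed from TheBloke's MistralLite page)
    poetry run huggingface-cli download \
      TheBloke/MistralLite-7B-GGUF \
      mistrallite.Q4_K_M.gguf \
      --local-dir downloads \
      --local-dir-use-symlinks False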
Now we're going to learn how to run this model on our machine using a tool called Ollama. Ollama lets you run LLMs on your own machine; at the moment it works on Mac and Linux, but Windows support is coming soon. We have to create something called a Modelfile, which I kind of think of as a Dockerfile, but for LLMs. In there we write FROM followed by the location of the GGUF file, so that's ./downloads/ and then the MistralLite file name (there's a sketch of this at the end of the transcript). We'll save that, and then we can call the ollama create command: we give the model a name, mistral-lite, and point it at the location of the Modelfile. It creates the model in only a few seconds, and then we can see it if we type ollama list: in the middle there, created two seconds ago, is our mistral-lite:latest model. We can then run it using the ollama run command: ollama run mistral-lite, with the prompt "What is Grafana?", and you can see it gives us a big explanation of the Grafana visualization tool.

I want to conclude by showing you a tool that lets you see what's going on while these models are running. The one I'm using for Apple silicon is called asitop, but if you look on its page (I'll include the link below) there are alternatives for other operating systems; an install sketch also appears at the end of the transcript. We'll split the terminal into two, run sudo asitop, and then rerun the previous command asking what Grafana is. At the top you can see my machine's spec, and while the model is running the GPU sits at 99% and my RAM usage increases by about three gigabytes or so, then drops back down once it's finished.

Being able to use GGUF models is quite a neat feature of Ollama: it means you can take any of those thousand-plus GGUF models and run them locally. But Ollama also comes with many built-in models, one of which is Mistral AI's, and if you want to learn more about that, check out this video up here. I'll see you in the next one.
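For reference, a minimal sketch of the Modelfile and the Ollama commands described above, assuming the file downloaded earlier and the model name mistral-lite used in the video:

    # Write a one-line Modelfile pointing Ollama at the downloaded GGUF file
    # (file name assumes the Q4_K_M download from the earlier step)
    cat > Modelfile <<'EOF'
    FROM ./downloads/mistrallite.Q4_K_M.gguf
    EOF

    # Register the model with Ollama under the name "mistral-lite"
    ollama create mistral-lite -f Modelfile

    # Confirm it's listed, then run it with a prompt
    ollama list
    ollama run mistral-lite "What is Grafana?"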
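And a sketch of the monitoring setup. asitop only works on Apple silicon, so check its page for alternatives on other platforms; installing via pip is an assumption here, as the video doesn't show the install step:

    # Install asitop, then launch it (sudo is needed to read the power metrics)
    pip install asitop
    sudo asitop

    # In a second terminal split, rerun the model and watch GPU and RAM usage
    ollama run mistral-lite "What is Grafana?"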
Info
Channel: Learn Data with Mark
Views: 9,483
Id: 7BH4C6-HP14
Length: 4min 55sec (295 seconds)
Published: Fri Oct 20 2023