Running Mistral AI on your machine with Ollama

Video Statistics and Information

Captions
Last week Mistral AI announced the release of their first large language model, trained with 7 billion parameters, which they claim outperforms Meta's 13-billion-parameter Llama 2. We're going to try it out using a tool called Ollama, which lets you run LLMs locally. Let's download Ollama from the download page and install it, and once we've done that we'll navigate to the models page. If we browse through we can see lots of different models, and eventually we get to Mistral, so let's click on that. On this page we can see a bunch of information about the model, how to use it, and the memory requirements. Now let's click on the tags tab at the top, where we can see all the different variants that have been made available for this model.

What we're going to do next is move over to the terminal and call the ollama command. If we call it with no arguments we get back a list of the available commands, and the one that we're interested in first is called pull. So we can say ollama pull and then pass in the name of the model that we want to pull down to our machine; here we'll say mistral:instruct, for example. You can see it finishes almost instantly, but that's only because I'd already downloaded it before recording this. We can then call ollama list, and that gives us back a list of all the models we've got on our machine. You can see I've got a few: I've got Falcon, I've got Llama 2, and there in the middle I've got Mistral as well.

What we're going to do now is run the model. We can do that by calling ollama run and then the name of the model, so we'll go ollama run mistral, and we're going to call it in verbose mode so that we get some extra metadata after each prompt is executed. This launches it in interactive mode and we can ask it questions. We'll say "What is Apache Pinot?" and you can see that it prints out the result in real time as the tokens come through, and once it's done it tells us how long it took, how many tokens were used, and how quickly the tokens were being generated. The answer is not bad. I find that with some of these smaller models the factual questions can sometimes give you the wrong answer; in this case it says Pinot was developed by Facebook, when it was actually developed at LinkedIn, but the rest of it looks alright. Let's ask a follow-up question: how does it compare to Apache Kafka? You can see it gives a good answer, telling us that they're both open source distributed streaming platforms, which is pretty good, and then contrasting them with each other. I think it's done a reasonably good job; there might be a few things you might tweak, but that's not a bad answer.

Now, on the product page it says that Mistral is actually optimized for tasks like summarization and classification, so let's give those a try with the help of this BBC article. The article is about the use of the video assistant referee, or VAR, in the Liverpool versus Tottenham match last weekend, where things went very, very wrong. For simplicity's sake I've copied all the text from that page into a text file on my machine. Let's have a look at it: if we scroll through you can see we've got just the pure text, no HTML or anything like that.

What we're going to do now is call ollama run again, but this time we're going to pass the prompt in directly rather than going into interactive mode. We're going to say "Please can you summarize this article" and then cat the article into the prompt, so the prompt contains all the text from the article. You can see it starts printing out a result: it says the article reports that the situation surrounding the VAR is at a crisis point following the controversial decision, and so on. I think it's a really good answer, and in total it took just over six seconds to give us that result. How about if we ask it something else? Let's say "Can you pull out five bullet points from that same article?" Again it does a good job and pulls out five pretty good points. How about if we try out categorization? Let's say "If you had to categorize this article, what tags would you use?" It comes up with some pretty good tags; the only one that's a bit confusing to me is why it says FIFA World Cup 2018. I wonder whether that was mentioned on the page; I'm not sure otherwise how it came up with that. If we did this again, maybe we could guide it a bit and say "just give me the top five tags" or something like that.
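If you want to follow along, the terminal session described above looks roughly like this. It's a sketch rather than a verbatim capture of the video: the exact model tag and the article filename (bbc-var-article.txt here) are assumptions.

```bash
# Download the instruct-tuned Mistral variant to the local machine
ollama pull mistral:instruct

# List the models that have already been downloaded
ollama list

# Run Mistral interactively; --verbose prints timing and token stats after each prompt
ollama run mistral --verbose

# Pass the prompt directly instead of going into interactive mode,
# concatenating the article text into the prompt
# (bbc-var-article.txt is a hypothetical filename for the copied article text)
ollama run mistral "Please can you summarize this article: $(cat bbc-var-article.txt)"
```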
As well as using the CLI, we can call Ollama models via an HTTP API, so let's try that. We'll use the curl command to call localhost:11434/api/generate, and we need to pass in some JSON: the name of the model, in this case mistral, and the prompt, in this case "What is the sentiment of this sentence?" followed by the first sentence from the article. You can see it comes back with a stream of JSON which says the sentiment of this sentence is negative, then a little bit more, and at the end it has all the metadata as well. That's a bit tricky to process, but luckily for us there are libraries that make this easier.

So we're going to conclude by learning how to call the model from Python code using the LlamaIndex library. I've been using Poetry for all my Python projects, so let's have a quick look at my pyproject.toml file: you can see that under the dependencies section I've got llama-index, and everything else is just the default of what it created for me. First of all we're going to say from llama_index.llms import Ollama, then we're going to create an instance of the model, passing in mistral:instruct as our model name. Next we're going to load that BBC file into memory, so we'll create a text variable, read the file into it, and print out the first 500 characters so you can see we've got the text ready. Now we're going to ask the model to do a bit of entity extraction: which people are mentioned in this article? We pass in the article, and it comes back with a response that looks pretty good. Let's print it out by calling response.text, and you can see it's pulled out pretty much all the people who are on the page: we've got Jamie Carragher, we've got PGMOL, which it tells us in brackets is the governing body for referees, and we've got the players involved as well.
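The HTTP call described above might look something like this. It's a sketch: the sentence in the prompt is a placeholder rather than the actual first sentence of the article, and by default the endpoint streams back one JSON object per generated token.

```bash
# Generate a completion from the locally running Ollama server
curl http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "What is the sentiment of this sentence: <first sentence of the article>"
}'
```

And the Python version might look roughly like this, assuming the LlamaIndex API as it was around the time of the video (newer releases moved the Ollama integration into a separate llama-index-llms-ollama package, so the import path may differ). The filename is again a hypothetical stand-in for the copied article text.

```python
# Call the local Mistral model through Ollama using LlamaIndex
from llama_index.llms import Ollama

# Create an instance of the model, pointing at the instruct variant
llm = Ollama(model="mistral:instruct")

# Load the BBC article into memory (bbc-var-article.txt is a hypothetical filename)
with open("bbc-var-article.txt") as f:
    text = f.read()
print(text[:500])  # sanity check: first 500 characters

# Ask the model to do a bit of entity extraction
response = llm.complete(f"Which people are mentioned in this article? {text}")
print(response.text)
```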
So I think Mistral and Ollama are both awesome and I'll definitely be playing around with these tools more. If you want to learn more about running LLMs on your own machine, check out this video where I show how to run Hugging Face models locally.
Info
Channel: Learn Data with Mark
Views: 7,377
Id: NFgEgqua-fg
Length: 6min 25sec (385 seconds)
Published: Thu Oct 05 2023