Run your own large language model with Mozilla's Llamafile

Captions
Mozilla just rolled out a really cool project called llamafile, which lets you run large language models on your local computer from a single file. Why is this useful? It means you don't need any network connection, you don't have to depend on something like OpenAI, and you can run any compatible large language model.

To start, I'm just going to show you what it looks like. I've run the llamafile server and I get this interface with all sorts of options to configure the large language model. I'll type in a prompt here: write a poem about llamas. Okay, beautiful. Now if we lower the temperature, meaning make the response less random or less creative, and ask it to write a poem about llamas again, let's see what it generates. Okay, it doesn't even try to be creative.

You can get llamafile from this GitHub repo from Mozilla. The documentation in the repo can be a little confusing if you're just getting started with machine learning or aren't super familiar with code, so I'll explain how to get up and running. In the Mozilla repo they give you some llamafiles that you can download and run, for example Mistral 7B Instruct, and the link here is to Hugging Face. If you haven't used Hugging Face, it's sort of like the GitHub for AI models. In this case I'm looking at the Mistral 7B model weights, and if I run the llamafile server against this file, it will use Mistral as the large language model for that interface we just saw.

So let's see what that looks like. I'm going to scroll down here and pick this first GGUF file. Actually, let's go down to their recommended model and click download; I'll see you in a bit, it's very large. Next we'll come back to this Mozilla GitHub repo for llamafile and download the latest release over here under releases: llamafile 0.2.1. We'll grab the zip and download that. Once I have it, I'll unzip it into a directory in my Documents called llamafile, and in there I'll make a directory called weights. Let's see where my download is at; it's almost done. We'll go into that Downloads folder and, now that it's finished, drop the model file into that weights folder.

Okay, so now I've got everything I need to run the server with the model we downloaded. I'll open up a terminal window and change into that directory by dragging it over here. We're going to run the llamafile server: I set the -m flag for model and point it at the Mistral 7B file in the weights directory. It loads up, see, it's pretty fast, and there you go, we've got our language model up and running on our local server. No OpenAI, no network connection needed. Let's ask it something to test it out: what animal is depicted on the Mozilla Firefox logo? Let's see if it knows this. It's a red panda. That's correct, nice job, Mistral.

So that's the server method of running llamafile, but you can also run these files directly. I've downloaded the command-line binary here, so let's change into that directory and run that llamafile, passing a prompt with the -p parameter. We'll ask it a question: what is Mozilla best known for? It spins up the model again, runs the prompt, and answers: Mozilla is best known for creating and maintaining the Firefox web browser. It also gives us some data on how long everything took. If you want to see all of the options you can run, pass the --help flag and it will list them.
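For anyone following along in a terminal, here is a rough sketch of the commands from the walkthrough above. The release and model file names are illustrative (based on what is downloaded in the video), and the binary names follow the video's usage of a server binary and a command-line binary; adjust them to match whatever is actually in the release zip and Hugging Face download on your machine.

    # Sketch of the walkthrough; file names are illustrative and will differ
    # depending on the release and GGUF file you actually downloaded.
    mkdir -p ~/Documents/llamafile/weights
    unzip ~/Downloads/llamafile-0.2.1.zip -d ~/Documents/llamafile
    mv ~/Downloads/mistral-7b-instruct-v0.1.Q4_K_M.gguf ~/Documents/llamafile/weights/

    cd ~/Documents/llamafile
    chmod +x llamafile-server llamafile   # may be needed on macOS/Linux

    # Server mode: -m points at the model weights; open the local URL it prints
    ./llamafile-server -m weights/mistral-7b-instruct-v0.1.Q4_K_M.gguf

    # Direct mode: pass a prompt with -p for a one-shot completion
    ./llamafile -m weights/mistral-7b-instruct-v0.1.Q4_K_M.gguf \
      -p "What is Mozilla best known for?"

    # List all available options
    ./llamafile --help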
To be clear, this isn't a llamafile-specific thing; these options come from the underlying llama.cpp project that llamafile builds on. If you're more of a reader, Simon Willison has a great primer on how to get this up and running, and he has a sample where he passes it an image and asks it to describe the plant in it. So that's it for getting up and running with llamafile. Keep an eye on this repo; this is an early release and I know the team is doing a lot of work here. If you want to try it out against other models, just head to Hugging Face and look for any GGUF model; there's Zephyr, OpenChat, and lots of other things coming up in the search results (a quick sketch of this follows below). Good luck, and thanks.
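As a sketch of that last point: any GGUF file downloaded from Hugging Face can be dropped into the same weights folder and passed to the same -m flag. The Zephyr file name below is just an example of a quantized GGUF you might have downloaded, not a specific recommendation.

    # Point the same server binary at a different GGUF model (file name illustrative)
    ./llamafile-server -m weights/zephyr-7b-beta.Q4_K_M.gguf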
Info
Channel: Practical AI through Prototypes
Views: 6,999
Keywords: llamafile, local ai, mozilla, simon willison, ai, llm, large language models
Id: GjP7y3AiFWc
Length: 6min 2sec (362 seconds)
Published: Fri Dec 01 2023