Run a Local, Private LLM By Downloading Just 1 File - ChatGPT-like Bot on Your PC!

Video Statistics and Information

Captions
Hello there, my name is Gary Sims and this is Gary Explains. The ability to run large language models locally, rather than using a service up in the cloud, is becoming increasingly important for developers, enthusiasts, students, even privacy advocates, and I foresee that as we head into 2024 it will become even more important. Now, there are different ways of doing that. Here on this channel I've talked about llama.cpp and LM Studio. Now there's a new way to do it where you download just one file. Just one file, regardless of what operating system you're using, regardless of what CPU you're using: you download one file and that gives you everything you need. So if you want to find out more, please let me explain.

Okay, this video is split roughly into two parts: first, a quick demo showing you what you get by downloading that one file, and then a bit of a look at some of the details, what it's all about, where it's coming from and so on. Okay, let's get cracking.

So the first thing you should do is go over to the llamafile GitHub repository. Scroll down a little bit and there's a quick start guide that gives you the direct link you need. Notice it is almost a 4 GB download. Download that file onto your PC. Now, I'll go more into the details a little later, but notice that the file is just a llamafile; if you're running on Windows you need to add .exe onto the end. As I say, I'll cover this in more detail in just a moment. Then, once you've got that file, just double-click on it. That will open up a terminal window, llama.cpp, which I've covered here before on this channel, will appear in the window and start to run, and then a window will open in your web browser.

So here is the window that has opened. There are lots of things you can fiddle with here, lots of stuff to play with, but just to start using it we can go in here and type "List five things to do in London." Hit send, and you start to get a reply just like you would if you were using ChatGPT or Bard or any of the other ones that are now available. Of course, the difference is you're running this here on your PC. That file you downloaded has the entire large language model in it; it's got all the stuff you need to run it and serve the web interface. You don't need access to the internet, none of your information is shared anywhere, this is all coming from your PC locally. Obviously the speed of your PC will affect the speed of the output. The best results I've had are with the Apple M1 MacBook Air that I've got; if you have an M2 or an M3 the results will be even better, because not only is that a fast CPU, it also uses the GPU, which is not being used in this case here on my Windows PC, but we'll go into more details of that in a moment.

Now, the model they give you here is not only a text model, you can also upload images. So I'm going to click on "upload image" and pick an image to upload. Okay, so there is the image that I've picked; actually it's an AI-generated image. We can now ask it to describe the image. It did take a few minutes to get to it: "The image features a brown and white raccoon sitting on an office desk, using a laptop computer. It appears to be focused on the task at hand, or possibly browsing through some content," and then it goes on to describe it further. So here we have a model that can handle both text and images, running locally on your PC.
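For anyone following along who would rather query the model from a script than from the browser page: the chat window is served by a small local web server that llamafile starts, and that server can also be driven programmatically. The sketch below is not shown in the video; it assumes the server is listening on the default http://localhost:8080 and exposes llama.cpp's /completion JSON endpoint, so check the llamafile documentation if your setup differs.

```python
# Minimal sketch: query a running llamafile from Python instead of the browser UI.
# Assumptions (not shown in the video): the server listens on http://localhost:8080
# and exposes llama.cpp's /completion JSON endpoint; adjust if your build differs.
import json
import urllib.request

payload = {
    "prompt": "List five things to do in London.",
    "n_predict": 256,  # cap the length of the generated reply
}

req = urllib.request.Request(
    "http://localhost:8080/completion",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    reply = json.loads(resp.read())

# llama.cpp's server normally returns the generated text in the "content" field.
print(reply.get("content", reply))
```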
Okay, so let's quickly go through some of the details. Llamafile lets you distribute and run LLMs (large language models) with a single file. The goal is to make open-source large language models much more accessible to both developers and end users. It does this by combining llama.cpp, the project that actually runs the model, with Cosmopolitan Libc, a runtime library, into one framework that collapses all the complexity of LLMs down to a single file, called a llamafile, that runs locally on most computers with no installation. So it's a single file and it runs locally on most computers: you don't download a different file for a Mac, a different file for x86, a different file for a Raspberry Pi on Arm. You download one file and it works on all of them. It's really quite amazing, actually.

So, how do we do it on Windows, as I've just shown you? You download the file from the GitHub repository, open up File Explorer, go to the Downloads folder, rename the file by adding .exe on the end, and double-click on the file, which is what I showed you. A terminal window will pop up, as will the web browser. When you're done chatting, you need to go back to the terminal and hit Ctrl+C. In fact, let's just do that now: there's the terminal I had this running in, I hit Ctrl+C and it goes away; the whole thing has been shut down, all the memory is freed up, and so on.

If you're running on macOS or Linux, you download the same file, but you don't need to add .exe on the end. You open up a terminal window, and you will probably need to grant permission for the file to become executable; you do that with chmod +x followed by the name of the file. Then you run the file from your terminal: dot, meaning the current directory, slash, and then the name of the file. And that's it; the same thing happens, the browser starts up and it starts running.

Now, I've tried this on a MacBook, on a Windows machine, on a Raspberry Pi, and on a Jetson Orin, so that's with Arm stuff and with Intel or AMD64 stuff, and it works on all of them. There are a couple of gotchas I want to list here. On macOS with Apple Silicon you need to have Xcode installed for llamafile to be able to bootstrap itself, so if you are doing this on an M1, M2 or M3 machine you do have to have Xcode installed; Xcode is free and available from the App Store. If you're using zsh and you have trouble running the file, try running it the way shown on the GitHub repo. All of this is listed on the GitHub repo, so you should find all these instructions there; I'm just highlighting them for you. On Linux, and I did have this problem on the Jetson Orin, it may refuse to run the file, so you run the commands listed there; again, they're all on the GitHub repo, and they tell the kernel how to run these APE-format files. Once you do that it works. I did find that, because it's a temporary setting, you have to echo those values again every time you open up a new terminal, but if you do that it works, no problem.

In general it can run on six different operating systems: Linux, anything from kernel 2.6 onwards, which is pretty old; macOS 15.6 onwards, on ARM64 or on the old Intel Macs; Windows 8 upwards on any Intel or AMD 64-bit processor, and notice there's no ARM64 here for Windows 10 or 11, which is a shame, but there you go; FreeBSD 13 onwards; NetBSD 9.2 onwards; and OpenBSD 7 onwards, all with Intel or AMD 64-bit.
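If you want to automate those per-OS launch steps, here is a minimal sketch of the same procedure in Python; it is not from the video, it assumes the llamafile has already been downloaded into the current directory, and the filename below is a placeholder, so substitute whatever you actually downloaded from the GitHub release.

```python
# Minimal sketch of the per-OS launch steps described above, assuming the llamafile
# has already been downloaded into the current directory.
import os
import platform
import stat
import subprocess

LLAMAFILE = "model.llamafile"  # hypothetical placeholder; use your actual download

if platform.system() == "Windows":
    # Windows will only run it if the extension is .exe, so rename it first.
    exe = LLAMAFILE + ".exe"
    if not os.path.exists(exe):
        os.rename(LLAMAFILE, exe)
    subprocess.run([exe])
else:
    # macOS / Linux / BSD: mark the file executable (the chmod +x step), then run it.
    mode = os.stat(LLAMAFILE).st_mode
    os.chmod(LLAMAFILE, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    subprocess.run(["./" + LLAMAFILE])
```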
GPU support varies across platforms, as you can see here. On the CPU side, it runs on AMD64 microprocessors as long as they have SSSE3; when they say AMD64 here, they're pointing out that Intel's 64-bit architecture is really something it borrowed from AMD, so this covers both Intel 64-bit and AMD 64-bit; otherwise llamafile will print an error. Basically, anything that's an Intel Core or newer, from 2006, should work, and anything that's a Bulldozer or later design from AMD, from 2011, should work. If you have a newer CPU with AVX, or better yet AVX2, then llamafile will utilize that; it doesn't yet use AVX-512. ARM64 microprocessors must be ARMv8a or later, which means anything from Apple Silicon to a 64-bit Raspberry Pi will work, and it does: I've tried it on the Raspberry Pi. It also works on the Jetson Orin, and I'm pretty sure it will work on other Jetson boards as well. So basically Intel, AMD or Arm processors, whether from Apple, Nvidia or Broadcom, it's going to work, and all from one file. That's the thing you've got to remember: this is just one file you download, and it just works.

Okay, GPU support. On Apple Silicon everything will work as long as you've got Xcode installed. On Linux there is some stuff you need to do, as it tries to compile some drivers along the way; you need to read the details if that's what you're trying to achieve. And on Windows it's the same: you're going to need to open up a Visual Studio 64-bit native environment, you're also going to need the CUDA SDK installed, and you need to be able to compile it. The version I was showing you, although I do have an Nvidia GPU, was just running on the CPU.

Okay, so there you have it. I would love to hear in the comments below if you download it and give it a try. Do let me know what you think. Okay, that's it, I'll see you in the next one.
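For readers who want to check their own machine against those CPU requirements before downloading the 4 GB file, the short sketch below (again, not from the video) reads the feature flags from /proc/cpuinfo; this only works on x86 Linux, so on macOS or Windows you would need a different mechanism.

```python
# Quick Linux-only sketch (not part of llamafile itself) that checks the CPU
# features mentioned above by reading the "flags" line from /proc/cpuinfo.
def cpu_flags():
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.lower().startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

flags = cpu_flags()
print("SSSE3 (required on x86):", "ssse3" in flags)
print("AVX   (used if present):", "avx" in flags)
print("AVX2  (used if present):", "avx2" in flags)
```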
Info
Channel: Gary Explains
Views: 11,135
Keywords: Gary Explains, Tech, Explanation, Tutorial, Llamafile, LLM, LLaVA, llama.cpp, large language model, Cosmopolitan Libc, Mozilla Ocho, Mistral-7B, WizardCoder-Python-13B, Mistral-7B-Instruct, Apple Silicon, Windows, Linux, macOS, AMD64, ARM64, Raspberry Pi
Id: xD999pkcrks
Length: 9min 22sec (562 seconds)
Published: Thu Dec 07 2023