Run Your Own ChatGPT-like LLM on Your Windows PC!

Video Statistics and Information

Captions
Hello there, my name is Gary Sims and this is Gary Explains. Now, previously I've shown in a video how you can run a large language model, an LLM, on your laptop, and it will give you similar capabilities to ChatGPT, but it doesn't go out to the cloud, it doesn't do anything else, it just runs directly on your laptop. That was using a project called llama.cpp. Now, the disadvantage of llama.cpp is that it was very much command-line based: you had to compile the code yourself, you had to fetch the language models from somewhere, you had to type in some odd commands, minus this, plus that, to try and get it to work. So if you were a bit technical it was great, and I showed how to use it, but it wasn't easy to use.

The good news is that there's now a new app for Windows and for the Mac that wraps up the llama.cpp project in a nice user interface. It's got built-in ways to download the models, and it's got a chat window so you can start interrogating them straight away. It's really simple, and that's what I want to look at today. So if you want to find out more, please let me explain.

OK, so the first thing you need to do is go over to lmstudio.ai to get to the LM Studio website, which is what we have here in front of us. A few things to note first. You can download LM Studio for the Mac with Apple silicon, or you can download LM Studio for Windows. I've downloaded both, and they function pretty much exactly the same way; there are some differences, for example around potential GPU support. As it says down here, this is made possible thanks to the llama.cpp project, and of course llama.cpp is designed to run these models primarily on the CPU, without needing any particular GPU.

Now, what can you do with LM Studio? You can run large language models entirely offline on your laptop, or on your desktop PC, which is what I'm doing. It's already got a chat mode, so you can use the models via the UI and you get a chat interface just like you would with ChatGPT. That makes it really easy, because you can literally download a model and start using it, all within the same UI, straight away. And, as I said, you can download any compatible model files from the Hugging Face repositories, and we'll go into that in a second.

Once you click the download, you then need to install it. There's one thing to note about the install, and that is that this is an unsigned binary, so Windows will complain, saying "hey, where did you get this binary from, are you sure you want to run it?" Also, this isn't open source, other than the parts that come from llama.cpp, so the UI and all of that is closed, and there is some risk there. I haven't noticed anything adverse, I've not had any virus warnings or anything, but just so you know, it's an unsigned binary and you have to install it on your machine.
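As an aside, everything LM Studio does at inference time goes through llama.cpp under the hood. If you're curious what the "more technical" route I mentioned looks like without a UI, here is a minimal sketch using the llama-cpp-python bindings; the model path is just a placeholder for whichever quantized file you download, and none of this is needed to use LM Studio itself.

```python
# Minimal sketch of driving llama.cpp directly from Python via llama-cpp-python
# (pip install llama-cpp-python). The model path below is a placeholder: point it
# at any quantized model file you've downloaded, e.g. from Hugging Face.
from llama_cpp import Llama

llm = Llama(
    model_path="D:/LM Studio/llama-2-7b-chat.Q4_K_S.gguf",  # placeholder path
    n_ctx=2048,        # context window in tokens
    n_threads=8,       # CPU threads to use
    # n_gpu_layers=20, # optional: offload some layers to a GPU if your build supports it
)

out = llm(
    "Q: Give me your review of Casablanca. A:",
    max_tokens=256,
    stop=["Q:"],
)
print(out["choices"][0]["text"])
```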
OK, so this is LM Studio up and running here on Windows. This is the home screen, and you are presented with a search box, which we'll use in a moment, and then basically a kind of curated list of interesting models that you might want to try. So here is Code Llama, the 7 billion parameter version, and you can basically click download. Let's just go through some of these. First of all, it tells you how much RAM you're going to need. Now, I've got 20 gigabytes in my machine. I've found that I can run the 7 billion parameter models fairly easily, and I can run a few of the 13 billion parameter models if I close everything else down, including Google Chrome and any other big apps, to free up all of my memory; then I can just about run them. I have not had any success running any 34 billion parameter models; for those you're going to need 32 or 64 gigabytes of RAM.

Now, the other thing is that there are different files you can download, so let me highlight a couple of them for you. To make these things run on a laptop, on a CPU, without any kind of GPU, a quantization methodology is used, which reduces the precision of the neural network down from maybe 16-bit floating point numbers to 8-bit or even 4-bit. Even having done that, the models you download are four, five, six, seven, eight, nine, ten gigabytes in size, and that's what you need to use; if you wanted to run the full models without any quantization, you would need specialist hardware, GPUs and so on. The quantization reduces the accuracy slightly, but using methods that hopefully don't lose too much of the functionality, so you have to play around. (There's a small worked example of the idea a little further down.)

For example, if we click on "see all files", what we get is a list of all of the files. It's all the same model, Code Llama 7 billion, the Code Llama Instruct version, but they've applied different levels of quantization, so here you can see you get a three-gig file, and right down at the other end you get a seven-gig file, and this directly impacts the quality, the performance, and how much memory it takes. Now, I've had a lot of success with the 4-bit K_S ones. There are very different techniques for the quantization, and there's stuff you can read on the internet about this, but basically these are the ones I've found give you the best balance between memory usage and functionality, so if you can, use the Q4_K_S models.

So, back here on the home page, you can scroll down and look at lots and lots of different models that they recommend you try out. They tell you where each one is coming from, so this one is a Llama-based model that they've done some extra training on, and so on; it gives you the size, and you can really easily just pick a model and play with it: "oh, I quite like that one, oh, let me try out that one." And as new models appear, they will appear on this home page if they are useful.

You can also just search Hugging Face from here. If I type "llama" in here, it will give me a list of all of the Llama-based files, and here we are, look, 337 of them, and you can sort, for example, by most recent, so there's one here from September the 8th. You can then click over here and it will open up the web browser and show you the Hugging Face page describing what this file is, where it comes from, whether it's worth looking at, and so on. You can also sort by most downloads, so you can see that the Llama 2 7 billion and 13 billion chat models have been very, very popular and people have downloaded those a lot, and obviously the more niche ones have been getting less attention. So if you're looking for a particular model, you can search for it from the Hugging Face repository there.

Now, once you've downloaded some models, and you can download whichever ones you want, note that there is this folder setting here where you can define where you keep your models, so you can change it. I've put mine on a separate drive, D:\LM Studio, because that's where I've got some space; we're talking four, five, eight, ten gig files each time. So these are the ones I've been downloading and playing with.
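To make the quantization idea above concrete, here is a toy sketch of symmetric "absmax" 4-bit quantization of one block of weights. This is only the basic idea, not the actual Q4_K_S scheme in llama.cpp (which uses super-blocks and per-block scales and minimums), but the trade-off is the same: roughly a quarter of the memory of 16-bit floats in exchange for a small rounding error on every weight.

```python
import numpy as np

def quantize_4bit(block):
    # Map each float weight to an integer in [-7, 7], keeping one scale per block.
    scale = float(np.max(np.abs(block))) / 7.0
    q = np.clip(np.round(block / scale), -7, 7).astype(np.int8)  # 4-bit values (stored in int8 here)
    return q, scale

def dequantize_4bit(q, scale):
    # Reverse the mapping; the result is close to, but not exactly, the original.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
weights = rng.normal(scale=0.02, size=32).astype(np.float32)  # one 32-weight block
q, scale = quantize_4bit(weights)
restored = dequantize_4bit(q, scale)

print("worst-case rounding error:", np.max(np.abs(weights - restored)))
# Storage: 16-bit floats need 2 bytes per weight, so a 7B model is ~14 GB;
# at ~4 bits per weight plus per-block scales it is ~4 GB, which is why the
# Q4 downloads for 7B models come in around the 3-4 gigabyte mark.
```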
Once they've downloaded, you can hit this chat button here and start using them. So you pick a model here; these are all the ones I've been downloading. What I want to try is the Llama 2 Chat 7 billion parameter model, the Q4_K_S version, because, as I said, those are the ones I've been having a lot of success with. If you click on that, it will now load that model into the computer's memory, and as you can see, it's a big file, so it takes a while to load. What's also happening is that in a moment it's going to tell me how much RAM it has used to do that. So let's just wait and see... there we go, it's starting to crank up: 2.6 gigs, 3 gigs of RAM, 3.4 gigs, 4 gigs, and now it's almost 5 gigs that it has used to load up that model. And now we can start chatting to it, just as if we were chatting to ChatGPT.

So I can enter a message in here: "Give me your review of Casablanca." Now, because this is running on the CPU, it's not going to be lightning fast, but if you notice, it's actually not that much slower than the kind of speed you get out of ChatGPT when you're connecting over the internet to their servers. So: "Casablanca is a classic movie that has stood the test of time and continues to captivate audiences with its timeless themes of love, loyalty and sacrifice. Released in 1942..." Let's just see what it says, whether it actually gives any controversial opinions or whether this is just going to be a summary of what you would already know about Casablanca, the all-star cast and so on. I tried this earlier and it did say that it didn't like some of the performances of the secondary characters. "The dialogue is witty and engaging", OK, so let's see what else it has to say; we'll wait for it to complete. OK, so it's finished, and the last line is: "I highly recommend Casablanca to anyone looking for a thoughtful and engaging cinematic experience that will leave you both entertained and inspired."

Now, of course, just like with all of these GPTs, we can ask it all kinds of questions, so let's just run through a few more. Let's try this one, which could be interesting: "Tell me the best places to travel on a budget of a thousand dollars, including car hire, flights and accommodation." It will start to give us its answer for that; let's just wait until it finishes. OK, so it's come up with a list there: Thailand, Indonesia, Costa Rica, New Zealand (I found that an interesting one, but it's saying you could fly out there for maybe $700 and then stay in some hostels at $25 a night), Mexico, Croatia, Brazil, South Africa, the Philippines, all great choices. Obviously the costing is different for each, but it's interesting that it's given some very interesting destinations you could go to. Remember, again, this is all running on my PC; it's not connecting up to the cloud, it's not connecting to a model running in the cloud, it's all running on my PC. And what's great about this is that eventually, in the future, we're going to see this kind of ChatGPT-level functionality built into everything, running locally, not going up to the cloud, and of course that is going to revolutionize the way we work, because it's all running locally. It's a bit like a spell checker, really; it's kind of the next level after that.

OK, let's just try one more thing, and then I'll show you that these models, for example this 7 billion parameter model, are not going to be good at everything. So let's just try this one: "Create a thank-you email template I can send after completing an interview." That should be a fairly simple thing for these models to do.
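Before we look at the result, a quick aside on those memory numbers: if you want a rough feel for why the 7 billion parameter model settled in at around five gigabytes of RAM, and why the bigger models need so much more, here is a back-of-envelope estimate. The bits-per-weight and overhead figures are my own rough assumptions, not anything LM Studio reports.

```python
def estimate_load_gb(n_params, bits_per_weight=4.5, overhead_gb=1.0):
    # Rough estimate: the quantized weights themselves, plus an allowance for
    # the context (KV cache) and runtime buffers. All numbers are approximate.
    weights_gb = n_params * bits_per_weight / 8 / 1e9
    return weights_gb + overhead_gb

for name, n in [("7B", 7e9), ("13B", 13e9), ("34B", 34e9)]:
    print(f"{name}: ~{estimate_load_gb(n):.1f} GB")
# 7B  -> ~4.9 GB  (in line with the ~5 GB seen while loading Llama 2 7B Chat)
# 13B -> ~8.3 GB  (why closing Chrome and other big apps matters on a 20 GB machine)
# 34B -> ~20.1 GB (why 32 GB or more of system RAM is the practical starting point)
```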
Again, remember, this is running locally on the PC, not going up to the cloud. OK, so there you go: it's even done the subject line, because it knows it's an email, and then you can put in the interviewer's name, "I want to take a moment to express my sincere gratitude for the opportunity to interview with", and then the company name. It's a template, and you can go through all of that and customize it as you want. So these are the kinds of things that these locally run models, these LLMs, large language models, can do on your PC without going up to the cloud.

Now, there are some things it can't do. This next one is a particularly hard thing; in fact, only GPT-4 has managed it for me. So I ask something like this, something very advanced: explain the plot of Shrek, the famous animated movie, in a sentence where each word has to begin with the next letter of the alphabet, from A to Z, without repeating any letters. That's a very hard thing for it to do. "An ogre awakened by the loud noise, ogre Shrek grumbled and groggily got out of bed"... so it's actually doing quite well: it's talking about Shrek, it is working through A, B, C and starting its words with those letters, and it is describing what he's doing, he's getting up and he's yawning, "and then he donned his armor, a suit of green and grimy gear, eagerly he embarked on his journey through the enchanted forest". So it is talking about Shrek, and this is roughly what happens in Shrek, but of course it's not what I asked for: what I asked for was a single sentence where each word began with the next letter of the alphabet. Still, it works through A, B, C, D, E, F, G, H, so it's doing reasonably well. If you ask ChatGPT to do the same thing, what you get is: "Annoyed by cheeky donkey, exiled fairy-tale gang, heroic, impressive, jolly knight loves majestic, noble ogre, princess quickly reveals secret, transforms, unite, vanquishes wicked, xenophobic youngster zealously." That is trying really hard to sum up the plot of Shrek, and it does a pretty good job given that I told it it had to go A, B, C, D, and it does work: annoyed (A), by (B), cheeky (C), donkey (D), and so on. So obviously what these models are able to do depends on their complexity, but this was pretty interesting.
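If you want to score a model's attempt at that alphabet challenge exactly, rather than eyeballing it, a few lines of Python will check the constraint: word one starts with "a", word two with "b", and so on. The test string here is just the first few words of the ChatGPT answer quoted above.

```python
import string

def check_alphabet_sentence(sentence):
    # Each successive word should start with the next letter of the alphabet: a, b, c, ...
    words = [w.strip(".,!?;:'\"") for w in sentence.split()]
    return [(letter, word, word.lower().startswith(letter))
            for letter, word in zip(string.ascii_lowercase, words)]

for letter, word, ok in check_alphabet_sentence(
        "Annoyed by cheeky donkey, exiled fairy-tale gang, heroic"):
    print(f"{letter}: {word:<12} {'ok' if ok else 'MISS'}")
```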
Now, I would like to do a future video on using Code Llama Instruct, the model we saw here, which is designed specifically for writing code, so it can do Python and so on, and see what it's capable of. If that would interest you, please tell me in the comments below and I'll make a video just about that.

OK, that about wraps it up. So this is LM Studio, and I really would recommend you give it a try. I'm not affiliated with it in any way whatsoever, but it really is great, because you can download models just with a click, then you can try them just with a click over here and start typing things in, pick different models, and see which ones do different things and how well they work. It's great if you've got lots and lots of memory. There is also some GPU support: if you go over here in the chat view, you can open a little panel, go down here, and enable some GPU acceleration. That's worth playing with as well; it does help speed things up and it also helps with the memory usage. I'd love to hear, if you give this a play, what you think of locally run LLMs: in the future, do you think this is going to become a big part of things, like I do? I'd love to hear your thoughts in the comments below.

OK, that's it. My name is Gary Sims, this is Gary Explains, and I really hope you enjoyed this video. If you did, please give it a thumbs up, and if you like these kinds of videos, including that one about Code Llama Instruct I'm thinking about doing, do subscribe to the channel. Also, please do drop a comment so I can hear your thoughts on it. OK, that's it, I'll see you in the next one.
Info
Channel: Gary Explains
Views: 15,435
Keywords: Gary Explains, Tech, Explanation, Tutorial, ChatGPT, LLAMA, AI chatbot, GPT, Windows PC, Windows, Run AI locally, Local AI, Generative Pre-trained Transformer, LLM, large language model, Chatbot, Windows Chatbot, Hugging Face, llama.cpp, Large Language Model Meta AI, Meta AI, AI, Meta, Facebook
Id: x4_iwOznvgE
Length: 14min 49sec (889 seconds)
Published: Fri Sep 08 2023