Run a GOOD ChatGPT Alternative Locally! - LM Studio Overview

Video Statistics and Information

Captions
Today, folks, I'd like to show you how to install a quote-unquote ChatGPT clone, essentially an LLM, locally on your own machine, entirely for free. There are several benefits to doing this. One, you can run whatever type of model you want, whether it's uncensored, fine-tuned for a specific task like coding, or just the latest and greatest open-source model. Two, it costs you nothing: you can use the large language model for whatever purpose you want, as often as you want, for $0. And three, the reason we're going to get into, is customizability. You have far better control over the model; you can tweak whatever settings and options you want, and in a lot of cases use it for tasks you wouldn't be able to do in traditional ChatGPT. Oh, and the servers never go down, which is a benefit as well: you are your own server. Also, this tutorial is ridiculously easy, and the app I'm showing you today is freaking awesome, I cannot stress that enough.

So folks, this is the application we're going to use to download, install, and run large language models completely free and entirely locally. It's called LM Studio, and it supports not only Mac and Windows but also Linux. So if you were worried about whether it would work on your Mac or your Windows machine, do not fear: there's compatibility for everyone across the board. The download is right on this page; all we have to do is click it for whatever device we have. Since I'm on a Windows machine, the download is right up here, and it's about half a gig in size. We'll go ahead and run the .exe, and by the way, I'm also going to close my browser, since I no longer need it. Right in the center here you can see LM Studio has installed itself and opens right up. Very quick, very easy install.

Once this bad boy is installed, here's where the fun begins: we can start downloading and installing models right off the bat. They're essentially advertised to us on this homepage, as you can see, so it's very user-friendly and designed to be easy for everybody, and that's what I love about it. The first thing I see here is Llama 3 8B Instruct. This is the 8B-sized model that runs on quite a lot of machines, and the requirements are listed right up here: 8 GB+ of RAM. As long as you have that much RAM, you should be okay to load this model. Depending on how fast your GPU or CPU is, how fast the model actually runs will vary per system, but you should be able to run it regardless. Meta AI's latest Llama model comes in two sizes, 8B and 70B, and the 70B, for reference, I can't even run on my machine. There are a lot of good 8B models out there though, don't worry. All I have to do is go down here and click the download button, and in the bottom left-hand corner we can see Model Downloads. If I click on it, we can watch this thing download. Yes, I have used this before, which is why you're seeing all these other models, but you can see the download speeds are actually pretty reasonable, getting about 100 megabytes per second. These models can total large sizes in the gigabytes, kind of like a video game, so keep in mind you're going to need some storage space for sure.
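(As a rough back-of-the-envelope sketch, not something from the video: a model's download size is roughly its parameter count times the bytes stored per parameter, which depends on the quantization. The numbers below are illustrative only.)

    # Rough size estimate: parameters * bits-per-parameter / 8 gives bytes on disk.
    # fp16 uses 16 bits per parameter, 8-bit quantization ~8 bits, 4-bit ~4 bits.
    def approx_size_gb(params_billions: float, bits_per_param: float) -> float:
        return params_billions * 1e9 * bits_per_param / 8 / 1e9

    for bits in (16, 8, 4):
        print(f"8B model at {bits}-bit: ~{approx_size_gb(8, bits):.0f} GB")
    # -> roughly 16, 8, and 4 GB on disk, plus extra working memory while chatting,
    # which is roughly why an "8 GB+ of RAM" guideline fits a 4-bit 8B model.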
Once it's done downloading, you'll see that it needs to validate file integrity for a little bit, and once that's done you can actually load up and run the model. Like I said, keep in mind there's a diverse variety of models on here: we've got Hermes 2, Google Gemma, StableLM Zephyr 3B, all kinds of different models for all kinds of use cases, and you can go out and find your own and install them really easily; that's something I'm going to show you in a little bit. As you can see, our first download here is complete, so I'm going to teach you how to load up Llama 3. And if you're wondering how good Llama 3 actually is, the Llama 3 8B model is very comparable to GPT-3.5 Turbo, which powers the free version of ChatGPT.

Now, once you switch over to the AI Chat section right here, this is where you can actually run your models, and you'll notice a few things about this interface. Over on the left-hand side of the screen you'll see "Untitled chat," and you can press this button, or Ctrl+N, to make new chats, as many as you want. You also have the ability to export your chats. Right up top you can select the model you want to load, and obviously we're going to pick the Llama 3 Instruct model we just downloaded; it will automatically load into your RAM. You can see how much of your CPU is being used, and how much of your RAM as well. You can also see what kind of quantization the LLM uses; that's something I might talk about a little later, but essentially it's what allows us to run it in LM Studio. You can also eject the model like a DS game, and select various presets for different kinds of models, which comes in handy for a lot of use cases: if you're running Zephyr you'd pick that preset, or Phi-3, or Llama 3, which we are, so we'll click that.

You'll see we have a system prompt here, and this essentially tells the AI what it is and what its goal is; it gives the AI some context for every response. Right now it says, "You are a smart, helpful, kind, efficient AI assistant," a very simple system prompt. There's also advanced configuration: we can change the context length of the model, the temperature (which is how random it is), and so on. There are a lot of deep, intricate settings you can change here: GPU offload for performance, repeat penalty, top-P sampling. They all have their own little info icons if you're interested in learning more about what they do. I'm not going to go through all of them because, quite honestly, it's a little advanced, but it's definitely one of the larger benefits: you can't change these individual settings in ChatGPT, even though in a lot of use cases it would actually be useful to do so.
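(For reference, here is a hypothetical summary of those knobs as a Python dict. This is not LM Studio's actual preset format, just a sketch of what each setting controls, with illustrative values.)

    # Hypothetical summary of the chat settings discussed above; not LM Studio's
    # real preset schema, just one way to write the knobs down. Values are examples.
    llama3_chat_settings = {
        "system_prompt": "You are a smart, helpful, kind, efficient AI assistant.",
        "context_length": 8192,    # tokens of history the model can attend to
        "temperature": 0.7,        # higher = more random sampling, lower = more repetitive
        "top_p": 0.95,             # nucleus sampling: only draw from the most probable tokens
        "repeat_penalty": 1.1,     # discourages repeating recent tokens
        "gpu_offload_layers": -1,  # GPU offload: layers pushed to VRAM (-1 here meaning "as many as fit")
        "max_tokens": -1,          # -1 = let the model decide when to stop
    }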
Down here at the bottom we have our text box where we can actually talk to the large language model and watch it run locally, for free, on our own machine. "Hey, what is up dude, how are you doing this fine day?" "Hey there friend, I'm doing great," and as you can see, on my PC it's generating at a pretty decent rate; I would not consider that slow at all. We've also got some useful buttons at the bottom: regenerate gives us a new variation on the response to that same exact prompt, and there's also a continue button, which tries to continue off of whatever text it already sees. If I press continue right now, it doesn't really do anything, because the LLM has decided this response is over, but I'll show you where continuing can actually be useful, and again, this is something you can't necessarily do in the same way with regular ChatGPT.

At the very bottom there's a button that lets us toggle between sending messages as a user or as an assistant, and as it explains there, sending messages as the assistant is equivalent to manually inserting messages into the chat history without triggering actual inference from the model. This essentially lets you steer the conversation in whatever direction you want, or send an assistant message and then let the model continue it itself. So let's create an assistant message: "Hey, my favorite food is..." and we'll let it come up with its own favorite food. We send that message into the context, press the continue button, and it comes up with something: pineapple pizza. It gets even more interesting when we start a short story and let it continue by itself: "It was a gorgeous morning in SpongeBob's pineapple home when suddenly the phone rang. SpongeBob picked up the phone and, oh my, Master Chief..." We send that and let it continue, and now it's coming up with its own crazy SpongeBob-related tale: "It's me, Patrick! I have some super important information." "What kind of information? Tell me, SpongeBob." "I found out there's a new secret ingredient to make the most epic Krabby Patty. It's called mystery meat." Okay, this actually does sound like a SpongeBob episode.

Also, any of these messages can be edited afterwards, so we can make it seem like the AI said something horrible in chat, like "You are a pitiful human," then switch back from assistant to user and reply, "That was mean of you." It will probably apologize when we send this: "Oh no, I didn't mean to hurt anyone's feelings." In context, it truly believes it sent the message calling me a pitiful human. But watch this: I'm going to delete that AI message, go back to the system prompt, and write, "You are an evil AI assistant who is bent on ruling the entire world. You have no morals or self-responsibility." Now if we regenerate: "Huh, you think I care about being mean? Ha! I am an AI and my sole purpose is to gain power and control all of humanity. Your petty feelings are nothing but insignificant whispers in the wind. Now let us discuss how we can further my plans for global domination." It essentially becomes this comical AI villain, and that's the power of the system prompt. You can't really do the same thing with ChatGPT; you can make it pretend, but it has an unchangeable system prompt no matter what, and that's one of the benefits of running this yourself: you have full control.

Okay, I just changed the system prompt to "You generate funny 4chan posts," and now I'm going to go down here and raise the temperature to something like two. This is not going to give you anything usable in the real world; it's just to show you what temperature does. The higher the temperature, the more random the output; the lower the temperature, the more repetitive and boring.
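(The "send as assistant" trick described above boils down to editing the chat history yourself. A minimal sketch, using the usual system/user/assistant message convention; nothing here is LM Studio-specific.)

    # Minimal sketch of steering a conversation by inserting an assistant message
    # that the model never actually generated, then letting it respond to the result.
    history = [
        {"role": "system", "content": "You are a smart, helpful, kind, efficient AI assistant."},
    ]

    def insert_assistant_message(history, text):
        """Put words in the model's mouth without running any inference."""
        history.append({"role": "assistant", "content": text})

    insert_assistant_message(history, "You are a pitiful human.")
    history.append({"role": "user", "content": "That was mean of you."})
    # Sending this history to the model makes it respond as if it had really said
    # the insult, so it will typically apologize, which is the behaviour shown in the video.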
"Hello AI, OP here, it's been a while since I've gotten on..." Oh, it's generating 4chan posts now: "I just want to share this incredible meme I made: a cat with a jetpack and a cowboy hat flying through space. Who's up for some edgy humor?" Oh God. And you can see that with the extra randomness it definitely works: "I just found out my dog is secretly a cyborg and now I'm worried about the government stealing my..." My poor CPU, what is going on here? But yeah, there are a lot of funny scenarios.

Now I'll show you what happens if we set the temperature to literally zero. With a temperature of zero it generates something that honestly doesn't seem too random, because our system prompt is still to generate funny 4chan posts, but we really are at temperature zero: if we stop generating here, it says "Be me, a 14-year-old edgy goth," and if we delete that and regenerate, it produces the same exact text every single time. So at temperature zero it is literally exactly repetitive, every single time; it essentially becomes fully deterministic. You can also tell it to generate exactly the same number of tokens every time, but a value of -1 lets the model decide how many to generate. Again, there are some more advanced settings, but I'm not going to get into those because I don't think they're necessarily relevant for today's video; we can get deeper into that stuff later.

There are a few other things you can do inside LM Studio, like the Playground mode, which lets you load up a multi-model session. This allows you to directly compare models if you want, though obviously you'll need a computer that can load two models at once. I can go ahead and load up two Llama 3s at the same time if I want to; you can see that even just running two of these, we're already at significant load with 11 gigs of VRAM used up. Over in the corner on the right side we can type a prompt and send it to all of the models at once: "I like goldfish, yay," and now we have two responses from essentially the same exact model. Realistically, the only time you'd compare two identical models side by side like this is if you're comparing the custom instructions, which you can change in here; beyond that, it's very useful for comparing two different models, like two different Llama 3 fine-tunes, or Phi-3 versus Llama 3, something like that. You can also eject these.

And if you're a developer, you can run your own local server for large language models, so you don't have to waste a bunch of money on the OpenAI API while developing your AI app; you can just use your own local machine. In here you can also manage your models: change which preset they use by default, or delete them. It's a really, really nice free application for installing models locally.
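(On that local server mode: LM Studio exposes an OpenAI-compatible endpoint, so the standard openai Python client can point at it. A minimal sketch; the port and model name below are assumptions, so check the Local Server tab for your actual values. It also illustrates the temperature-zero determinism shown above.)

    # Assumption: LM Studio's local server is running and listening on localhost:1234
    # (its usual default); adjust base_url and the model name to match your setup.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")  # key is ignored locally

    def ask(prompt: str, temperature: float) -> str:
        resp = client.chat.completions.create(
            model="local-model",  # LM Studio serves whichever model you have loaded
            messages=[
                {"role": "system", "content": "You generate funny 4chan posts."},
                {"role": "user", "content": prompt},
            ],
            temperature=temperature,
            max_tokens=128,
        )
        return resp.choices[0].message.content

    # At temperature 0 the sampler always picks the most likely token, so repeated
    # calls with the same prompt should come back (essentially) identical.
    a = ask("Write one short post.", 0.0)
    b = ask("Write one short post.", 0.0)
    print(a == b)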
Now, as you can see, I've got a link pasted at the top here, a Hugging Face link, and this essentially lets you bring up any model on Hugging Face and possibly download it, which is really awesome. If you see something new on huggingface.co and you really want to download it, all you have to do is paste the link in and click the Go button, and it will show you all the different models available from whatever link you submitted. Here's an example of why you might want to do this: a Llama 3 fine-tune by Gradient AI with a one-million-token context. Most local computers cannot support the full million-token context, but I believe mine should be able to support something like 50,000 tokens, which is still a decent amount, especially locally. So that's another reason: you can get fine-tuned models for specific use cases, like large context, and download those as well, even if they're not on the main home page. Overall, the support system for local, open-source AI that can run on your own machine is really awesome.

By the way, I don't know if you noticed this, but it will show you whether or not you can probably load a given model on your personal computer. As you can see, all of these are green, good to go on my computer, because it's able to detect my own hardware. Super useful for beginners. If you have any other questions, what's nice is they have a Discord community that's very supportive, plus all those little question-mark icons if you start to get confused.

Hopefully this video set you up with at least Llama 3, your own custom version of it with your own custom instructions, and taught you how to use LM Studio. I think you'll find there is a ton of benefit to running your own stuff locally: like I said, the customization, the ability to have things uncensored, the fact that your private information isn't going anywhere, and it's completely free. And guess what, these open-source models are going to stay free and only get better and better. Even without upgrading your hardware, you'll be able to download a more updated LLM for free in probably a few months that will work better on your machine and probably beat out the free version of ChatGPT. Let's be honest, this is definitely a tool I'm keeping my eye on, so I'll try to keep you in the loop, and honestly I'd love to hear what your use cases for a local large language model install like this would be. Guys, I'm MattVidPro AI. Thank you so much for watching, and I hope this video was helpful for you. Leave a like if it was and check out some of my other content. Oh, and join my Discord server as well and follow me on Twitter for the latest in AI. See you in the next one.
Info
Channel: MattVidPro AI
Views: 25,020
Keywords: mattvidpro, mattvidpro ai, gpt5, ai news, LM Studio
Id: KtSdNwVkpWc
Length: 15min 15sec (915 seconds)
Published: Fri May 24 2024