How To Install TextGen WebUI - Use ANY MODEL Locally!

Video Statistics and Information

Captions
Text generation web UI is an incredible piece of open-source software that allows you to run large language models locally on your computer. I'm going to show you how to install it and how to get it set up, and we're going to be using it in a lot of future videos, so let's go.

The first thing you're going to do is come to the text-generation-webui GitHub repo, which is by oobabooga. If we scroll down, they have one-click installers for Windows, Linux, macOS, and WSL. However, I've actually never been able to get these to work in one click; essentially, they're just scripts that run the same steps we're going to do anyway. They list two other install methods: one through conda, which is what we're going to use today because I find it the easiest, and one through Docker.

Okay, let's get started. Open up your terminal and make sure you already have conda installed. I'm using a Windows machine, so keep that in mind as we go through this. On Windows there is a specific conda-enabled terminal; after you install conda, you don't open the usual terminal. Here you can see it says "Anaconda Prompt," which means I'm in the conda version of my terminal. The following command creates a brand-new conda environment so we keep everything nice and organized and don't run into any of those Python and module version-mismatch issues: we type conda create -n tg ("tg" for text gen; you can name it anything you want), then python= and the version we're using, 3.10.9, and hit enter. It asks whether we want these packages installed; yes, we do. Somebody also pointed out in a previous comment that you don't actually have to type "y" when the Y is in brackets, since that's the default, so I'm just going to hit enter; thank you to whoever told me about that. And there it goes, it's all done.

Then we have to activate the environment, so we copy conda activate tg, paste it in, and hit enter. We know we're in the conda environment because instead of "base" it now says "tg".

Next we have to install the torch libraries we need: we type pip3 install torch torchvision torchaudio, pulling from the index URL shown on screen, then hit enter. Torch is the library that runs the math behind all of these large language models. Okay, it's done.

The next thing we're going to do is clone the repo. Click the green "Code" button, then click the copy icon. Switch back to the terminal, type git clone, and paste that URL in; that downloads all the files. Then we cd (change directory) into the text-generation-webui folder and hit enter; now we're in that folder.

The next thing we need to do is install all the Python modules: pip install -r requirements.txt, then hit enter. There it goes, it's done, and the nice thing is that because we're using conda, we didn't run into any issues while installing from requirements.txt.
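Pulled together, the environment setup from this section looks like the following. This is a sketch for a Windows Anaconda Prompt; the PyTorch index URL is my assumption for a CUDA 11.7 build, since the exact URL isn't legible in the video:

    conda create -n tg python=3.10.9
    conda activate tg
    :: the cu117 index URL is an assumption, matching the CUDA 11.7 shown later
    pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117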
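And then the clone and dependency install, run from the same prompt:

    git clone https://github.com/oobabooga/text-generation-webui
    cd text-generation-webui
    pip install -r requirements.txt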
The next step is where it gets a little tricky. If you're just running on a CPU and don't want to use your GPU to run these models, you can skip this step, but if you're using a graphics card, you do need to do it. The first thing we do is uninstall the llama-cpp-python module: pip uninstall -y and then the name of the module, hit enter, and it's uninstalled. Next we set one of the environment variables, CMAKE_ARGS, to the value you see on screen, and hit enter; then we set one more, FORCE_CMAKE=1, and hit enter. Last, we install llama-cpp-python again with the --no-cache-dir flag.

I actually got this error right here, "failed building wheel for llama-cpp-python," and what solved it for me was running those three commands again: setting CMAKE_ARGS, setting FORCE_CMAKE=1, and then running the install. That seemed to work.

Now I'm going to make sure CUDA is working, using the checker file I created in a previous video; it just confirms that CUDA is available to torch. We run python checker.py and it shows 11.7 and True, so CUDA is available to torch.

The last thing we have to do is spin up the server: python server.py, then enter. Okay, we actually ran into an error: "CUDA setup failed despite GPU being available." I've run into this error before and I know how to solve it: we need to install a specific version of bitsandbytes, which looks like pip install bitsandbytes-windows, then hit enter. Now let's try again: python server.py.

We ran into another issue, so this is not straightforward to set up. This one says "AttributeError: module bitsandbytes has no attribute Linear4bit." I did some Googling, and a lot of people suggest removing parts of the code, but I don't want to do that. What I actually found instead is a version of peft that makes it work, so we run pip install followed by git+https and the URL; I'll drop this command in the description. After we install it, we run python server.py again, hit enter, and now we've got it working. The only thing left is to grab the local URL, open up a new tab, paste it in, and we have it running.
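For reference, here are the GPU-related commands from this section in one place (Windows cmd syntax). The CMAKE_ARGS value is my assumption, since the video elides it; -DLLAMA_CUBLAS=on was the commonly documented flag for CUDA builds of llama-cpp-python at the time:

    pip uninstall -y llama-cpp-python
    :: the CMAKE_ARGS value below is an assumption; the video doesn't show it
    set CMAKE_ARGS=-DLLAMA_CUBLAS=on
    set FORCE_CMAKE=1
    pip install llama-cpp-python --no-cache-dir
    :: quick CUDA sanity check, the same idea as the checker.py from the video
    python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"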
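And the two fixes for the errors hit when starting the server. The peft install here points at the main repo URL as an assumption; the exact command from the video description isn't visible:

    pip install bitsandbytes-windows
    :: the plain peft repo URL is an assumption; the video uses a specific command
    pip install git+https://github.com/huggingface/peft
    python server.py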
With the server running, I'm going to show you a couple more things in this video: first, how to download models, and there are three separate ways to do that, and then a tour of some of the settings you need to get certain models working.

First, let me show you how to download models. Go to huggingface.co/TheBloke; that's where we're going to find all of the models he's created, and he's publishing them all the time. Here's one that was uploaded just three minutes ago: a Vicuna 13B version 1.3 GGML. What I usually look for is the GPTQ versions, because I find those the easiest to run; here's the same model in a GPTQ version, and you can tell because it's in the name. We click through to it, and from this page we click the little copy button to grab the author/model name. Then, on the Model tab, where it says "Download custom model or LoRA," we paste it in and click Download, and it downloads on its own. That's method number one.

Method number two is downloading with the Python script from the command line, and that's the method I usually prefer, because we can open up multiple terminals and download models in parallel. From inside the text-generation-webui folder, we type python download-model.py, which is the script that downloads the model, and then paste in the same author/model name as before; hit enter and it downloads everything for you.

The last method is doing everything manually. If we open up Hugging Face again, under the "Files and versions" tab we can download all of the different files required to run the model. The main file is always going to be the large one, and this one is 7.45 gigabytes. We click the download button, then look for the models folder inside the text-generation-webui folder; that's where we place the model file. Once we've done that, we click the little refresh button in the interface, and we can see the model is available; we click it and it loads up, but I'm not going to do that right now.
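To recap method number two as a single command, run from inside the text-generation-webui folder (the model name here is only an example of mine; substitute any author/model pair copied from Hugging Face):

    python download-model.py TheBloke/vicuna-13b-v1.3-GPTQ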
Next, let's take a look at some of these settings. Generally you don't have to mess with them, but they do provide a couple of additional features. Coming back to TheBloke's Hugging Face page: he is usually generous in his descriptions of how to get these models running. If we look at this Robin 7B model, the GPTQ version, and scroll down, it says "how to easily download and use this model in text-generation-webui," and the instructions don't say we need to set anything manually, which means we just download it as usual and it runs. The other thing to look at is the prompt template right here; this is the template that's necessary to actually get viable outputs from the model. Switching back to text-gen web UI, on the text generation tab we can get a list of those prompt templates, and it's a pretty long list. They usually correspond to one of the models, but if you don't find yours, you can try different ones and see which matches what TheBloke recommends. Separately, of course, you can always just delete the text and put in your own prompt template.

Going back to the Model tab: the "cpu" flag means you want to run on CPU only, and the "auto-devices" flag lets you run some of the model on your GPU and some on your CPU, and we have a few others. Sometimes when you load up a model you're going to get an error over here that says to review the code and enable trust_remote_code; all you have to do is select that checkbox.

Next, I'm going to switch over to the interface mode. Typically we're using instruct models, but let's say you want to run a chat-based model like Samantha: over here you just click this, select "chat," then apply and restart the interface. Let's try it, and there we go, now we have a chat-like interface. The nice thing about this is you can actually have a continuous conversation where the model remembers your previous prompts and responses. Then if we come over here, we have chat settings, and there are a bunch of nice options; I won't go over all of them, but one cool thing is you can upload chat history and continue from where you left off. The Parameters tab is where you can fine-tune how the model responds to you: they have temperature, top-p, top-k, all the settings you would expect, right here, and you can play around with them and see what works for you. Last, they have a Training tab, and this is where you do fine-tuning; I haven't done a tutorial on that yet, but I am planning on doing it.

And that's it! Now you know how to get text-generation-webui running on your local machine, download models, and actually run them. If you have any problems getting this set up, of course, jump in my Discord and I'll try to help you out. If you enjoyed this video, please consider giving a like and subscribe, and I'll see you in the next one.
Info
Channel: Matthew Berman
Views: 19,584
Keywords: text generation webui, chatgpt, ai, openai, open source llm, text generation web ui, textgen webui, textgen tutorial, artificial intelligence, large language models
Id: VPW6mVTTtTc
Length: 9min 46sec (586 seconds)
Published: Mon Jun 19 2023