How To Install CODE LLaMA LOCALLY (TextGen WebUI)

Video Statistics and Information

Captions
Today I'm going to show you how to install Code Llama locally. This is an incredible model based on Llama 2 that beats GPT-4 at coding tasks. I just did a video comparing it directly against GPT-4 and it did incredibly well, so I'm going to show you how to install it locally. Let's go.

This is going to be an install from complete scratch, assuming you have nothing set up on your machine. I'm going to be using my PC today. We're going to be using text-generation-webui for the interface, and we're going to be using the WizardCoder version of Code Llama, a fine-tuned version, which is the one that beat GPT-4.

The first thing we're going to do is get text-generation-webui installed. If you don't already have Anaconda installed on your machine, go ahead and do that; it'll help us manage the Python versioning issues that I always come across. This is the GitHub page for text-generation-webui; I'll drop a link in the description below directly to it. First, come over to the green "Code" button, click it, and click copy; this copies the URL we need to get the code. Then switch over to your terminal with Anaconda ready, and we're going to create a new conda environment: `conda create -n tg python=3.10.9` ("tg" for text gen). We're using this Python version because that's the version they have in the examples on the GitHub page. Hit enter, it's going to install all the packages and everything else we need, then hit enter again to proceed, and we're done. Next, copy the command it shows and paste it: `conda activate tg`. That activates the conda environment for us, and we can see we're in it because the prompt now shows `tg`. Next we're going to clone the code, so we type `git clone` and paste in the URL we copied, then hit enter. Now that we have that cloned, we're going to cd (change directory) into the new folder we just created: `cd text-generation-webui`.

Now we're in that folder. Next we're going to install all of the requirements for this repository, so type `python -m pip install -r requirements.txt` and hit enter; this may take a few minutes. Once that's done, we're going to need to install the torch libraries. I already have these installed, so I'm not going to run it, but I'll show you the command: `pip3 install torch torchvision torchaudio --index-url` followed by the PyTorch wheel URL. I'll drop step-by-step instructions and a gist in the description, so you'll have all of these links. Now, sometimes when you spin up the server you're going to get an error that says CUDA is not available, and the way to solve that is with `conda install pytorch torchvision torchaudio`, specifying the PyTorch build that has CUDA, then `-c pytorch -c nvidia`; again, I'll put all of this in the gist in the description below. I also got a weird error saying "ModuleNotFoundError: No module named chardet", so I just went ahead and installed it directly using `python -m pip install chardet`. Let's try to spin up the server again. I ran into another issue saying it doesn't have cchardet, so I installed that the same way, and now we're good to go. Once you have everything installed, spin up the server by typing `python server.py` and hitting enter. Click the local URL it prints, and now we have text-generation-webui up and running.

Click over to the Model tab; we need to download the model. Switching back to Hugging Face, we're now at the WizardCoder-Python-13B-V1.0 model card. I'm going to install the 13-billion-parameter version because it's smaller, but since I have 24 gigabytes of video RAM, I think I can actually get the 34-billion-parameter model working. Now, if we look at the WizardLM page, we can
see all the different versions they have. All of these WizardCoder models are Code Llamas that have been fine-tuned by WizardLM. They have a 7-billion-parameter version, which will fit on most devices, and they have the WizardCoder-Python-13B version as well, which is a model specifically trained for Python code. Take a look at all of these; you can even get the larger 70-billion-parameter version. And of course you can grab any model, so if you want to download the raw, unquantized model directly from Meta, you can do that as well, and I'll show you how to do all of it.

I showed you the WizardLM page; now I'm on TheBloke's page. He provides quantized versions of pretty much every model; he's amazing. We're going to be using the WizardCoder-Python-13B model, specifically the GPTQ quantized version. Quantized basically means compressed: it allows the model to run more quickly, and usually you don't lose a lot of quality. If we click over to the Files and versions tab, we can see that the model.safetensors file is about seven gigabytes, and that's what we'll be downloading. So up here, right next to the name, click copy, switch back to text-generation-webui, paste what we just copied into the first input box, and click Download. This will take a while; it's still seven gigabytes. Okay, it's done. Next, click the little refresh button, which loads the models into the dropdown list, and select the model we just downloaded. The loader defaults to AutoGPTQ for some reason, but what we actually want to use is ExLlama_HF, so go ahead and select that. Those are just different model loaders; you can experiment with different ones, since some work for certain quantizations and some don't. We can also set our max sequence length right here: apparently the model was trained on 16,000 tokens, and it can be fine-tuned up to a hundred thousand tokens. I haven't tested that, but I've read it can be done. Then we click Load, and there we go: successfully loaded.

Switch over to the Parameters tab and set your max new tokens to 2048. For the temperature, since we're going to be dealing with coding, I like to set it to the minimum. The temperature dictates how creative the model will be, and we don't want it to be creative, because this is coding. Then switch over to the Default tab. We're going to be using this prompt template: "Below is an instruction that describes a task. Write a response that appropriately completes the request." — and after it we put our prompt. That's it, we're done, so let's test it out: write Python code to output the numbers 1 to 100. All right, it's done: it gave me two examples, and they both work. This is absolutely perfect. I'm running this locally, and it's using my GPU. There are options if you don't have a GPU: you can run in CPU-only mode, and that's described on the GitHub page. That's it: now you have a fully functional, extremely capable coding assistant running locally on your computer, absolutely free. If you liked this video, please consider giving it a like and subscribing, and I'll see you in the next one.
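The prompt template quoted above is the standard Alpaca-style instruction format. The video only reads the opening sentence aloud, so the `### Instruction:` / `### Response:` markers below are the conventional spelling and should be treated as an assumption:

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Write python code to output numbers 1 to 100.

### Response:
```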
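For reference, the whole install sequence described above can be collected into one shell sketch. The repository URL and the cu118 PyTorch index URL are the standard ones but are not read out in full in the video, so treat them as assumptions and check the gist linked in the description:

```shell
# Environment name and Python version used throughout this install.
ENV_NAME=tg
PY_VERSION=3.10.9

# Create and activate an isolated conda environment (run in an interactive shell
# where conda's activation hook is available).
conda create -n "$ENV_NAME" python="$PY_VERSION" -y
conda activate "$ENV_NAME"

# Clone text-generation-webui and enter the folder.
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui

# Install the repository's Python requirements.
python -m pip install -r requirements.txt

# Torch libraries; the cu118 index URL is an assumption (match your CUDA version).
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# If the server reports "CUDA is not available", reinstall a CUDA-enabled build:
# conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

# Fixes for the ModuleNotFoundError issues hit in the video.
python -m pip install chardet cchardet

# Finally, launch the web UI and open the printed local URL:
# python server.py        # add --cpu to run without a GPU
```

The pytorch-cuda=11.8 pin and the --cpu flag are taken from the project's usual documentation rather than the video itself, so verify them against the current README.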
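As an alternative to pasting the model name into the Model tab, text-generation-webui ships a download script you can run from the repo folder. The exact Hugging Face repo id below is an assumption about the spelling of TheBloke's GPTQ quantization shown in the video:

```shell
# Hugging Face repo id for TheBloke's GPTQ quantization (exact id assumed).
REPO_ID=TheBloke/WizardCoder-Python-13B-V1.0-GPTQ

# From inside the text-generation-webui folder, the bundled script fetches it
# into the models/ directory, same as the Download button in the UI.
python download-model.py "$REPO_ID"
```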
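The test prompt itself is trivial; purely for reference, a minimal version of the program being requested (this is an illustrative sketch, not the model's actual output):

```python
# Print the numbers 1 through 100, one per line.
for n in range(1, 101):
    print(n)
```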
Info
Channel: Matthew Berman
Views: 80,486
Keywords: code llama, code llama 2, llama 2, llama, ai, artificial intelligence, large language model, coding assistant, programming, python, ai coding assistant, chatgpt, install code llama, install llama, textgen webui, code llama instruct
Id: Ud_86SaCTrM
Length: 6min 11sec (371 seconds)
Published: Thu Aug 31 2023