Updated Oobabooga Textgen WebUI for M1/M2 [Installation & Tutorial]

Video Statistics and Information

Captions
In this video, I will show you how to install the oobabooga text-generation-webui locally on your own machine. If you want to run open-source language models, this is the best tool to get: it supports a large number of LLMs out of the box and is relatively easy to set up. In the rest of the video, I will walk through a step-by-step installation process and then show you how to install the Vicuna 13B v1.3 model with it. We are going to focus specifically on installation for macOS with Apple silicon; if you are interested in installation instructions for Linux and Windows, I would recommend watching this other video. We are not going to use the one-click installer, because I have found it to be buggy on M1 and M2 processors; instead, we will do a step-by-step installation. To start, we first need to clone the repo. Go to the top of the repo page, click the green "Code" button, and you will see a link; copy it and open a new terminal. If I type `ls`, it lists all the files and folders in the current working directory. I want to move to the Documents folder, so I type `cd Documents`. To clone the repo, I type `git clone` followed by the repo link we just copied. By default, pressing enter would create a folder named text-generation-webui, but I want to put it in a folder with a custom name, so I'm going to call it textgen; press enter and it starts copying all the files from the repo. Keep in mind that for this to work you need git installed on your local machine. Alternatively, if you do not have git installed, you can go to the same "Code" button, download the repo as a zip file, extract it, and work with that. Okay, so we go back
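The cloning steps above can be sketched as a short shell session. The repo URL is the project's actual GitHub address; the folder names match the ones used in the video, so adjust them to taste:

```shell
# Move into the folder where the project should live
cd ~/Documents

# Clone the repo into a custom folder name ("textgen" instead of
# the default "text-generation-webui")
git clone https://github.com/oobabooga/text-generation-webui.git textgen

# Verify the files were copied
ls textgen
```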
to our terminal. I simply type `clear` to remove all the text we had so far. Next, we need to create a new virtual environment; I will be using conda for this. We type `conda create -n` followed by the environment name — I'm going to also call this textgen. I already have a virtual environment with this name, but in your case it will be different. Then comes the most important part: defining the Python version you want to use. For this specific case, I'm going to use Python 3.10.10. You might be wondering where this is coming from: it comes from one of the issues created specifically about installation on Apple silicon, where this person recommends Python 3.10.10. Back in the terminal, I hit enter and it starts creating the virtual environment. In my case it asks whether I want to remove the existing environment with this name, so I type yes and wait for it to remove everything; then it asks whether I want to proceed with the installation, so I type y and the installation starts. Once it's complete, we need to activate our virtual environment using `conda activate textgen`; notice that the prompt prefix changes from base to textgen. I clear my terminal again and type `ls` — here is the folder we just created with the `git clone` command. We move into it with `cd textgen` (cd means change directory), and typing `ls` shows us the contents of the folder, which are exactly the same as the repo on GitHub.
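Put together, the environment setup described above looks like this (the environment name `textgen` and the Python version pin follow the video; the version recommendation comes from the Apple-silicon issue mentioned there):

```shell
# Create a conda environment pinned to Python 3.10.10
conda create -n textgen python=3.10.10

# Activate it; the shell prompt prefix changes from (base) to (textgen)
conda activate textgen

# Move into the cloned repo
cd ~/Documents/textgen
```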
Next, we need to install all the required packages, which are listed in the requirements.txt file. To do that, we would type `pip install -r requirements.txt` — but let me show you one more thing first. Sometimes, when working with virtual environments, pip may not pick up the right environment, so you can check which Python you are using: type `which python` and it will print the path of the Python binary being used. In this case we are using the right one, so it shouldn't be an issue; however, I would recommend using `python -m pip install -r requirements.txt`, because that command makes sure the install runs against the Python of the active virtual environment. Let's wait while it downloads all the required packages and installs them on our local machine. Once that is complete, we need to do one more thing: go back and install a specific PyTorch version — the nightly build that works with Apple silicon M1 and M2 processors. We go back to our terminal, paste the command we copied, and run it. Keep one thing in mind: if the virtual environment manager shown in that command is different from conda, just skip that step and install it using conda instead. Okay, the installation is complete, and now we are all set to run the script. I type `ls` again; the specific file we need to run is server.py. However, we need to make one more change before running the server: to get the most out of the hardware, we need to set the number of threads. I have an M2 Max, so I'm going to set this to 8 when running the server.py file; for your specific machine, set it accordingly. Let's go back to our terminal again. Now, in order
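The dependency steps above, as commands. The PyTorch nightly install line is the one the macOS selector on pytorch.org produced around the time of the video — check pytorch.org for the current command before copying it:

```shell
# Confirm pip will target the active environment's Python
which python

# Install the web UI's dependencies against that Python
python -m pip install -r requirements.txt

# Install the PyTorch nightly build that supports Apple silicon (M1/M2)
python -m pip install --pre torch torchvision torchaudio \
    --index-url https://download.pytorch.org/whl/nightly/cpu
```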
to start the server, I type `python server.py --threads 8`, which will give us the maximum performance, and we simply run this command. After running it, it gives me a localhost web address that you can use to access the web server. You will notice I'm getting a warning about the bitsandbytes package; that package is specifically designed to run on NVIDIA GPUs, so I'm going to simply ignore it. One more thing to keep in mind: you can also create a public link so you can access the web server over the internet as well — I will show you that in a later video. So we simply copy the address, head over to a browser, and paste it there, and here is the interface you're going to see. If you have seen my previous videos, this looks very different, because they have updated the web interface; it's now pretty neat. One thing I really like is that in the input box you can define the custom prompt template you need for different models — we're going to look at a couple of examples in a little bit. But before that, we need to download a model, so I would recommend going to the Model tab. There are three different ways to download models; this is the one I really like, because it's the easiest. Let me show you how to download a new model: we move over to Hugging Face, where you can search for the models you want — all you need is the link to the model. I'm interested in the Vicuna 13B v1.3 model, the new model that was released just a few days ago, so I copy its link, go back to the text-generation-webui, paste the link, and simply click download. If we go back to our terminal, you will see the download progress being shown there.
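Launching the server as described above might look like this; `8` threads suits the M2 Max in the video, so match the value to your own CPU:

```shell
# Launch the web UI with 8 CPU threads; it prints a local address
# (something like http://127.0.0.1:7860) to open in a browser
python server.py --threads 8
```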
While this is downloading, I'd like to show you some other options. First and foremost is the model loader: most of the models you'll see on Hugging Face are based on Transformers, but there are other kinds as well — for example, llama.cpp-based models — and there are even quantized versions, so if you're using a quantized model, simply select the corresponding loader. Then there are some model-specific options. For example, you can use the CPU to run them; if you download a full model, the way I'm doing right now, you can load it in 8-bit or 16-bit; or if you want to offload part of it to the GPU and part to the CPU, you can select auto-devices. There is also an option to load in 4-bit, and so on and so forth. You can also enable trust-remote-code, which is required for some Hugging Face models. Okay, let's look at a couple of other things. If you go to the Parameters tab, the most important parameter is temperature: depending on your application, you probably want to keep it at a low value if you do not want hallucinations in your output. There are also top-p, top-k, and repetition penalty — options you have probably seen before. Another thing you can do is fine-tune your models within the oobabooga text-generation-webui — I will be creating a few videos on this — you can essentially take your own dataset and fine-tune models with it. In terms of the interface: if you go to the text-generation interface, you provide input here and it generates outputs, and this works for instruct-fine-tuned models. However, if you want something like a chat interface, you can select that option, then click apply and restart the interface, and it will be converted into a chat interface where you can essentially chat with your assistant or your model. I'm not going to do that right now, because we are currently in the process of downloading a model.
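The loader options described above can also be passed on the command line when launching the server. These flag names match the web UI options at the time of the video, but they may change between versions, so run `python server.py --help` to confirm what your copy supports:

```shell
# Run the model entirely on CPU
python server.py --threads 8 --cpu

# Load a full model in 8-bit to reduce memory use
python server.py --threads 8 --load-in-8bit

# Split layers between GPU and CPU automatically
python server.py --threads 8 --auto-devices

# Allow custom model code shipped in the Hugging Face repo
python server.py --threads 8 --trust-remote-code
```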
That's an option I will show you later. Now, another very important option is the prompt template you're using. If you go down to the Prompt section, there is a whole bunch of different options listed. For example, we're going to be working with Vicuna v1.3, which has a template very similar to v1.1, so you can select that. Once the download finishes, you'll see a "Done" message; if you come back here and reload the list of model names, you'll see the new model. I select it and it loads the model. Okay, the model is now successfully loaded. By default, the interface is set to question answering, which is what instruct-fine-tuned models expect; however, you can change this to the chat interface as well — simply select chat, then apply and restart the interface, and you will be able to interact with the model as if you're chatting with it. Use this for models that are fine-tuned for chat. Now let me show you one last thing: I go to the Prompt section and select the instruct 1.1 template — the same prompt template that Vicuna v1.3 uses — and you can see it changed the prompt template for us. If I select another model, it uses a different prompt template; it's a pretty nice feature they have included that you automatically get the prompt format the model is supposed to use. Another parameter you may want to set is max new tokens, the number of tokens the model is allowed to generate; I'm going to keep it at the default 200, but if you want a longer response, you can increase it. To end the video, I'm going to test this model with a very simple prompt — let's see what it can come up with. I click generate, and this is the actual generation speed I'm getting on my M2 machine. I was actually expecting a lot from this model; however, here is what it came up with: "hey there, viewer of
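For reference, the Vicuna v1.1-style template that the prompt dropdown fills in looks roughly like this — reconstructed from the Vicuna project's documented format, not copied from the video:

```
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
USER: <your prompt here>
ASSISTANT:
```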
the video, we hope you enjoyed our latest video about the topic of the topic of the topic" — and it goes into a loop. But I hope you enjoyed this video; if you did, consider liking it, and if you are not subscribed yet, consider subscribing to the channel. I'll be creating a lot more videos using this new oobabooga web interface. Thanks for watching, and see you in the next one.
Info
Channel: Prompt Engineering
Views: 28,450
Keywords: prompt engineering, Prompt Engineer, natural language processing, GPT-4, chatgpt for pdf files, ChatGPT for PDF, langchain openai, langchain in python, Text Embeddings, openai, embeddings, oobabooga, text generation webui, text generation webui colab, text generation webui training, ooga booga text-generation-webui, ooga booga textgen webui, vicuna ai, vicuna 13b, vicuna 13b installation, vicuna 13b mac m1, textgen webui install m1, text generation webui m1 m2
Id: btmVhRuoLkc
Length: 14min 40sec (880 seconds)
Published: Fri Jun 23 2023