Quantized LLama2 GPTQ Model with Ooga Booga (284x faster than original?)

Captions

Today we will try out the GPTQ Llama 2 model from TheBloke's repository on Hugging Face. We will use the instructions on TheBloke's model page to run it with Ooga Booga [oobabooga's text-generation-webui]. If you don't already have Ooga Booga, it is pretty straightforward to install: simply go to the Ooga Booga GitHub page, scroll down, and download and extract the zip. Done.

First, we want to make sure we have the latest version of Ooga Booga. I'm going to go into the directory where I extracted the Ooga Booga zip file and run the update_windows.bat batch file. Once that is done, you can press the any key on your keyboard. Which one is the any key? I have no idea.

Anyway, now let's start Ooga Booga by running the start_windows.bat batch file. I received this error message, something about pydantic fields. If you also get this error, there is a solution from the creator of Ooga Booga. It says this error is caused by using an outdated installer, and that we can use this updated webui.py file. I'm going to go to my installation directory and locate the existing webui.py file. Then let's open the new one in a new tab, download it, and replace my existing one with the newly downloaded one. Now let's go to the prompt and press the any key to exit. The solution says to run the update again after replacing that Python file. Once that is done, we can press the any key to exit.

Now let's try starting Ooga Booga again and see if it works. Guys, it worked! I'm going to smash that like button on the solution. I mean the heart button. Now let's copy that URL to a new browser window.

Now let's go back to TheBloke's instructions. Step one is to click on the Model tab, so let's click on the Model tab. Step two is to enter that text in the "Download custom model or LoRA" text box; I'm just going to copy and paste it. Step three is to click Download. It is done. Step four confirms it is done. Step five is to click the refresh icon next to Model. Step six is to select that newly downloaded GPTQ model from the drop-down.
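Steps two and three (pasting a repository name into the download box and clicking Download) can also be sketched outside the web UI. This is only a rough sketch and not what the video does. It assumes the `huggingface_hub` package is installed, that the repository is `TheBloke/Llama-2-7b-Chat-GPTQ` (the captions only say "TheBloke's Llama 2 chat GPTQ", so the exact name is an assumption), and that the web UI expects each model in a `models` subfolder named after the repository with the slash replaced by an underscore. The helper names `local_folder_name` and `download_model` are hypothetical.

```python
from pathlib import Path

# Assumed repository name; the video never spells it out in full.
REPO_ID = "TheBloke/Llama-2-7b-Chat-GPTQ"


def local_folder_name(repo_id: str) -> str:
    """Turn 'owner/name' into 'owner_name' (assumed folder convention)."""
    return "_".join(repo_id.split("/")[-2:])


def download_model(repo_id: str, models_dir: str = "models") -> Path:
    """Download every file of the repository into models_dir/<owner_name>."""
    # Imported here so local_folder_name works without the package installed.
    from huggingface_hub import snapshot_download

    target = Path(models_dir) / local_folder_name(repo_id)
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target


if __name__ == "__main__":
    # Only prints the folder name; call download_model(REPO_ID) to fetch files.
    print(local_folder_name(REPO_ID))
```

Calling `download_model(REPO_ID)` from the installation directory would fetch the files. The refresh and select steps in the UI are still needed afterwards.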
I'm going to click the drop-down and select TheBloke's Llama 2 chat GPTQ. Step seven says it is now loaded and ready to use. Step eight mentions we can save any custom settings we want to make, and step nine is to click the Text generation tab and enter a prompt to get started. I'm just going to click on Save settings to save these default settings, and then let's go to the Text generation tab to test this out.

I'm going to ask it about ducks. Let's ask it what a duck sounds like. I click Generate, but there doesn't seem to be anything happening. Looks like we got an error. Oops, I forgot to load the model. We need to add another step: actually click the Load button. Now it will take a while to load this model. We can see the status on the bottom right of the UI. It finished. That actually didn't take too long.

Now let's try out a prompt again and ask something about ducks. This is moving in real time. It is going pretty fast. I remember the original Llama 7B model took a long time; this is definitely faster. It is finished. It took 8.27 seconds. Great.

Now, as a test for comparison, I'm going to load the original non-quantized 7B model. It has finished loading. Now let's try it out with the exact same prompt and see how long it takes. Spoiler alert: this took a very long time. During the time it took to run this, I wondered if it was the right choice to conduct this test. Several times I considered stopping the application. It used up all of the RAM, and my OBS eventually crashed.

It finally finished. It took 2317 seconds, compared to the quantized model, which only took about eight seconds. The tokens per second for this one was only 0.04, whereas for the quantized model it was 11.37. Going by tokens per second, that is over 284 times faster with the quantized model.

The screen just went blank. That is because my OBS crashed during this process with the original 7B model and refused to stop recording from this screen. So anyway, I will be deleting the original models and only using the ones from TheBloke's Hugging Face repository moving forward for the ones I use locally.

That is all. Enjoy. PS: let me know if anyone figures out which key is the any key.
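The speed comparison quoted in the captions can be checked with a few lines of arithmetic. The sketch below uses only the four figures from the video; the variable names are my own. It assumes the tokens-per-second values are rounded as displayed in the UI, so the ratios are approximate.

```python
# Figures quoted in the video.
quantized_seconds = 8.27   # GPTQ model, wall-clock time for the prompt
original_seconds = 2317    # original non-quantized 7B model, same prompt
quantized_tps = 11.37      # tokens per second, GPTQ model
original_tps = 0.04        # tokens per second, original model (rounded)

# Speedup measured two ways: by generation rate and by wall-clock time.
speedup_by_tps = quantized_tps / original_tps
speedup_by_time = original_seconds / quantized_seconds

# Rate times duration estimates how many tokens each run produced.
quantized_tokens = quantized_tps * quantized_seconds
original_tokens = original_tps * original_seconds

if __name__ == "__main__":
    print(f"speedup by tokens/s:   {speedup_by_tps:.2f}x")
    print(f"speedup by wall clock: {speedup_by_time:.2f}x")
    print(f"estimated tokens (quantized): {quantized_tokens:.0f}")
    print(f"estimated tokens (original):  {original_tokens:.0f}")
```

The rate-based ratio comes out at about 284, which matches the figure in the title, while the wall-clock ratio is about 280. Both runs produced roughly the same number of tokens (about 93 to 94), so the two measures agree to within the rounding of the 0.04 figure.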

Info

Channel: Natlamir

Views: 1,484

Id: lgzDLMtqQ3w

Length: 5min 50sec (350 seconds)

Published: Tue Sep 12 2023