QLoRA PEFT Walkthrough! Hyperparameters Explained, Dataset Requirements, and Comparing Repos

Video Statistics and Information

Captions
Hey YouTube! Today we're going to be talking about two different applications you can leverage to fine-tune your large language models using QLoRA: the Alpaca QLoRA and the official QLoRA. Now, the Alpaca QLoRA is significantly easier to install, but it doesn't have the power of the official QLoRA, specifically the ability to merge the LoRAs back into the model itself, which significantly increases inference speed. On the other hand, it took several hours to get the official QLoRA stood up versus five minutes for the Alpaca QLoRA. I have created some custom repos to hopefully make this easier for people using WSL and Windows, but if you'd like to skip ahead, feel free to jump to the chapters below. Otherwise, let's get started.

Before we actually get into fine-tuning, let's do a quick walkthrough on how to get this set up correctly, because I had a lot of trouble and I'd like to help anybody else who runs into the same issues. I was not able to get it to run on Windows, so I had to pivot to WSL, where I got everything running except for bitsandbytes: there was a problem with CUDA hardware compatibility, specifically the bitsandbytes library throwing an "unexpected ordinal" error. If you run into this issue, I have my own custom fork of the repo, which you can clone with the command in the description below. Once you have cloned it, cd into the bitsandbytes directory and run a couple of commands. First, run `export CUDA_VERSION=` set to your CUDA version; in my case that's 11.7, so I set it to 117. (All of these commands are in the description below.) Then run `make cuda11x`, hit enter, and it takes about a minute to finish. Once that's done, we set up our custom installation of bitsandbytes. First make sure any other installation is gone by running `pip uninstall bitsandbytes`; if it's still installed, hit yes and let it get deleted. Then, from the top-level bitsandbytes directory we checked out, run `python setup.py install`, and that's it: it's installed.

Now let's move on to actually running the application itself. If you feel comfortable modifying code, we need to make a couple of changes in qlora.py, specifically so we can leverage custom datasets. If you don't feel comfortable making the changes, there's a repo in the description below that you can check out and use in your environment with the changes already made. If you do, we just need to append the following lines after line 522 (these are in the repo as well), and we need to change the prompt class to include the output for the response; otherwise the loss function has nothing to update against.

Once we've made those changes, we need to prepare for what QLoRA requires. Right now, unfortunately, it appears that QLoRA only supports structured text; it doesn't support unstructured or raw text. This means we need to be aware of what our foundational model expects for a structured fine-tune. In the example of Alpaca, it expects an instruction, input, and output, and we need our data in that format, though I expect raw and unstructured text will be supported soon enough. A single record in that format looks something like the sketch below.
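To make the expected shape concrete, here is a minimal, hypothetical example of one Alpaca-style record, reusing the MedQuAD-flavored instruction and input from the demo later in the video; the field values are illustrative only, and the reference answer is elided.

```python
# One hypothetical Alpaca-format training record. The loss is computed
# against "output", which is why the prompt class must include it.
record = {
    "instruction": "You are a medical expert and will answer questions "
                   "related to medical inquiries.",
    "input": "How to diagnose mycosis fungoides and Sézary syndrome?",
    "output": "...",  # the reference answer the model should learn to produce
}
```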
For the moment, let's consider why QLoRA is so incredibly important. It's about the quality and size of the dataset required to achieve the same thing we were achieving with LoRA. With QLoRA, we attach LoRAs to every single layer we can, whereas with standard LoRA we were only attaching them to a single layer. This results in a much higher-quality fine-tune, and we can make a much larger influence on the network with far fewer samples. So if we look at this dataset, it's fairly small, but we should be able to have the same impact as a larger version of it; this is the same MedQuAD QA dataset I was using before. Now that we have an idea of what we're going to be training, let's get ourselves set up to actually do it.

Let's go over what these hyperparameters do for us:

- The model name or path is the model you plan to train. It can be given as an absolute or relative path; if relative, make sure it's in the same place as your script.
- The output directory is where all of your training metadata will be saved, including checkpoints and other statistics.
- The dataset is the dataset you plan to train with.
- do train probably won't change; we want this true, but you can set it to false if you don't want to train and only want to run an evaluation.
- do eval tells us whether we want to run an evaluation during this run as well.
- do mmlu: MMLU is a particular benchmark for large language models, and we can set this to true if we want it to run at the end as well.
- Source max length is the maximum input length for our prompts, and target max length is the maximum generation we should expect.
- The batch sizes I don't recommend changing; four seems to be a pretty decent value.
- The gradient accumulation steps could be worth changing: this is how many steps we take before propagating our gradients back into our LoRAs.
- Logging steps is how many steps we take before writing logs for the training run.
- Max steps is how many steps we take in this particular run.
- The save strategy is based on steps; there's probably not a lot of reason to change it.
- The data seed is the value used to seed the data sampling.
- Save steps is how often we create checkpoints, and save total limit is how many checkpoints we keep in total.
- The evaluation strategy is also based on steps and probably doesn't need to change; the evaluation dataset size tells us how large the evaluation set drawn from our source dataset is, and evaluation steps is how often we actually run an evaluation.

Now, the optimizer. The optimizer is where we have the largest range of choices, with well over a dozen, all of which will be in the description below. The AdamW 32-bit is pretty performant, so there's probably not a lot of reason to switch from it, but the 8-bit version does have a bug right now where the loss goes out of control; I've seen that reported a couple of times in the issues on their GitHub, so just be wary if you see anything like that.

We also want to talk about what we can change in our LoRAs, either through parameters or inside the code. For example, we can tweak our LoRA rank, which by default is 64. If you remember, that is the size of the LoRA: the larger the LoRA, the more weights you have to tweak, so you can get a finer fine-tune. The LoRA alpha is how much of an impact the LoRA has when we start adding its values together: the higher the alpha, the larger that addition, and the lower the value, the smaller that addition. The dropout controls how well we prevent overfitting and other issues from these LoRAs; it's probably not worth touching unless you really feel you need to. Of the other values, the most important is the learning rate: if you have a model larger than 13 billion parameters, it's probably worth increasing the learning rate just to get better gradients. A rough sketch of how these LoRA settings map onto code is below.
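The video describes these knobs abstractly; here is a minimal, hypothetical sketch of how the same settings are commonly expressed with Hugging Face transformers and peft. This is not the author's script: the base model name and the target_modules list are assumptions made for illustration.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model with 4-bit quantized weights (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for the actual matmuls
)
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",                  # hypothetical base model
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# The LoRA knobs discussed above: rank, alpha, and dropout.
lora_config = LoraConfig(
    r=64,              # LoRA rank: bigger adapter, more trainable weights
    lora_alpha=16,     # scales how strongly the adapter's output is added in
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed subset
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # sanity-check how few weights we train
```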
Now let's move on to actually starting our fine-tune. Once we have our dataset and the large language model we want to fine-tune, and we've set the hyperparameters the way we want them, three things happen after we execute the fine-tuning procedure: the fine-tune itself, the creation of checkpoints, and finally the merge. That merge is where things are very different from other LoRAs. LoRAs up until now have been attached to a single layer, and they can be attached and detached; in QLoRA they are attached to every layer they can be, and when the model is saved we want to create a merge: we merge the weights from the LoRAs back into the layers, kind of like what stable diffusion has been doing for quite a long time, but now present in LLMs. Then we can launch it like we normally would inside a UI.

To start the training process, we call the script we modified. I'm in the top-level QLoRA directory, so I run `sh scripts/finetune.sh`. Once we run this, it will take a significant amount of time; in my case several hours, so we'll come back once it's done.

Once your model is done training, go into whatever output directory you asked for, which in my case was just `output`, and you should see a checkpoint that's been generated. If I go into this checkpoint, there is an adapter model and some other files. What we care about is the adapter model: as long as it's there as a .bin file, we can start merging our new QLoRA into the base model so we can load it into other UIs. That's done with a script I've created called merge.py, which is in my repo, so feel free to copy the code. All you should have to modify is your model path and the adapter path. After you've modified those, simply run the script and it handles the rest. Let me show you how that's done really quickly: cd back into the output folder and run `python merge.py`, let it run, and it handles the merge for you. A minimal sketch of what a merge script like this does is below.
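The author's merge.py isn't shown in the video; here is a minimal, hypothetical sketch of the same idea using peft's merge_and_unload, with the model path and adapter path as the two values he says to edit. The paths themselves are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

model_path = "huggyllama/llama-7b"       # hypothetical base model path
adapter_path = "output/checkpoint-1000"  # hypothetical checkpoint folder
out_dir = "merged-model"

# Load the base model in half precision, attach the trained adapter,
# then fold the LoRA weights back into the base layers.
base = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, adapter_path)
merged = model.merge_and_unload()

# Save the merged weights plus tokenizer so the UI has everything it needs.
merged.save_pretrained(out_dir)
AutoTokenizer.from_pretrained(model_path).save_pretrained(out_dir)
```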
Once the merge is done saving, you should have a saveme.bin, or whatever you've chosen to name the file, and you can just move it into your models folder in oobabooga or whichever chat UI you prefer, and you should be able to start the model up. You'll need to bring along the additional metadata from the model, though: your tokenizer, your config.json, and so forth. But then it should just boot.

Now I'd like to show you a tool that was significantly easier to work with and only took about ten minutes to get stood up instead of eight hours. After struggling with the official QLoRA, I found this repository, which was significantly easier to work with, and I'll link it in the description below. Let me show you how to work with it really quickly, because it's considerably easier. All we have to do is clone it just like the other repositories and cd in, and all we have to do to fine-tune is run one command line, at least on LLaMA models; they do have extensive examples for the other models they support, including Pythia and StableLM. It's a much easier tool to work with: you just run the command and it handles the rest, and it doesn't have the instabilities that the official QLoRA appears to have right now. Just hit enter, let it run, and we'll come back once it's finished.

After the fine-tuning is completed, all we have to do is start the web application with a single command line: we pass it the base model we trained with and where the LoRA weights were stored, hit enter, and it runs the rest for us. It will take a while to load, especially on a local machine; there's a lot of complexity in attaching the various LoRA matrices to the attention layers. And unlike the official QLoRA, they don't support merging that I've seen yet. I believe they plan to support it, but it doesn't look like it's there yet, so without the ability to merge, inference will be slower. Still, it's just so much easier to at least test your QLoRAs here than it was on the previous application, and we still have the ability to tweak most of the hyperparameters we had there; it's just much more straightforward, with far fewer issues. So now we'll wait for it to boot, and when it comes back we'll see how well it performs.

Once the app is running, it's just like any of the chat apps we're used to, and in this case we can test out our fine-tunes. The instruction is "You are a medical expert and you will answer questions related to medical inquiries," and the input is "How to diagnose mycosis fungoides and Sézary syndrome?" Let's see how well it performs. And oops, I forgot to stream the output; that typically helps. Now, the LoRAs do have an impact on the overall inference speed for these models, so unfortunately they do make them a little slower. That is an advantage of the official QLoRA package: if you can use it to do the merge, you don't take such a massive hit on your inference speed. A sketch of what this unmerged-adapter inference looks like in code is below.
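For contrast with the merge script above, here is a minimal, hypothetical sketch of running inference with the adapter still attached, which is what this second repo does: the LoRA matrices are applied at runtime on every forward pass, which is why generation is slower than with a merged model. The model and adapter paths are placeholders, not the repo's actual arguments.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = "huggyllama/llama-7b"       # hypothetical base model
lora_weights = "output/checkpoint-1000"  # hypothetical adapter weights

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model, torch_dtype=torch.float16, device_map="auto"
)
# The adapter stays attached: every forward pass computes base + LoRA paths.
model = PeftModel.from_pretrained(model, lora_weights)

prompt = "How to diagnose mycosis fungoides and Sézary syndrome?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```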
But that's really it. These are the two main approaches to performing QLoRA fine-tunes right now; they're very similar, but one is certainly much easier than the other. If this was helpful, please like and subscribe, and let us know in the comments below what you'd like to hear about next. Please tune in tomorrow at 6 PM Mountain Standard Time for our next Q&A session, and tune in next time when we'll be discussing the power behind landmark tokens, much wider context, and open-source models. See y'all next time!
Info
Channel: AemonAlgiz
Views: 4,059
Keywords: AI, Machine Learning, QLoRA, Large Language Models, Fine Tuning, Alpaca QLoRA, Artificial Intelligence, Deep Learning, Model Training, Data Science, WSL, Windows, Chatbots, Model Optimization, Language Processing, Advanced AI, AI Tools, AI Research, Computational Linguistics, Natural Language Processing, AI Development, Data Analysis, Tech Tutorials, AI Algorithms, AI Tutorial, Deep Learning Models, AI Innovation, Advanced Machine Learning, PEFT
Id: 8vmWGX1nfNM
Length: 14min 55sec (895 seconds)
Published: Thu Jun 01 2023