QLoRA PEFT Walkthrough! Hyperparameters Explained, Dataset Requirements, and Comparing Repos

Video Statistics and Information

Captions
Hey YouTube! Today we're going to be talking about two different applications you can leverage to fine-tune your large language models using QLoRA: the Alpaca QLoRA and the official QLoRA. Now, the Alpaca QLoRA is significantly easier to install, but it doesn't have the power of the official QLoRA, specifically the ability to merge the LoRAs back into the model itself, which significantly increases inference speed. On the other hand, it took several hours to get the official QLoRA stood up versus five minutes for the Alpaca QLoRA. I have created some custom repos to hopefully make this easier for people using WSL and Windows, but if you'd like to skip ahead, feel free to jump to the chapters below. Otherwise, let's get started.

Before we actually get into fine-tuning, let's do a quick walkthrough on how to get this set up correctly, because I had a lot of trouble and I'd like to help anybody else who runs into the same issues. I was not able to get it to run on Windows, so I had to pivot to WSL, where I got everything running except for bitsandbytes: there was a problem with CUDA hardware compatibility, specifically the bitsandbytes library throwing an "unexpected ordinal" error. If you run into this issue, I have my own custom fork of the repo, which you can clone with the command in the description below. Once you have cloned it, cd into the bitsandbytes directory and run a couple of commands. First, run `export CUDA_VERSION=` set to your CUDA version; in my case that's 11.7, so I set it to 117. (All of these commands are in the description below.) Then run `make cuda11x`, hit enter, and it takes about a minute to finish. Once that's done, we set up our custom installation of bitsandbytes. First make sure any other installation is gone by running `pip uninstall bitsandbytes`; if it's still installed, hit yes and let it get deleted. Then, from the top-level bitsandbytes directory we checked out, run `python setup.py install`, and that's it: it's installed.

Now let's move on to actually running the application itself. If you feel comfortable modifying code, we need to make a couple of changes in qlora.py, specifically so we can leverage custom datasets. If you don't feel comfortable making the changes, there's a repo in the description below that you can check out and use in your environment with the changes already made. If you do, we just need to append the following lines after line 522 (these are in the repo as well), and we need to change the prompt class to include the output for the response; otherwise the loss function has nothing to update against.

Once we've made those changes, we need to prepare for what QLoRA requires. Right now, unfortunately, it appears that QLoRA only supports structured text; it doesn't support unstructured or raw text. This means we need to be aware of what our foundational model expects for a structured fine-tune. In the example of Alpaca, it expects an instruction, input, and output, and we need our data in that format, though I expect raw and unstructured text will be supported soon enough. A single record in that format looks something like the sketch below.
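To make the expected shape concrete, here is a minimal, hypothetical example of one Alpaca-style record, reusing the MedQuAD-flavored instruction and input from the demo later in the video; the field values are illustrative only, and the reference answer is elided.

```python
# One hypothetical Alpaca-format training record. The loss is computed
# against "output", which is why the prompt class must include it.
record = {
    "instruction": "You are a medical expert and will answer questions "
                   "related to medical inquiries.",
    "input": "How to diagnose mycosis fungoides and Sézary syndrome?",
    "output": "...",  # the reference answer the model should learn to produce
}
```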
For the moment, let's consider why QLoRA is so incredibly important. It's about the quality and size of the dataset required to achieve the same thing we were achieving with LoRA. With QLoRA, we attach LoRAs to every single layer we can, whereas with standard LoRA we were only attaching them to a single layer. This results in a much higher-quality fine-tune, and we can make a much larger influence on the network with far fewer samples. So if we look at this dataset, it's fairly small, but we should be able to have the same impact as a larger version of it; this is the same MedQuAD QA dataset I was using before. Now that we have an idea of what we're going to be training, let's get ourselves set up to actually do it.

Let's go over what these hyperparameters do for us:

- The model name or path is the model you plan to train. It can be given as an absolute or relative path; if relative, make sure it's in the same place as your script.
- The output directory is where all of your training metadata will be saved, including checkpoints and other statistics.
- The dataset is the dataset you plan to train with.
- do train probably won't change; we want this true, but you can set it to false if you don't want to train and only want to run an evaluation.
- do eval tells us whether we want to run an evaluation during this run as well.
- do mmlu: MMLU is a particular benchmark for large language models, and we can set this to true if we want it to run at the end as well.
- Source max length is the maximum input length for our prompts, and target max length is the maximum generation we should expect.
- The batch sizes I don't recommend changing; four seems to be a pretty decent value.
- The gradient accumulation steps could be worth changing: this is how many steps we take before propagating our gradients back into our LoRAs.
- Logging steps is how many steps we take before writing logs for the training run.
- Max steps is how many steps we take in this particular run.
- The save strategy is based on steps; there's probably not a lot of reason to change it.
- The data seed is the value used to seed the data sampling.
- Save steps is how often we create checkpoints, and save total limit is how many checkpoints we keep in total.
- The evaluation strategy is also based on steps and probably doesn't need to change; the evaluation dataset size tells us how large the evaluation set drawn from our source dataset is, and evaluation steps is how often we actually run an evaluation.

Now, the optimizer. The optimizer is where we have the largest range of choices, with well over a dozen, all of which will be in the description below. The AdamW 32-bit is pretty performant, so there's probably not a lot of reason to switch from it, but the 8-bit version does have a bug right now where the loss goes out of control; I've seen that reported a couple of times in the issues on their GitHub, so just be wary if you see anything like that.

We also want to talk about what we can change in our LoRAs, either through parameters or inside the code. For example, we can tweak our LoRA rank, which by default is 64. If you remember, that is the size of the LoRA: the larger the LoRA, the more weights you have to tweak, so you can get a finer fine-tune. The LoRA alpha is how much of an impact the LoRA has when we start adding its values together: the higher the alpha, the larger that addition, and the lower the value, the smaller that addition. The dropout controls how well we prevent overfitting and other issues from these LoRAs; it's probably not worth touching unless you really feel you need to. Of the other values, the most important is the learning rate: if you have a model larger than 13 billion parameters, it's probably worth increasing the learning rate just to get better gradients. A rough sketch of how these LoRA settings map onto code is below.
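The video describes these knobs abstractly; here is a minimal, hypothetical sketch of how the same settings are commonly expressed with Hugging Face transformers and peft. This is not the author's script: the base model name and the target_modules list are assumptions made for illustration.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model with 4-bit quantized weights (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for the actual matmuls
)
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",                  # hypothetical base model
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# The LoRA knobs discussed above: rank, alpha, and dropout.
lora_config = LoraConfig(
    r=64,              # LoRA rank: bigger adapter, more trainable weights
    lora_alpha=16,     # scales how strongly the adapter's output is added in
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed subset
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # sanity-check how few weights we train
```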
Now let's move on to actually starting our fine-tune. Once we have our dataset and the large language model we want to fine-tune, and we've set the hyperparameters the way we want them, three things happen after we execute the fine-tuning procedure: the fine-tune itself, the creation of checkpoints, and finally the merge. That merge is where things are very different from other LoRAs. LoRAs up until now have been attached to a single layer, and they can be attached and detached; in QLoRA they are attached to every layer they can be, and when the model is saved we want to create a merge: we merge the weights from the LoRAs back into the layers, kind of like what stable diffusion has been doing for quite a long time, but now present in LLMs. Then we can launch it like we normally would inside a UI.

To start the training process, we call the script we modified. I'm in the top-level QLoRA directory, so I run `sh scripts/finetune.sh`. Once we run this, it will take a significant amount of time; in my case several hours, so we'll come back once it's done.

Once your model is done training, go into whatever output directory you asked for, which in my case was just `output`, and you should see a checkpoint that's been generated. If I go into this checkpoint, there is an adapter model and some other files. What we care about is the adapter model: as long as it's there as a .bin file, we can start merging our new QLoRA into the base model so we can load it into other UIs. That's done with a script I've created called merge.py, which is in my repo, so feel free to copy the code. All you should have to modify is your model path and the adapter path. After you've modified those, simply run the script and it handles the rest. Let me show you how that's done really quickly: cd back into the output folder and run `python merge.py`, let it run, and it handles the merge for you. A minimal sketch of what a merge script like this does is below.
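The author's merge.py isn't shown in the video; here is a minimal, hypothetical sketch of the same idea using peft's merge_and_unload, with the model path and adapter path as the two values he says to edit. The paths themselves are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

model_path = "huggyllama/llama-7b"       # hypothetical base model path
adapter_path = "output/checkpoint-1000"  # hypothetical checkpoint folder
out_dir = "merged-model"

# Load the base model in half precision, attach the trained adapter,
# then fold the LoRA weights back into the base layers.
base = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, adapter_path)
merged = model.merge_and_unload()

# Save the merged weights plus tokenizer so the UI has everything it needs.
merged.save_pretrained(out_dir)
AutoTokenizer.from_pretrained(model_path).save_pretrained(out_dir)
```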
Once the merge is done saving, you should have a saveme.bin, or whatever you've chosen to name the file, and you can just move it into your models folder in oobabooga or whichever chat UI you prefer, and you should be able to start the model up. You'll need to bring along the additional metadata from the model, though: your tokenizer, your config.json, and so forth. But then it should just boot.

Now I'd like to show you a tool that was significantly easier to work with and only took about ten minutes to get stood up instead of eight hours. After struggling with the official QLoRA, I found this repository, which was significantly easier to work with, and I'll link it in the description below. Let me show you how to work with it really quickly, because it's considerably easier. All we have to do is clone it just like the other repositories and cd in, and all we have to do to fine-tune is run one command line, at least on LLaMA models; they do have extensive examples for the other models they support, including Pythia and StableLM. It's a much easier tool to work with: you just run the command and it handles the rest, and it doesn't have the instabilities that the official QLoRA appears to have right now. Just hit enter, let it run, and we'll come back once it's finished.

After the fine-tuning is completed, all we have to do is start the web application with a single command line: we pass it the base model we trained with and where the LoRA weights were stored, hit enter, and it runs the rest for us. It will take a while to load, especially on a local machine; there's a lot of complexity in attaching the various LoRA matrices to the attention layers. And unlike the official QLoRA, they don't support merging that I've seen yet. I believe they plan to support it, but it doesn't look like it's there yet, so without the ability to merge, inference will be slower. Still, it's just so much easier to at least test your QLoRAs here than it was on the previous application, and we still have the ability to tweak most of the hyperparameters we had there; it's just much more straightforward, with far fewer issues. So now we'll wait for it to boot, and when it comes back we'll see how well it performs.

Once the app is running, it's just like any of the chat apps we're used to, and in this case we can test out our fine-tunes. The instruction is "You are a medical expert and you will answer questions related to medical inquiries," and the input is "How to diagnose mycosis fungoides and Sézary syndrome?" Let's see how well it performs. And oops, I forgot to stream the output; that typically helps. Now, the LoRAs do have an impact on the overall inference speed for these models, so unfortunately they do make them a little slower. That is an advantage of the official QLoRA package: if you can use it to do the merge, you don't take such a massive hit on your inference speed. A sketch of what this unmerged-adapter inference looks like in code is below.
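For contrast with the merge script above, here is a minimal, hypothetical sketch of running inference with the adapter still attached, which is what this second repo does: the LoRA matrices are applied at runtime on every forward pass, which is why generation is slower than with a merged model. The model and adapter paths are placeholders, not the repo's actual arguments.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = "huggyllama/llama-7b"       # hypothetical base model
lora_weights = "output/checkpoint-1000"  # hypothetical adapter weights

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model, torch_dtype=torch.float16, device_map="auto"
)
# The adapter stays attached: every forward pass computes base + LoRA paths.
model = PeftModel.from_pretrained(model, lora_weights)

prompt = "How to diagnose mycosis fungoides and Sézary syndrome?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```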
But that's really it. These are the two main approaches to performing QLoRA fine-tunes right now; they're very similar, but one is certainly much easier than the other. If this was helpful, please like and subscribe, and let us know in the comments below what you'd like to hear about next. Please tune in tomorrow at 6 PM Mountain Standard Time for our next Q&A session, and tune in next time when we'll be discussing the power behind landmark tokens, much wider context, and open-source models. See y'all next time!
Info
Channel: AemonAlgiz
Views: 4,059
Keywords: AI, Machine Learning, QLoRA, Large Language Models, Fine Tuning, Alpaca QLoRA, Artificial Intelligence, Deep Learning, Model Training, Data Science, WSL, Windows, Chatbots, Model Optimization, Language Processing, Advanced AI, AI Tools, AI Research, Computational Linguistics, Natural Language Processing, AI Development, Data Analysis, Tech Tutorials, AI Algorithms, AI Tutorial, Deep Learning Models, AI Innovation, Advanced Machine Learning, PEFT
Id: 8vmWGX1nfNM
Length: 14min 55sec (895 seconds)
Published: Thu Jun 01 2023