Fine-tune Stable Diffusion with LoRA for as low as $1

Captions
Hi everybody, this is Julien from Hugging Face. A lot of folks still think that fine-tuning large models is difficult and expensive. Well, in this video I'm going to show you how you can fine-tune a Stable Diffusion model for literally $1, and we won't even write a line of code: we'll just use an off-the-shelf script from the Diffusers library and advanced training techniques implemented in the PEFT library, which, as we know by now, stands for parameter-efficient fine-tuning. So this is pretty cool stuff, and it's very efficient. Let's get to work.

This video is based on a blog post that my colleagues wrote a little while ago, and I strongly encourage you to read it; of course, I will put all the links in the video description. In a nutshell, we're going to start from a Stable Diffusion model and fine-tune it on an image dataset of Pokémon, which is pretty funny, and we can do this very efficiently using a technique called LoRA, which stands for low-rank adaptation. I will explain this in a minute; in short, this technique helps us reduce the number of parameters that we actually have to fine-tune, which definitely lowers the bar on how much infrastructure is required. This is the actual script I'm going to use, very simple, and of course we'll generate some Pokémon with the trained model. The model is also on the Hub, so you'll be able to replicate this with your own images. That's pretty fun.

First things first, let's take a look at LoRA: what it means, in hopefully plain English, and why it's an amazing technique to reduce the amount of infrastructure required. As you all know, traditional fine-tuning updates all the model parameters, so when we're working with billion- or multi-billion-parameter models, this is obviously a time-consuming process, and it becomes expensive quickly, because we need lots of infrastructure and powerful GPUs with lots of RAM to even fit the model in memory.
A while ago, a new technique came out called LoRA, which means low-rank adaptation; I'll explain what the "low-rank" bit means, and please go check out the research paper if you'd like the details. What LoRA says is that we can update a model by simply training two small matrices, multiplying them together, and adding the result to the original model: that gives us a fine-tuned model. The point is that instead of fine-tuning the full original model, we only learn those update matrices, which are much smaller.

This equation is from the research paper, and it looks scary, but hopefully I can translate it into English. We start with an original weight matrix W0, which has two dimensions, d and k. Instead of learning the total number of parameters, which is the product d × k, we only learn two matrices, A and B, which we can multiply together. A and B are much smaller: they have smaller rank (which is, I guess, the math term for size), and the product of those two matrices is added to the original weights. As you can see, we're still freezing the original weights in W0, and only A and B contain trainable parameters. So instead of tweaking all of W0's parameters, which again would be d × k, we only learn the parameters in A and B, which amount to r × d + r × k, that is, r × (d + k). We're turning what is a quadratic problem into a linear one: if we scale the size of W0, we don't have to learn d × k parameters, only (d + k) multiplied by the small integer r. The scaling is now linear instead of quadratic, and that's the core benefit of LoRA: we can work with bigger models without having to scale the amount of infrastructure as much.
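The whole idea fits in a few lines of NumPy. This is a toy sketch, not the actual PEFT implementation, and the dimensions and rank below are made-up illustrative values (note that in real LoRA training, B is initialized to zero so the update starts from nothing; here it is random just to make the check meaningful):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 1024, 1024, 4            # weight matrix shape and LoRA rank (illustrative)

W0 = rng.normal(size=(d, k))       # frozen pretrained weights, never updated
B = rng.normal(size=(d, r))        # trainable low-rank factor (zero-initialized in real LoRA)
A = rng.normal(size=(r, k))        # trainable low-rank factor

full_params = d * k                # parameters a full fine-tune would update
lora_params = r * (d + k)          # parameters LoRA actually trains
print(full_params, lora_params)    # 1048576 vs 8192: a 128x reduction at rank 4

# At inference time, the update collapses into the base weights:
# one matmul with the merged matrix equals base path + low-rank path,
# which is why merging adds no latency.
x = rng.normal(size=k)
y_two_branch = W0 @ x + B @ (A @ x)
W_merged = W0 + B @ A
y_merged = W_merged @ x
print(np.allclose(y_two_branch, y_merged))  # True
```

At rank 4 on a square 1024 × 1024 matrix the reduction is 128x; on the much larger attention matrices of a real diffusion model, the same arithmetic is where the thousand-fold reduction mentioned below comes from.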
What this really means in practice is that we can reduce the number of parameters by a factor of at least a thousand, which means training only maybe 0.1% of the original parameters, with negligible loss of accuracy. That's a huge bonus, because now we can train those large models on a mid-range GPU: we don't need as much GPU memory to fit the model. And at inference time, we just collapse everything: we load the original model unchanged, load the LoRA weights, and add them, so there's no extra latency. In fact, you'll see that my model repository only stores the LoRA weights, and merging them happens automatically when the library loads the model. No difficulty, no latency. This technique is implemented in the PEFT library (again, PEFT means parameter-efficient fine-tuning), and that's what the blog post is using. Hopefully that gives you a little bit of background; if you want the hardcore math, please go check out the LoRA paper. But again, the intuition is: we don't touch the original model, we just learn a couple of much smaller matrices (they have lower rank), add that update to the original model, and get amazing results.

Now let's take a look at the actual process. I'm starting from the Diffusers library, which I cloned to this machine. If we go to examples/text_to_image, we'll see different scripts to train diffusion models: there's the vanilla one and there's the LoRA one. Feel free to go and read them; it's not strictly necessary, but if you're curious about all the details, you can certainly learn a lot there. To keep it simple, I'm just reusing the script from the blog post; I don't think I tweaked anything. We're going to fine-tune Stable Diffusion 1.5 on the Pokémon dataset, which you can see here: it has 833 Pokémon with descriptions. That's a fun one, but it would be reasonably easy to build your own dataset with just images and descriptions.
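If you do want to build your own dataset, the simplest layout the Hugging Face datasets library understands for image/caption pairs is a folder of images plus a metadata.jsonl file mapping each file name to its caption. A stdlib-only sketch (the file names and captions below are made up for illustration):

```python
import json
import pathlib
import tempfile

# Hypothetical captions for images you would place in the same folder.
captions = {
    "bulbasaur.png": "a drawing of a green pokemon with red eyes",
    "charmander.png": "an orange lizard pokemon with a flaming tail",
}

# Write one JSON object per line, the format datasets' ImageFolder expects.
folder = pathlib.Path(tempfile.mkdtemp())
with open(folder / "metadata.jsonl", "w") as f:
    for file_name, text in captions.items():
        f.write(json.dumps({"file_name": file_name, "text": text}) + "\n")

lines = (folder / "metadata.jsonl").read_text().splitlines()
print(len(lines))  # 2
```

Point the training script's dataset argument at that folder and it will pair each image with its caption.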
That's what we're doing here. We save the model locally, and once we're done, we push it to the Hub under this name. We launch the training script with Accelerate; the rest is really just standard parameters, so feel free to tweak them. You can change the validation prompt if you like: validation images are generated regularly so you can keep an eye on training progress. Here we're validating with "Totoro" images, why not. So that's the script, and now all I need to do is launch it.

So how do we actually train this? What kind of instance is this? Well, this is a very small instance: a g4dn.xlarge AWS instance, which is probably the smallest GPU instance you can get on AWS. If you don't know about G4 instances, go and check out the product page; let's take a look. You can see this is the smallest one: it has a single GPU, a T4 as we saw, definitely not one of the biggest, with just under 16 GB of GPU memory, and the instance itself has 16 GB of RAM. The on-demand price is 52 cents an hour, so definitely not expensive, especially when you compare it to the bigger GPU families like P4, let alone P5. And I think that's the whole point here: the goal is obviously to train in a cost-effective way, but it's also to be able to train at all, because availability of P4s and P5s is pretty challenging, to say the least. Thanks to the LoRA technique, you can fine-tune your models on much more available GPU instances, which are very easy to grab, whether you use G4 or G5 (which I'll show you in another video later). These are available in, I believe, all AWS regions.
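For reference, launching the LoRA training script with Accelerate looks roughly like this. The hyperparameter values below are illustrative, in the spirit of the blog post; check `python train_text_to_image_lora.py --help` in the Diffusers repository for the authoritative list of flags:

```shell
accelerate launch train_text_to_image_lora.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --dataset_name="lambdalabs/pokemon-blip-captions" \
  --resolution=512 --random_flip \
  --train_batch_size=1 \
  --num_train_epochs=100 \
  --checkpointing_steps=5000 \
  --learning_rate=1e-4 \
  --lr_scheduler="constant" --lr_warmup_steps=0 \
  --seed=42 \
  --output_dir="sd-pokemon-model-lora" \
  --validation_prompt="Totoro" \
  --push_to_hub
```

The script lives in examples/text_to_image in the Diffusers repository; run the command from that directory after cloning.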
I keep meeting with customers who complain that they can't get P4s, let alone P5s, in their region; well, they can certainly get G4s and G5s, so this solves a lot of problems, from availability to cost. I'm making a point of using the smallest instance here. Obviously you could scale up a little: you could try g4dn.12xlarge, which has four GPUs, and you would probably get some speedup there, though of course it's a bit more expensive, or you could go and try G5. But again, I wanted to show you that you can fine-tune this on the smallest GPU instance on AWS. So we just need to launch the script, and why don't we do that. There we go.

It will take a while, so we're not going to run it to completion; I just want to show you that the script works and how long it should take. We can see at the bottom of the screen that this will take something like six hours. Maybe you're thinking, "Wow, that's way too long." Again, the fact that this runs at all on this tiny instance is just amazing. You can scale up if you want, but you can run this on a tiny instance for very little cost. So, about six hours. Let's interrupt it, because of course I've already done this.

Now, some of you are thinking: "Wait, you said I could do this for $1. Six hours multiplied by 52 cents, that's probably $3. How do I do this for $1?" Well, you do it by using Spot Instances, and if this is the first time you hear about Spot Instances, you have been missing out: they are an amazing way to optimize cost, so go and read about them. If we look at the price for g4dn.xlarge in the us-east-1 region, we see that the on-demand price is 52 cents and the spot price is, let's say, 15 cents. And this is very consistent over a week, a month, and three months: super stable. So, no worries, no problem: you will get g4dn.xlarge at 15 or 16 cents an hour.
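The back-of-the-envelope math works out like this (using the prices quoted in the video; spot prices fluctuate, so treat these as illustrative):

```python
hours = 6.0        # approximate training time on g4dn.xlarge
on_demand = 0.52   # on-demand price, $/hour
spot = 0.15        # typical spot price, $/hour

print(round(hours * on_demand, 2))  # 3.12 -> roughly $3 on demand
print(round(hours * spot, 2))       # 0.9  -> under $1 on spot
```
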
Multiply that by six hours and it only costs you about a dollar. So I wasn't lying; I never lie. Stable Diffusion for $1.

I pushed the model to the Hub after the six hours, and you'll find it here. Everything is there: the checkpoints, the TensorBoard logs if you're interested, and a few validation images, which are pretty nice. I included the actual script, so you can run exactly the same thing I ran: you just need to clone the Diffusers library, put the script in the right place, and run it. I also included the full training log, the actual output, just to show you that this is how it happened. It's not fascinating reading, but I know some of you want the full training log, and here it is.

So why don't we try the model now? I added a bit of information to the model card. Let's wait a few seconds for the model to load and see what kind of Pokémon we get. All right, well, that's a pretty nice flying unicorn. There you go: now you can generate Pokémon all day long.

So fine-tuning doesn't have to be complicated, because we provide a ton of scripts. You saw Stable Diffusion here, but we have fine-tuning scripts for everything, so please don't go and spend weeks writing fine-tuning code; there's a good chance we have something you can start from and tweak if needed. And when it comes to cost, techniques like LoRA, implemented in the PEFT library, are amazing. You can actually run this demo on even smaller GPUs, but this is the smallest available on AWS, and the cost is negligible, so you could fine-tune tens or hundreds of models in parallel for negligible cost. You could also, of course, do this on SageMaker: running that same code on SageMaker is no problem at all. So you can fine-tune tons of models and experiment at very low cost.
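Trying the fine-tuned model yourself looks roughly like this with the Diffusers API. This is a sketch: the LoRA repository id is a placeholder for the one you push to the Hub, and running it downloads the base model and needs a GPU, so it is not meant to be executed casually:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the frozen base model; the LoRA repo only stores the small update weights,
# which load_lora_weights attaches on top of it.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("your-username/sd-pokemon-model-lora")  # placeholder repo id

image = pipe("a cute dragon pokemon with blue eyes").images[0]
image.save("pokemon.png")
```
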
You can scale on the cloud and build amazing stuff for, again, very low cost. So go and experiment: go and run this model, maybe fine-tune it on your own images, and see how easy it is. Well, that's really what I wanted to show you today. There's more coming: I have a Llama 2 fine-tuning video which I think is pretty cool, so I'll be working on that one in the next few days. Keep your eyes open for it. Until then, thank you for watching, and keep rocking!
Info
Channel: Julien Simon
Views: 5,315
Keywords: aws, open source, artificial intelligence, computer vision, image generation, cloud
Id: Zev6F0T1L3Y
Length: 17min 36sec (1056 seconds)
Published: Mon Oct 09 2023