Animation with weight scheduling and IPAdapter

Captions
Hey guys, this is Matteo, and it's about time we talked about animations again. A few days ago I shared this video on my socials, and even if it might seem very simple, it showcases a new strategy for using the IPAdapter that is incredibly effective. This whole animation is basically a crossfade between nine keyframes, and it only takes two IPAdapters. It's not video-to-video, there is no rotoscoping, no ControlNets; it's just pure text-to-video and takes only 8 GB of VRAM.

Before delving into it, let me stress that I am not an AnimateDiff expert. Most of what I know comes from the Banodoco Discord community, which is a huge source of inspiration and information, so let me thank Pom, Kijai, Nathan Shipley, and all the other folks over there. If you are into AI animation, you should check out Banodoco. Okay, enough rhetoric, let's see some action.

This is the whole workflow. Here on the upper left I have nine keyframes. Choosing the right frames is instrumental to a smooth animation, so let me show you how I made these images. As you can see, this is a very basic workflow: no ControlNet or face models to enforce consistency. If I generate four of these, you'll see that more or less I always get the same person. These models are extremely biased; the trick is to trigger this bias with sensible prompting. To do that, you need to be specific enough to force the model to draw from a pool of very specific images, but also generic enough not to give the model a reason to derail from the subject.

The checkpoint is Realistic Vision, and the prompt in this case is "extreme closeup, beautiful blonde woman with very short hair, pixie haircut", etc. The trigger words here are "beautiful blonde woman with very short hair, pixie haircut"; that pretty much always gives you a kind of Tinkerbell type of person. Now that we've tricked the checkpoint, all we have to do is change the prompt slightly to cover the full body rotation. In the second prompt I added "left profile". To get the back of the head it's a little more complicated, but in this case, generating six at a time, I was finally able to get a proper one. Of course, using a ControlNet or an IPAdapter to drive the composition would help, but I'm just being lazy here; I wasn't trying to make a perfect animation, just to demonstrate the technology.

When you have all the keyframes ready, it might be a good idea to inspect them for small defects. The IPAdapter PLUS model that we are going to use is very strong, so we need to remove any detail that could drive it off. For example, in this frame we have an earring that is not present in all the images, so I'm going to remove it. Also, in the hair there's something that looks like a hairpin that could be confusing, so I'm going to remove that too. Here on the neck there's also something weird. If your repair job is not great, you can apply a quick second pass with low denoise using the original prompt; that will take care of merging everything together.

Okay, back to ComfyUI. For the checkpoint this time I'm using DreamShaper. Not all models work, you'll have to try them, but DreamShaper seems to go very well with AnimateDiff. Since life's too short for slow generations, we are going to use LCM to speed up the process, so I use a LoRA loader, select the LCM LoRA, and connect the model pipeline. Next I need a ModelSamplingDiscrete node and select LCM sampling. In the KSampler I can lower the steps to 8, the CFG to 2, and select the lcm sampler and the sgm_uniform scheduler. If I connect the model pipeline now, the generation will be super fast.
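As an aside, the same low-step LCM setup can be sketched outside of ComfyUI. Here is a minimal, hedged equivalent in diffusers that mirrors the settings above (8 steps, CFG 2); the checkpoint and LoRA repository names are my assumptions for illustration, not something taken from the video.

```python
# Minimal sketch of the LCM-LoRA speed-up described above, using diffusers.
# Assumptions: the DreamShaper checkpoint and LCM-LoRA repo names below are
# illustrative; the video does all of this inside ComfyUI instead.
import torch
from diffusers import StableDiffusionPipeline, LCMScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "Lykon/dreamshaper-8", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")       # LCM LoRA
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)   # LCM sampling

image = pipe(
    "extreme closeup, beautiful blonde woman with very short hair, pixie haircut",
    num_inference_steps=8,   # low step count, as in the video
    guidance_scale=2.0,      # low CFG, as in the video
).images[0]
image.save("keyframe.png")
```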
Yep, perfect. Now it's the turn of the AnimateDiff nodes. I start from Use Evolved Sampling; I drag the m_models pipeline and select Apply AnimateDiff Model, then from the motion model input I pick Load AnimateDiff Model from the list. I'm using the AnimateLCM model, but it actually works with any motion model, even if they are not LCM specific; we'll try that later. Finally, I need to define the context window: I drag the context options input and select the Standard Static context options.

There are a lot of options here and I don't pretend to understand them all, but the most important one to change is the context overlap. AnimateDiff models are usually trained on 16 frames; to be able to do longer animations, these nodes split the frames into chunks of 16 and overlap them by the value specified in the context overlap option. In this case I'm increasing the value to 6. Another important parameter is the beta schedule: I believe autoselect would pick LCM by default, but the colors come out a bit washed out for me, so I'm going to use the LCM average schedule. Now I connect the model pipeline and we are ready for the IPAdapter.

First of all, I'm loading all the keyframes. I need a Load Image node and select the first reference, then I repeat eight more times and select all the frames. Now I connect them all together with an Image Batch Multiple node. Of course you could use a load image node that takes all the images from a directory, but I like to see all the frames on the screen. I don't have enough inputs, so I duplicate the batch node, connect them together, and add the remaining images.

Perfect. Now that we have a batch with all the keyframes, we can use a new node called IPAdapter Weights, which sets up everything for the crossfade animation. Let's see how it works. I connect the image batch and set 6 frames; this is not the duration of the whole animation but the length of each transition, so in this case 6 frames by 9 images makes a 54-frame animation. If I connect a preview to image one, you'll see that the node simply repeated each frame. We also have a weights output: it contains the weight values that the IPAdapter will use to modulate its strength and generate the crossfade animation. It's a list of float values, but we can visualize them with a Mask From List node and a mask preview. As you can see, the masks fade from white to black, matching the weights going from 1 to 0. If I set 0.5 and 0 instead, the mask goes from 50% gray to black. Okay, let me set it back to 1.

One image batch is not enough though, because we need to fade from one frame to the next. To do that I can change the split strategy with the method option: if I set alternate batches and check the difference, you'll notice that the image batch now contains only the odd frames, and the weights have also changed. I can add a second preview for image two, and this one contains all the even frames. With these two batches and the weights we can now crossfade between all the images using just two IPAdapters, and technically we could have an endless number of keyframes without impacting performance.
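To make that concrete, here is a rough Python sketch of the alternate-batches idea: the keyframes are split into two batches, each driving one IPAdapter, and the two weight lists are complementary so one image fades out while the next fades in. This is my own reconstruction for illustration, not the node's actual source code, and the file names are placeholders.

```python
# Rough reconstruction (illustration only, not the real node code) of the
# "alternate batches" crossfade between two IPAdapters.
def alternate_crossfade(images, frames_per_transition):
    batch_1 = images[0::2]        # keyframes 1, 3, 5, ... -> first IPAdapter
    batch_2 = images[1::2]        # keyframes 2, 4, 6, ... -> second IPAdapter
    weights_1, weights_2 = [], []
    total = len(images) * frames_per_transition          # 9 * 6 = 54 frames
    for f in range(total):
        t = (f % frames_per_transition) / frames_per_transition  # 0..1 within a transition
        if (f // frames_per_transition) % 2 == 0:
            # a keyframe from batch_1 is fading out, the next one (batch_2) fades in
            weights_1.append(1.0 - t)
            weights_2.append(t)
        else:
            # roles swapped: batch_2 fades out, batch_1 fades in
            weights_1.append(t)
            weights_2.append(1.0 - t)
    return batch_1, batch_2, weights_1, weights_2

b1, b2, w1, w2 = alternate_crossfade([f"key_{i}.png" for i in range(1, 10)], 6)
print(len(w1))   # 54 total frames
print(w1[:6])    # first keyframe fading out: 1.0, ~0.83, ~0.67, 0.5, ~0.33, ~0.17
```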
Very well then, let me remove the previews and set things up. I need an IPAdapter Batch node and the Unified Loader; there's no other conditioning, so I'm using the PLUS model, which is very strong. Then I convert the weight to an input and connect the first list of weights and the first image batch; this will take care of the odd frames. I duplicate the IPAdapter, connect the model pipelines, the second list of weights, and the even frames. I can bring everything closer and tidy up a little.

Okay, almost done. I'm connecting the model pipeline from AnimateDiff to the Unified Loader and from the second IPAdapter to the KSampler. The latent batch size will be the total number of frames; we could calculate it, but multiplications are hard, so I'm converting it to an input and connecting it to the total frames output of the IPAdapter Weights node. We are all set: I replace the preview with a Video Combine node, select H.264, and generate. And just like that, the animation is ready.

Now, there are a few issues. The first is that the starting frame is almost ignored and the last one doesn't stay on screen long enough. To fix that, I can add eight frames at the beginning and at the end and try again; now it's much better. The second problem is that the animation is a little flickery and there are small glitches. I could try to increase the number of steps, which would probably get rid of the flickering, but that sometimes introduces other issues. So what I like to do is generate a first rough animation like this one and then apply a second pass that hopefully gets rid of all the glitches.

I'm duplicating the KSampler, changing the seed, and connecting it to the previous latent. I want to be able to use a very high denoise, so I'm adding a ControlGIF ControlNet: I need an Apply Advanced ControlNet node, I connect positive and negative, lower the strength and the end percent to 0.6, and load the ControlGIF model. As the reference I'm sending the previous animation. In the second pass we don't need such strong conditioning from the IPAdapter, because the composition is already set, so I'm adding a new IPAdapter Batch connected to the original model pipeline, skipping the other IPAdapters; for the second pass we are only using one IPAdapter, and that saves us some computational power. I need a very light strength, 0.35 should be plenty, and I connect the second KSampler. For the reference images I would technically need to generate a new batch that contains all the frames, but the conditioning is so light that in this case we can reuse either the odd or the even frames we already have. And that's it, we can generate.

I mean, it's not perfect, but considering that we are doing everything with just two IPAdapters, it's impressive. In a real-life scenario you would add more conditioning, like ControlNets, prompt travel, or whatever. At this point I can of course add interpolation with RIFE and increase the frame rate to get a more fluid animation.
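Since the total frame count comes up twice here (as the latent batch size, and again after adding the start and end padding), here is the trivial arithmetic written out; the variable names are just for illustration and the node computes this for you via its total frames output.

```python
# Quick arithmetic for the numbers used above (illustrative only).
keyframes = 9
frames_per_transition = 6
start_frames = 8            # extra frames added at the beginning
end_frames = 8              # extra frames added at the end

total = keyframes * frames_per_transition + start_frames + end_frames
print(total)                # 54 + 8 + 8 = 70 -> latent batch size

# with 2x frame interpolation (e.g. RIFE) you get roughly twice the frames,
# so you can double the frame rate for a smoother clip of the same length
print(total * 2)            # ~140 frames
```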
Okay, cool. Let's see how it goes with a different AnimateDiff model. Now we have AnimateLCM loaded; let me grab the V2 motion model and reset the beta schedule to autoselect. Yeah, very nice: the result is less realistic, but the animation is very consistent. Let me also try with a different checkpoint; I'll go with Realistic Vision. Damn, woman, don't look at me like that! The blink and the smirk at the end, just perfect.

Okay, let's change topic. As I said, this is not really supposed to be used on its own, but more as a template for more complex workflows. Don't worry, I'm not leaving you hanging, I'm going to show you an example. The key to getting great results is not to fight AnimateDiff but to exploit its strengths: for example, these models are great for fire, flames, and sparkles, and also water and ocean waves, so let's do that. Here I have a template that basically replicates what we've done before, but instead of nine keyframes I only have two. The frames are very different concepts, so I'm going to need a prompt schedule to get the best results.

I need a Batch Prompt Schedule node, and I connect the CLIP and the positive conditioning. Now I need to synchronize the keyframes with the prompts. I could do it manually, calculating the position of each frame in the animation, but that's lame, so I'm going to use a new node called Prompt Schedule From Weights Strategy. It has a weights strategy input that I can connect to my IPAdapter Weights node. I convert the text area of the prompt schedule to an input and connect it to my node; now I can describe the two frames. One is "flames, colorful fire, sparkles, smoke", the other "wave, ocean, water". The node will take care of distributing the prompts over the frames for us, and if we change anything in the IPAdapter Weights node, the changes will be reflected automatically in the prompt. I'm also converting the number of frames to an input, since I have it right here in the IPAdapter Weights node, and I'm appending some generic text like "highly detailed, 4k, sharp". I don't need the positive prompt anymore, and we can try to see what we get. Okay, a nice transition from fire to water.
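If you're curious how the prompts end up distributed, here is a tiny sketch of the synchronization idea: each keyframe's prompt is pinned to the frame where its transition starts, producing the kind of schedule text a batch prompt schedule expects. This is my own approximation of the behaviour, not the node's real implementation, and the frame spacing and appended text are illustrative.

```python
# Sketch of keeping a prompt schedule in sync with the keyframe transitions
# (an approximation for illustration, not the actual node code).
def prompt_schedule(prompts, frames_per_transition, append=""):
    entries = []
    for i, prompt in enumerate(prompts):
        frame = i * frames_per_transition        # frame where this keyframe takes over
        text = f"{prompt}, {append}" if append else prompt
        entries.append(f'"{frame}": "{text}"')
    return ",\n".join(entries)

print(prompt_schedule(
    ["flames, colorful fire, sparkles, smoke", "wave, ocean, water"],
    frames_per_transition=24,
    append="highly detailed, 4k, sharp",
))
# "0": "flames, colorful fire, sparkles, smoke, highly detailed, 4k, sharp",
# "24": "wave, ocean, water, highly detailed, 4k, sharp"
```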
So far we've been using the PLUS IPAdapter model, which is very strong. Since we are going to add more conditioning, I want to lower the impact of the IPAdapter and use the ViT-G model instead, which is very powerful and often overlooked.

Next I'm going to add some text to the animation. To do that I'll use QR Code Monster. I already have the ControlNet nodes up here, so I copy and paste them near the KSampler; the model is QR Code Monster, and I connect the positive and negative. I'm setting the weight pretty high, but I'm going to use a Scaled Soft Weights node with a multiplier of 0.92 to soften the effect. I've prepared an image with the text "latent" in it; you could use one of the many text nodes if you want, but for the sake of simplicity I'm going to use this image. I'm also loading the second reference, with the text "vision". So the first ControlNet is for "latent" and I need a new one for "vision": I'm copying this one, connecting them together, adding the missing pipelines, and then going to the KSampler.

Now I need to schedule the animation in the same way we scheduled the fire and water images. I could do that with a new IPAdapter Weights node; that would let me use different values for the ControlNet, for example if I wanted to lower the weight or whatever. But since I'm okay with using the same parameters we used before, it's better to use IPAdapter Weights From Strategy. I take the strategy pipeline from the previous node, use an Image Batch Multiple to connect the images, and then go into the weights node. This way any modification I make to the main IPAdapter Weights will be reflected on the ControlNets; it's a very convenient way to keep everything in sync. Now, to connect the weights to the ControlNets I need a Latent Keyframe Batched Group node, and I'm finally ready to connect everything together: image one goes to the first ControlNet, image two to the second, the weights to the first latent keyframe group, and the inverted weights to the second. That should be pretty much all, and I can generate. Ah, this is nice already. Let's try another seed. Yeah, pretty cool.

Now, since I have this big wave coming, I want to try to have the text "vision" dissolve into the water. To do that I can add another frame, which is actually another wave image, so now I have three frames: one flame and two waves. In the text I duplicate "wave, ocean, water", and since I want the text "vision" to dissolve into nothing, I can simply add a black keyframe. I know this is a very lazy way of doing it, I should actually calculate the new weights and whatnot, but it works and it's easy to do, so I'm using this method. Let's see how it goes. Yeah, pretty cool.

Let me try with another checkpoint. I have Absolute Reality here, and I want to try my luck with a new seed. Maybe the "vision" text is not very readable, so I'm increasing the weight of the second ControlNet to 1.6. Yeah, would you look at that, very cool. Okay, let's say I like this one, so I'm going to upscale it: I add an NN Latent Upscale (SD 1.5 version) connected to the latent before the second KSampler, and I'm lowering the denoise to 0.7 because I don't want to lose too much. I enable the Video Combine and queue the prompt.

Well guys, we covered really a lot of ground today, and this is only the tip of the iceberg. I believe these new nodes really empower the animator in you, so I'm looking forward to seeing what you are going to do with these new toys. Let me thank again all my sponsors. Remember that all my code is open source and the workflows are always available to everyone for free; if I can afford to do that, it's only because of the sponsorships. Especially if you are a company and you are making a profit out of my work, well, it's in your own interest to have me working on updates and bug fixes. Okay, I think that's all for today. See you next time, ciao!
Info
Channel: Latent Vision
Views: 28,606
Id: jc65n-viEEU
Length: 20min 50sec (1250 seconds)
Published: Tue Apr 30 2024