Stable Diffusion IPAdapter V2 For Consistent Animation With AnimateDiff

Video Statistics and Information

Captions
Hello everyone. In today's video we're going to talk about the IPAdapter V2 update for the animation workflow in more detail. We're going to demonstrate different ways of building our workflows, with various IPAdapter settings for our characters and for the backgrounds as well. With IPAdapter there are different ways to make the backgrounds dramatic in style or steady in style, but with natural motion, using the AnimateDiff motion model, which also works together with ControlNet. There's no one correct way to do animation in ComfyUI; it's all about how you want the motion to be presented. You can keep your style very steady with little movement in the background, or go for very dramatic styles, like this big sea wave rushing onto the screen.

Some people asked in the comment section last week why we don't just use an image as the background, instead of needing IPAdapter or any custom nodes for background consistency. Well, there's no textbook answer for ComfyUI, and that's not the way to go with generative AI. Besides, if you just want an image background stuck on the back and call that consistency, you can simply paste it onto the video with a video editor. That way you don't even need generative AI, or the compute cost of running this workflow with multiple AI models. Right, so let's get started.

This workflow has been updated for IPAdapter V2. Here we have the character and background IPAdapter groups, mainly for styling our characters and backgrounds; you can do this for multiple characters as well. In this workflow, from what I've tested so far, using IPAdapter Advanced is more stable than the other IPAdapter custom nodes for loading the reference image into the model's data. As you can see, we have the IPAdapter Unified Loader. This first unified loader connects to the Stable Diffusion model data coming from our loader groups, where we've defined the checkpoint models, LoRAs, etc. We then pass that data to these IPAdapter groups. The first IPAdapter Advanced connects to the unified loader and processes the character reference image; in this case I used this white-dress fashion demo image for the character's outfit. Below it you'll see the second IPAdapter Advanced, which processes the background image for our video. This second IPAdapter has a unified loader as well, but it is connected directly to the first unified loader: the output from the previous IPAdapter is passed into this one. This is the new design of IPAdapter V2, where we don't have to load duplicate IPAdapter models in one workflow. By chaining the IPAdapter connections into the second node, we share the same model loader and the same generation data flow. This reduces memory usage, saving a lot of memory during workflow execution, while still achieving the same effect of using two IPAdapter images and processing each reference individually.

You also see this background mask attached to the attention mask input; it confines the background reference to the masked region. In this case the reference is a street view of an urban city, and there will be some movement happening; we're not just going to have everything static and still, right? That wouldn't be very fun or realistic.
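To tie the loader chaining and the attention mask together, here is a minimal Python sketch of the data flow just described. The function names, the preset string, and the weight values are illustrative stand-ins, not the exact ComfyUI node API; the point it shows is that the second unified loader reuses the IPAdapter weights already loaded by the first instead of loading a duplicate copy, and that the background unit carries an attention mask.

```python
# Minimal sketch of IPAdapter V2 chaining; names and values are
# illustrative assumptions, not the exact ComfyUI node signatures.

class IPAdapterWeights:
    """Stand-in for the IPAdapter model loaded once by the unified loader."""

def unified_loader(model, preset, ipadapter=None):
    # V2 behavior: if an already-loaded IPAdapter is passed in, reuse it
    # instead of loading a second copy of the same weights.
    return model, (ipadapter if ipadapter is not None else IPAdapterWeights())

def ipadapter_advanced(model, ipadapter, image, weight, attn_mask=None):
    # Patches the diffusion model with the reference image's features;
    # attn_mask limits the influence to a region (here, the background).
    return {"model": model, "ref": image, "weight": weight, "mask": attn_mask}

model = "sd_checkpoint_with_loras"                 # from the loader groups
model, ipa = unified_loader(model, preset="PLUS")  # first (and only) load
model = ipadapter_advanced(model, ipa,
                           image="white_dress_outfit.png", weight=0.8)

# The second IPAdapter chains off the first loader: same weights, no reload.
model, ipa = unified_loader(model, preset="PLUS", ipadapter=ipa)
model = ipadapter_advanced(model, ipa, image="urban_street.png",
                           weight=0.7, attn_mask="background_mask.png")
```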
If we want motion, people walking, car wheels turning, some smoke coming from the cars, lights on the banners flickering, and so on, we need to have a little bit of movement. We don't need a ton of flickering or movement to make it look natural, like a shot taken from a camera. Realistically, if a camera shot were taken from this view with the main character in the foreground, the lens would usually focus on the front characters while the background sits a little blurry and out of focus, but you'd still be able to see people walking by, cars moving, and so on. That's the effect this new IPAdapter workflow is configured for: it feels like the camera is focusing on the characters walking toward the lens in the foreground, and you see a car driving by within those seconds. I quite like this style. I know some people prefer the completely still, static background styles where nothing moves and the lighting doesn't change, but for a situation like this urban city backdrop we have to be realistic. In a real street scene you'd expect objects to be moving in the background; that's more realistic than just copying and pasting a static background behind the characters. Of course you could do that: you could use a video editor to composite a still image behind your characters if you really wanted that look. But then why would you need an AI to generate that kind of static scene? That point was raised in previous discussions, where people said they wanted a solid, completely still background behind their dancers, and I was thinking: well, in some situations, sure. If you're in a room or against a backdrop that is genuinely very static, with no moving objects, then yes, a totally still shot could work. But imagine an urban city like this, or even a beach scene; the water is never actually going to be frozen solid and static, right? Honestly, if you put a beach background with still, non-moving water behind dancers in the foreground, it's not going to look natural or make sense. So rather than going for that super-high-definition, clearly copy-pasted look with the background just stacked behind the characters, I prefer leveraging generative AI to create realistic motion and movement throughout the whole video. I could just copy and paste a background image behind these characters, but by using generative AI to synthesize subtle, natural movements in the backgrounds, it ends up looking far more realistic and lifelike. That's the effect I'm going for with this workflow. And this workflow isn't using IPAdapter or generative AI only nominally; we're leveraging the AI in a meaningful way. That's how this workflow is designed to operate.

I've also updated the segmentation groups here, with two options. The first one, which we're using, is our good old COCO segmentor for identifying the subjects to mask in each video frame, while the background uses an inverted mask; we're keeping that portion the same. The second option is Segment Prompts. In our case we have dancers, so you can change the prompt text here to a simple description like "a male", or if you wanted something like an animal you could put "a rabbit", for example. I've put "dancers" here and it will run segmentation on that, using the same loader for the segmentation models that identify objects; we also have the GroundingDINO model loader for our GroundingDINO models connected here as well.
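As a rough illustration of the two masking options, here is a short, self-contained Python sketch. The two segmenters are stubbed out (in the workflow they are the COCO segmentor and the GroundingDINO-plus-Segment-Anything "Segment Prompts" path), so the helper names and the boolean-array representation are assumptions made for this example; the part that matters is that the background mask is simply the inverted subject mask.

```python
import numpy as np

# Stubs standing in for the two segmentation options; in the workflow these
# are the COCO segmentor and the GroundingDINO + SAM "Segment Prompts" nodes.
def coco_segment(frame: np.ndarray) -> np.ndarray:
    return np.zeros(frame.shape[:2], dtype=bool)   # placeholder subject mask

def segment_with_prompt(frame: np.ndarray, prompt: str) -> np.ndarray:
    return np.zeros(frame.shape[:2], dtype=bool)   # placeholder subject mask

def subject_and_background_masks(frame, method="coco", prompt="dancers"):
    if method == "coco":
        subject = coco_segment(frame)               # option 1: detector-based
    else:
        subject = segment_with_prompt(frame, prompt)  # option 2: text prompt
    background = ~subject   # inverted mask drives the background IPAdapter
    return subject, background

frame = np.zeros((512, 512, 3))
subj, bg = subject_and_background_masks(frame, method="prompt", prompt="dancers")
```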
Here's a preview of this, and I made two previews so you can see which method performs better and choose which segmentation approach to use. You can switch between the segmentation methods by reconnecting the Grow Mask input. Right now I'm connecting the COCO segmentor, which is our tried-and-true method that also worked in previous versions of this workflow. If you want to use Segment Prompts via the Segment Anything custom node instead, you can connect the output mask from Segment Anything into the Grow Mask input node, and that will apply the segmentation output and input mask using that method. So it's very flexible: I can run test executions, check the previews of both results, and see which one looks better. If the COCO segmentor output is good, I'll reconnect that; if Segment Prompts is working better, it's simple to switch to that instead for segmentation within these groups. That's basically it for this update.

I can run two examples with this workflow. One fully applies the IPAdapter image output and sends that data into a ControlNet here for masking the backgrounds; we're using the Tile model as well, which we had in previous versions, but now performing a bit better with the updated IPAdapter. The other example runs the full execution without the Tile model, so you can see the difference in the video outcome.

Okay, let's try one run using the ControlNet Tile model first and aim to keep the background as steady as possible. Obviously there will be some moving objects, since nothing is going to be completely static with the water background I'm using here. I'm setting the strength to 0.55, a bit higher, since I need some of the water movement, and in the text prompt I'll need to add something about the water waves, so let's put "water wave" at around 0.8 or 0.9 weight. We don't need "building" this time, so I can remove that word. Okay, everything is set; let me double-check. You'll see the background in the outcome isn't going to look like our previous example, but the character outfit will remain the same. I just want to show how flexible IPAdapter can be for creating your own styles with different images. One more thing: I'm going to use this AI-made Instagram video as the source video input, set to 50 image frames, and we can check out the result.

Okay, the workflow has finished generating, so let's look at the result. This is the first animation group from the first sampling run. As you can see, the water starts flowing in a natural motion, rather than sitting as a single static image with non-moving water in the background, which wouldn't make sense for an animation like this. In this situation we need the water to be moving, which is why we use the AnimateDiff motion model to make elements look as lifelike and in motion as possible. The second group is the sampler run where I enhance more detail on the fashion elements; for this one I selected the DeepFashion segmentation YOLO models in the segmentation group for improved detail enhancement, so the white dress is going to look a little nicer than in the first sampling pass, and it also enhances the skin tones and details a bit better. Then we have the last face-swap group, which is the final step in this process. Let's take a look at the full preview. As you can see, I like having a little bit of movement in the background water; here you see some waves coming toward the land, which looks more natural with this kind of subtle motion. Not too much movement, though: if you overdo it, it can start to deform the mountains or rocks above the sea, and it won't maintain a consistently high-quality, stylized look for the backgrounds.
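Before the second example, here are the two runs' settings written out side by side as a hedged configuration sketch. The dictionary keys and the overall shape are hypothetical; only the quoted values (Tile strength 0.55, a "water wave" prompt weighted around 0.8 to 0.9, 50 frames) come from the run described above. The `(term:weight)` emphasis syntax is standard prompt weighting.

```python
# Hypothetical side-by-side config; keys are illustrative, the numeric
# values are the ones used in the runs described above.
steady_run = {
    "controlnet": "tile",
    "controlnet_strength": 0.55,        # a bit higher, to allow water movement
    "positive_prompt": "(water wave:0.9)",  # "building" removed this time
    "frame_count": 50,                  # frames taken from the source video
}

dynamic_run = {
    "controlnet": None,                 # second example: no Tile stabilization
    "positive_prompt": "(water wave:0.9)",
    "frame_count": 50,
}
```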
When animating backgrounds, we do need to allow for some motion. Look at the water here: there are some waves coming up, right? And here as well, we have a little wave reaching toward the landside. That's what I'd apply if the background is this type of coastal scene. Of course, if you don't have any moving background elements, for example if the background is just a wall or an interior room with no moving objects, then you don't need a higher Tile ControlNet strength to induce movement; you can set it as low as possible to keep things steady. That's the method I use: combining ControlNet with the IPAdapter to add realistic animated motion here.

But I have another example that doesn't use the ControlNet Tile model, so let's take a look at that. This is another window I had preloaded. I'll disconnect this and simply connect the AnimateDiff checkpoint directly to the output as the conditioning for the Stable Diffusion model data. Here you can see there's more movement happening, because the IPAdapter is only using the style reference image for the background motion; it doesn't treat the IPAdapter reference as a static background. In this case we're not passing it through any ControlNet or Tile model for the background. The IPAdapter is just adopting the colorations and noise patterns from the reference image, and based on the noise in that reference it renders a dynamic background style. So in this case it looks like the characters are walking in some small alleyway of an urban city, behind some buildings on a narrow road, not a main street like this reference shows. That's how the style turns out with this approach. For production, I'd say we'll likely want a combination of both methods: Tile ControlNet to keep a steady background with some minor movement during the characters' steady walking motion, but then, in the last few seconds where the characters jump back to this position, we'll want to apply this dynamic, blurring background style using the IPAdapter-only method for that portion.

We could try this out with, say, a beach background instead of an urban one, and you'll see the difference. We'd have this beach scene, but without ControlNet to stabilize it and suppress movement, it depends solely on the IPAdapter digesting the noise from the reference image and using those colorations as the stylized output for the animated background. Let's run it and you'll understand what's happening.

Okay, the second demo has finished generating, so let's check out the result. As you can see, in the first sampling and the detail-enhancement sampling, the water waves have this very dramatic, exaggerated style. In the first two seconds the waves are following the hand motions; let me play a preview of this. See how for the first couple of seconds the waves react to and follow the hands, then once the hands go down the water crashes in this big, dramatic wave pattern, hitting the rocks and splashing up. It has a very dramatic, exaggerated feel without Tile ControlNet stabilizing the background. So it depends: if you want the motion to have that more dramatic, amplified style for whatever your video calls for, then this is the approach you'd use. But generally, if you're just looking for normal animation and don't need those extremely exaggerated water-motion effects, you wouldn't want to generate this way without stabilizing the backgrounds. That's how we achieve those different background motion styles.
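Here is a small Python sketch of the routing choice just described: with Tile ControlNet in the chain the background stays pinned to the source frames, and bypassing it leaves the motion style entirely to the IPAdapter reference. All function names are hypothetical stand-ins for node connections in the workflow, not a real API.

```python
# Hypothetical sketch of the two conditioning paths described above.
def apply_tile_controlnet(conditioning, hint_frames, strength):
    # Stand-in for the Apply ControlNet node with the Tile model.
    return {"cond": conditioning, "hint": hint_frames, "strength": strength}

def build_conditioning(conditioning, background_frames, steady=True):
    if steady:
        # Path 1: Tile ControlNet pins the background to the source frames,
        # allowing only minor movement at moderate strength.
        return apply_tile_controlnet(conditioning, background_frames, 0.55)
    # Path 2: bypass ControlNet; AnimateDiff plus the IPAdapter reference
    # alone drive the background, giving the dramatic, free-moving style.
    return conditioning
```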
And of course we have the character outfit elements here too. For these, I'd recommend using an image editor, or something like Canva, to remove the background before uploading the reference into the workflow. That way the IPAdapter can focus solely on recreating the outfit style for the character, without any distracting background noise or other elements, and hone in on the intended outfit look you want (a scripted alternative is sketched after this transcript).

So this is one method of utilizing IPAdapter inference to stylize your animation videos and achieve the different motion effects you want to create. You could apply this not just to dancing videos like my YouTube examples, but also to cinematic styles or whatever type of animated sequence you need. If you want to create some specific animated effect, add prompts describing that effect along with stylized IPAdapter references, and you can synthesize that cinematic look through this workflow. It provides a lot of flexibility for generating all sorts of animated video content in various styles: the steady-background approach like the first example, or the more dramatic, exaggerated motion style like the second. The updated version of this workflow will be available to our Patreon supporters, so you can all update to the latest release. All right, I'll see you all in the next video. Have a great day.
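Following up on the background-removal tip above: if you'd rather script it than use an image editor, here is a minimal sketch assuming the open-source rembg package and Pillow are installed; the file names are placeholders.

```python
# Minimal sketch: strip the background from an outfit reference so the
# IPAdapter sees only the garment. Assumes `pip install rembg pillow`.
from PIL import Image
from rembg import remove

reference = Image.open("outfit_reference.png")   # placeholder file name
cutout = remove(reference)   # returns an RGBA image with background removed
cutout.save("outfit_reference_cutout.png")
```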
Info
Channel: Future Thinker @Benji
Views: 7,892
Keywords: comfyui, ai video generator, ai art, IP adapter, animation workflow, generative AI, IP adapter Unified Loader, IP adapter Advance, AI-generated motion, segmentation groups, Segment Prompts, visual effects, ControlNet Tile model, tips and tricks, realistic motion, comfyui upscale, ai art tutorial, stable diffusion workflow, stable diffusion, stable diffusion tutorial, stable diffusion 教程, IPAdapter V2, Stable Diffusion IPAdapter, stable diffusion consistent animation
Id: aiIzE8Oq-WI
Length: 17min 40sec (1060 seconds)
Published: Mon Apr 01 2024