ComfyUI: Animate Anyone Evolved (Workflow Tutorial)

Captions
Hi, I am Mali, and welcome to the channel. With a reference image and a motion pose, you can now get a short animation. The pose motion is captured from a video input, and the model then uses that motion along with your reference image to generate the animation. Let me show you the workflow, along with my own tips and tricks on how to get these results in ComfyUI. A big shout out and thanks to the paid channel members.

Before I start, please note that this requires a separate ComfyUI install. We will not be using the existing ComfyUI; the tech has different requirements which may interfere with or break some custom nodes. As a precaution, I side-loaded a new install of ComfyUI on my system and installed only the custom nodes required to get this working, and there are no issues. The tech is still in development and experimental. Even if you decide to skip this one, I suggest you still watch the tutorial, as some of the tips and tricks used in the workflows help in other areas of ComfyUI, especially animations. Some knowledge of ComfyUI is required: you need to know how to set up ComfyUI, install the Manager, and know about IP-Adapters and ControlNets. That being said, the channel has tutorials for each one; if you are very new, I suggest you look at them before continuing the tutorial.

This is not the original Animate Anyone, as the devs have not released the code yet. In fact, this is a near 80% reproduction of Animate Anyone by Moore. We will use an improved version of Moore's Animate Anyone ComfyUI implementation by MrForExample. This custom node has a different Torch version requirement, so to avoid risk, side-load a new ComfyUI installation: create a new folder called Comfy 2 and install it there, then install the ComfyUI Manager. Boot up ComfyUI, go to the Manager, and install the Animate Anyone Evolved custom node by MrForExample. Now go to the custom_nodes folder within ComfyUI, go inside the Animate Anyone Evolved folder, right-click, and open a terminal. Run the following command to install the dependent packages; since I already have them installed, I will get a "requirement already satisfied" message. Make sure CUDA 11.8 is installed on the system. After CUDA installs, go to the python embedded folder within ComfyUI and install cupy and cuTENSOR: open a terminal and type in the following commands. All the commands and relevant links will be given in the description. I already installed these, but you should get a positive response. FFmpeg and ffprobe are required as well, and install ffmpy via the command. By the way, I am running this on Python 3.11.7.

The workflow goes beyond the Animate Anyone nodes. Take note of all these nodes and install all of them. You need to download some models for Animate Anyone, and they have to be put in a specific hierarchy within its folder in custom_nodes. Start with the VAE file: download the safetensors file and put it under the vae folder in models (I have renamed the file). Then download the CLIP Vision file; it's a .bin model, and this goes in the clip_vision folder under models. Next, you need to download the four pre-trained models; don't rename them. Go to custom_nodes, Animate Anyone Evolved, and put them in the pretrained_weights folder. Lastly, download the UNet file; it's the 3.44 GB safetensors file, and again there is no need to rename it. In pretrained_weights, go into the stable diffusion unet folder and put it there. The other models used are downloaded from the ComfyUI Manager: search for Ultralytics and install these two face models. For the IP-Adapter, install the Vision Transformer H (ViT-H) CLIP Vision model and the Plus Face SDXL model.
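To double-check the side-loaded environment before moving on, a small script like the one below can confirm that the pieces mentioned above are visible to the embedded Python. This is only a minimal sanity-check sketch added for convenience, not something shown in the video; it assumes you run it with the same Python interpreter that your side-loaded ComfyUI uses, and the module and tool names (torch, cupy, ffmpy, ffmpeg, ffprobe) simply mirror the install steps described above.

```python
# Minimal environment sanity check (illustrative, not from the video).
# Run it with the same Python interpreter that your side-loaded ComfyUI uses,
# e.g. the embedded python.exe on a portable Windows install.
import importlib.util
import shutil

def has_module(name: str) -> bool:
    """Return True if the module can be found without fully importing it."""
    return importlib.util.find_spec(name) is not None

# Torch and its CUDA build (the tutorial targets CUDA 11.8).
try:
    import torch
    print("torch:", torch.__version__, "| CUDA build:", torch.version.cuda,
          "| GPU available:", torch.cuda.is_available())
except ImportError:
    print("torch is not installed in this environment")

# Python packages installed for Animate Anyone Evolved.
for pkg in ("cupy", "ffmpy"):
    print(f"{pkg}: {'found' if has_module(pkg) else 'MISSING'}")

# Command-line tools that the video load/combine steps rely on.
for tool in ("ffmpeg", "ffprobe"):
    path = shutil.which(tool)
    print(f"{tool}: {path if path else 'MISSING from PATH'}")
```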
Using a Load Video node, you can capture a pose motion from any video input. The video must only be 3 to 5 seconds long, as the animations are very short. Connect it with a preview node. The video motion capture should be 512 x 768, since the trained model is based on SD 1.5. I do not recommend using force_size unless the source video ratio is 4:6; if it is the same ratio, select "512x?" to resize. You should note that, for best results, the subject in the image should match the subject in the motion reference; discrepancies in the subject's height or width will cause some distortions. The model is still in development, and they will improve on this.

The frame load cap limits the number of frames to output and process. Let's select 10: you can preview the first 10 frames from the video. The skip option allows you to skip a select number of frames from the beginning; a value of 10 means it skips the first 10 and then plays the next 10 as defined by the frame cap. If I select every third frame, it will do exactly that and give me 10 frames. A cap of 60 or 120 does fine on the 4090; it can go as high as 160. I tried 240 and it ran out of memory. I suggest you start at 12 or 24, then increase to 60 or 120 depending on your VRAM. Connect it with a DWPose Estimator and leave all the settings at default. Further connect it to a Video Combine node; for the tutorial, all outputs are H.264 MP4. For smoother output keep the FPS at 30, and increase it to 60 if it is too slow. Hit Queue Prompt to generate the DW motion pose animation, and save this output, as it is needed for the video input reference. Since this is a bit sluggish, I am increasing the frame rate to 60; you can see it is now shorter but smooth. So here are my recommendations: if you have enough VRAM, go for a frame load cap of 120 with an output frame rate of 60; if using 2x interpolation, increase the output frame rate to 90.

For the basic setup, you start with the Animate Anyone Sampler and branch backwards from there. Add the Animate Anyone node, drag out, and connect the two UNet nodes. Connect the CLIP Vision Encode along with the CLIP Vision Loader and select the correct model. Connect the Pose Guider Encode and further connect it with the Load Pose Guider. You need a reference image and a pose motion video for the inputs, so add the Load Video and Load Image nodes. It is advisable to connect both of these nodes to an Image Upscale for resizing; this is useful only if the source is in a 4:6 ratio, otherwise use third-party software to bring it to this ratio. The video input connects with the Pose Guider Encode, and the image with the CLIP Vision Encode; we will also use this image as a VAE Encode for the reference latent. The VAE Loader model should be the one that you downloaded earlier; do not use any other model. For the final output, add the Video Combine node, choose the desired output, and change the frame rate to 30. I have made some pose motion outputs for the tutorials; I will make them and the workflow JSON available in the community post for the paid members. Here I select 60 frames for output, and we skip the first 120 frames. It's not perfect, but it's pretty impressive tech to run locally at this stage. You can add frame interpolation at 2x to make it smoother. Adding more frames slows it down; to correct it, increase the frame rate to 90. Good enough.
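To make the frame-count and frame-rate trade-offs above easier to reason about, here is a small arithmetic helper. It is purely illustrative and not part of the workflow; the parameter names (skip_first_frames, frame_load_cap, select_every_nth, interpolation multiplier) loosely mirror the Load Video and interpolation options discussed above, and the actual node behaviour may differ in detail.

```python
# Illustrative arithmetic for clip length vs. frame settings (not from the video).
# Parameter names loosely mirror the Load Video node options described above.

def selected_source_frames(total_frames: int, skip_first_frames: int,
                           frame_load_cap: int, select_every_nth: int) -> list[int]:
    """Indices of the source frames that end up in the capture."""
    candidates = range(skip_first_frames, total_frames, select_every_nth)
    return list(candidates)[:frame_load_cap]

def playback_seconds(num_frames: int, output_fps: float,
                     interpolation_multiplier: int = 1) -> float:
    """Playback duration once the frames are written out at output_fps."""
    return (num_frames * interpolation_multiplier) / output_fps

# Example from the tutorial: skip the first 120 frames, keep 60, take every frame.
frames = selected_source_frames(total_frames=300, skip_first_frames=120,
                                frame_load_cap=60, select_every_nth=1)
print(len(frames), "frames, starting at source frame", frames[0])

# Recommended settings: 120 frames at 60 fps, or the same frames
# with 2x interpolation played back at 90 fps for smoother motion.
print(playback_seconds(120, 60))      # 2.0 seconds
print(playback_seconds(120, 90, 2))   # ~2.67 seconds
```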
Now let's go over the sampler settings. Typically I would explain what each and every setting does technically, but I don't see the point here. The CFG makes a negligible difference here, so I suggest you stick to the default value. Steps also do not make much of a difference; stick around 20 to 25. The seed can help with some quirkiness; however, changing the frame cap does a better job. If you have lower VRAM, reduce the context frames to 12. For the other settings, keep them at default, otherwise they break the output. Also, for some reason I could not run any scheduler besides denoising diffusion implicit models, DDIM for short. See what happens when I use DPM Karras; maybe it's my environment, I really don't know. If you know a fix, let me know in the comments.

In the next step, let's generate an image based on a pose and use that as the input image for the animation. Typically a standard pose with the hands at the sides and clearly visible works best; later in the tutorial I will show you some tricks to perfect the pose animation. I am using the ProtoVision XL checkpoint. Add the Load Image node, positive and negative encoding, and Apply ControlNet (Advanced). Add a DWPose pre-processor and make the connections, changing the resolution to 1024. For the ControlNet I am using Thibaud's OpenPose XL rank 256 model. Add a KSampler and connect the conditioning and the model. For the empty latent use a resolution of 832 x 1216 only; we will then resize it before passing it on to the animation input. Convert the seed to an input and connect an rgthree seed node; you can skip this step, but I prefer the custom node. Always look up the sampler settings and values for the trained checkpoint; that's a good way to start when using a new checkpoint. Use a simple positive prompt and the checkpoint's standard negative. Add the VAE Decode and a preview, then hit Queue Prompt.

Before sending this image as input, let's fix the face via FaceDetailer. Add a basic pipe and connect the necessary inputs. We will first resize the output before fixing the face: choose 512 x 768, and for the interpolation Lanczos is the best. Add a Basic Pipe to Detailer Pipe node. Select the face YOLOv8m model for the bbox detector, and prefer GPU. Also add and select a face segmentation model, the YOLOv8m seg2 60 model. Now add the FaceDetailer, connect the detailer pipe and image, and change the guide size to 1024. Make sure the seed is different from the sampler's, and keep the other settings the same as the sampler. Enable force_inpaint and reduce the bbox threshold to 0.2 or lower; keep the rest of the settings unchanged. Add a preview and Queue Prompt. Let's compare: yep, that's good. Now connect the output to the animation input.

Selecting another OpenPose motion for this one, I am going to cap the output at 120 frames. Okay, so this is why we fixed the face before animation: as you can see, the face is still a bit distorted, which is normal, but now it just becomes easier to fix it. Further extend the basic pipe and add a CLIP Text Encode; I am going to change the prompt here for the animation face fix. Depending on the subject, change "man face" to "woman face" and vice versa. Add the AnimateDiff Loader node and select the SDXL beta motion model; you have to download this model manually. Connect the basic pipe, then drag out and add the Context Options node and change the context length to 32 (on lower VRAM, stick to the default). Change the beta schedule to sqrt_linear. Now I am going to add IP-Adapter nodes: I want to use the same face from the generated image output, which helps with consistency. Connect the model, add the CLIP Vision and IP-Adapter model loaders, and select the SDXL Plus Face model with the Vision Transformer H (ViT-H) CLIP Vision model. Add a Prepare Image For Clip Vision node, then add an Image Crop node: we will crop the image and just get the face. Connect the image output and adjust the values to get the face centered, changing the crop position to center.
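Since the Image Crop and Prepare Image For Clip Vision steps are easy to get wrong, here is a rough standalone illustration of the same idea: cutting a centered square around the face and resizing it to the small square input that CLIP Vision encoders typically expect. This is a generic sketch added for clarity, not the ComfyUI nodes' actual implementation; the file name, crop box values, and 224 x 224 target size are placeholders and assumptions.

```python
# Generic illustration of "crop the face, centre it, prep it for CLIP Vision".
# Not the ComfyUI nodes' implementation; file name and box values are placeholders.
from PIL import Image

def centred_square_crop(img: Image.Image, cx: int, cy: int, size: int) -> Image.Image:
    """Crop a size x size square centred on (cx, cy), clamped to the image bounds."""
    half = size // 2
    left = max(0, min(cx - half, img.width - size))
    top = max(0, min(cy - half, img.height - size))
    return img.crop((left, top, left + size, top + size))

reference = Image.open("reference_image.png")                     # placeholder path
face = centred_square_crop(reference, cx=256, cy=180, size=256)   # placeholder face centre

# CLIP Vision encoders generally take a small square input; 224 x 224 is typical.
face_for_clip_vision = face.resize((224, 224), Image.Resampling.LANCZOS)
face_for_clip_vision.save("face_crop_224.png")
```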
Now add a Simple Detector for AnimateDiff and use the same bbox and segmentation models as the previous FaceDetailer. Connect the animation output to the Simple Detector. Add a Detailer For AnimateDiff and connect the image output and SEGS. Add an Edit DetailerPipe node, change the model and positive conditioning, and then connect the pipe to the detailer. Add the SEGSPaste node. I made a mistake here: do not connect the image, connect only the SEGS from the detailer; duplicate the output and connect it. Reduce the bbox and sub thresholds to 0.2, increase the crop factor to 3.5 and the drop size to 50, and change the sub dilation to 10. These are good settings to start with; however, you should play around depending on your image animation. Ensure that the settings for the detailer are the same as those used for the FaceDetailer, including the seed. A denoise value of 0.3 to 0.5 works great. Connect the animation output to the SEGSPaste node and Queue Prompt. This takes some time, and if you add frame interpolation in between, it will take even more time. If frame interpolation is added, ensure all three inputs come from the frame interpolation node rather than the VAE Decode. The output is pretty consistent except for the background; typically it's advisable to generate this animation on a plain background, and then you can use any third-party software to edit the background.

The way to do this is that the starting point of the pose motion video should be as close as possible to the reference image. Here the ballerina has a very different starting pose, so I took that pose as a reference for the DW pose and generated an image. The generated image's pose is very close to the motion animation's starting pose. I then used FaceDetailer to first fix the image's face and then passed this image to the animation input as a reference. The image on the left has severe distortions; I then used AnimateDiff and the IP-Adapter to resemble and fix the original face, and the result is on the right. You can use frame interpolation to smooth out the animation.

So how did I do the line art animation? Simply put, last November I made a line art workflow: you drop in your image and generate the line art, then load the line art as a reference image with a pose motion, and it does a beautiful job animating it. I hope the tutorial was helpful and you learned something new in ComfyUI. Until next time.
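One practical addendum to the setup described earlier: since ffmpeg and ffprobe are already required, you can use ffprobe to check a motion clip's frame rate and frame count before loading it, which makes it easier to pick skip and frame-cap values that fit a 3 to 5 second capture. This is a convenience sketch of my own, not something shown in the video; the ffprobe flags are standard, but the file path is a placeholder.

```python
# Convenience sketch (not from the video): inspect a motion clip with ffprobe
# so you can pick skip and frame-cap values that fit a 3-5 second capture.
import json
import subprocess

def probe_video(path: str) -> dict:
    """Return frame rate, frame count, and duration for the first video stream."""
    cmd = [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-count_frames",
        "-show_entries", "stream=r_frame_rate,nb_read_frames",
        "-of", "json",
        path,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    stream = json.loads(result.stdout)["streams"][0]
    num, den = stream["r_frame_rate"].split("/")   # e.g. "30000/1001"
    fps = float(num) / float(den)
    frames = int(stream["nb_read_frames"])
    return {"fps": fps, "frames": frames, "seconds": frames / fps}

print(probe_video("dance_clip.mp4"))   # placeholder path
```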
Info
Channel: ControlAltAI
Views: 6,977
Keywords: animate anyone, animatediff, comfyui, stable diffusion, animate anyone ai, ai animation, ai video, animatediff comfyui, ai, comfyui tutorial, ai animation video, ai animation tutorial, stable diffusion animation, ai video generator, ai video workflow, ai animation generator, animatediff tutorial, ai animation video generator, ai tools, animation ai, animatediff stable diffusion, ai video animation, stable diffusion video, generate ai animation, ai image animation
Id: KttO3Wiyq3w
Length: 25min 30sec (1530 seconds)
Published: Sun Feb 11 2024