AI 3D animation: convert your favourite 2D cartoons with AnimateDiff in ComfyUI #stablediffusion

Video Statistics and Information

Captions
Old cartoons and animations can be easily converted into fascinating 3D scenes with ComfyUI. First transform your images into noise, then with ControlNet, RAVE and AnimateDiff create a fresh new cartoon. Everything is possible. I hope you enjoy this tutorial and create incredible new videos.

The workflow I am going to present can be found on my page on OpenArt. There you will find examples and a detailed explanation of how to use it. In the resources you have a couple of clips to practice with; however, the video used in the tutorial is too large, so you have to download it directly from the source. It is a free video from Vidzi and the link is in the description. You can test the workflow for free on OpenArt: click on the green "Launch on cloud" button to start a ComfyUI virtual machine. The machine will load in a few seconds, and when it is ready the ComfyUI canvas with the workflow will appear. Load the video downloaded from Vidzi and you are all set for a short demo: click on "Queue Prompt" and check the results for a 24-frame example. If you are going to run the workflow locally, download it from OpenArt by clicking on the gray download button, then in ComfyUI, as usual, drag and drop the workflow onto the canvas.

On the top left of the workflow you have a series of buttons which define the steps to follow. This is an advanced workflow which requires quite a lot of tuning. I provide some guidelines on how to do it, but the final results depend on you, the creator. I am pretty sure you can tweak and tune it much better than me, so please write in the comments how you make this workflow more consistent and get better results.

In step one we set up the video frames and select which one to use as a key frame. In the Load Video node, select or upload the cartoon you want to convert into 3D. Set the maximum number of frames of the film you want to use; this value can be changed later, but use a relatively large amount in step one to find out which key frame to use. Set the width of the video: higher values tend to provide higher-resolution results but require more VRAM and GPU. Finally, select the frequency at which we extract the frames; a value of 2 to 4 is reasonable to balance consistency and speed.

We also make sure that the checkpoints and animation models are correctly loaded. The first model is AnimateDiff; we will use version 3. This model is used in the last render step to deflicker the animation. If you use version 3, remember to load the version 3 adapter as a LoRA. Next we have the checkpoint used to convert the image into 3D; we will use a specialized 3D animation checkpoint like 3D Cartoon Vision. We use the Civitai model loader here, so the model is downloaded directly from Civitai and you do not need to do it manually. Because of the high memory requirements of RAVE, we use Evolved Sampling with a context length of one; with this trick we are able to run more frames than by default with RAVE. The last model is the 2D recognition model. This is just a regular SD 1.5 model, in this case Juggernaut Reborn; we also use the Civitai loader, so you do not need to search for the model.

In step one we do not render the complete animation, so we deactivate all the render groups. We also activate the "process all frames" node. Because "process all frames" is active, the condition of the Switch Any node becomes true, and all the frames will be processed. Run the workflow and wait until the frames are loaded in the frames preview node. When the frames appear, inspect them one by one until you find the one you think is the best key frame. The key frame index will be the frame number indicated minus one; in this case the key frame number is 26, so the index is 25. This number is used in step two.
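For a rough idea of what these settings amount to, here is a minimal Python sketch of the frame-selection logic, assuming OpenCV is available; it is not the Load Video node's code, and the file name and numeric values are placeholders. It also shows why the key frame index is the displayed frame number minus one.

```python
import cv2  # assumption: OpenCV is installed; the Load Video node does this internally

def extract_frames(path, frame_load_cap=48, stride=2, width=512):
    """Grab up to `frame_load_cap` frames, taking every `stride`-th frame and resizing to `width`."""
    cap = cv2.VideoCapture(path)
    frames, i = [], 0
    while len(frames) < frame_load_cap:
        ok, frame = cap.read()
        if not ok:
            break
        if i % stride == 0:                       # "select every nth frame" setting
            h, w = frame.shape[:2]
            frame = cv2.resize(frame, (width, int(h * width / w)))
            frames.append(frame)
        i += 1
    cap.release()
    return frames

frames = extract_frames("cartoon.mp4", frame_load_cap=48, stride=2, width=512)

# The preview node numbers frames starting at 1, so the key frame index is that number minus one:
key_frame_number = 26                  # the frame that looks best in the preview
key_frame_index = key_frame_number - 1 # 25, the value used in step two
print(len(frames), key_frame_index)
```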
In step two we convert the 2D key frame into a 3D-style image. We only process one image and tune it until we obtain something we like. To use only one image, deactivate "process all frames" and set the number of testing frames to one. In "key frame", indicate the key frame index, in our case 25, which corresponds to frame 26. We want to do the full transformation, so activate all the groups with the render groups switch. Because we deactivated "process all frames", the condition of the switch now becomes false, so the images to be processed are selected by the Get Image From Range Batch node. This node takes the key frame index as the start index and processes the number of testing frames we set, which is one in this step.

The first step to convert the 2D frames to 3D with RAVE is to convert the original frames into a noise latent. For that we use a modification of the Unsampler. The core of this group is the SamplerCustom node. It works in a similar way to a regular KSampler, but it allows us to change the sigmas, which is useful for unsampling. We need to load the sampler we want; in this case we use the KSamplerSelect node and choose dpmpp_2m. The sigmas input is defined by the scheduler, the number of steps and the denoise; we use the BasicScheduler node. And now comes the trick: we use the Flip Sigmas node to reverse the steps. With a regular KSampler the sampling of the latent goes from 0 to 25; with flipped sigmas we go from 25 to 0. This converts the latent image into noise in 25 steps, so this group turns an image into noise just like the Unsampler. For the remaining settings of the SamplerCustom, we do not add any extra noise and we use a CFG value of 3.
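To illustrate the flip-sigmas trick, here is a minimal sketch; it is not the ComfyUI node code, only the idea behind it, and the Karras-style schedule with placeholder sigma values is an assumption. A sampler fed the reversed schedule walks from the clean image latent back towards pure noise.

```python
import torch

def karras_sigmas(steps=25, sigma_max=14.6, sigma_min=0.03, rho=7.0):
    """Karras-style noise schedule, from high noise down to low noise, ending in 0."""
    t = torch.linspace(0, 1, steps)
    sigmas = (sigma_max ** (1 / rho) + t * (sigma_min ** (1 / rho) - sigma_max ** (1 / rho))) ** rho
    return torch.cat([sigmas, torch.zeros(1)])

sigmas = karras_sigmas(steps=25)          # ~14.6 ... 0.03, 0.0 -> normal denoising direction
flipped = torch.flip(sigmas, dims=[0])    # 0.0, 0.03 ... ~14.6 -> "unsampling" direction

# With add_noise disabled and CFG = 3, a sampler driven by `flipped` starts from the
# clean image latent and ends at (approximately) pure noise, which RAVE then
# re-samples into the 3D-styled frame.
print(sigmas[:3], flipped[:3])
```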
The next step is to convert the noise into a new 3D image. To do that we take the 3D model and the noise latent from the previous step and use the RAVE KSampler. The transformation of the image is driven by the conditionings defined in this workflow. In the positive prompt we indicate that we want the image converted into 3D, using tags such as 3D style, 3D rendering, blender or depth. Sometimes adding a few words helps the sampler get what we want; in this example we add "green eyes" so the sampler does not make the eyes different colors. In the negative prompt we use the typical prompts plus "2D cartoon" to reduce the chance of getting a 2D-looking image.

In the first ControlNet we use line art, which will draw quite a lot of the details. However, we do not want the weight to be very high; if we do that, the final image will not have the 3D look we want. As usual, we need to connect the line art preprocessor with the video frames. The second ControlNet is depth; with this one we want the image to have the desired 3D effect. Too high a weight will cause artifacts to appear, so we cannot push it to the maximum. The preprocessor we use in this example is Zoe Depth Anything; however, different cartoons may work better with other depth preprocessors, and you will need to experiment with them. The third ControlNet is the T2I-Adapter color; with this ControlNet we can help the sampler stay closer to the original colors. We can try higher values here than in the other ControlNets, but if the value is too high, artifacts may also appear in the new images. Finally, the CFG in the sampler has an important influence on the final result: a higher value makes the final image look more 3D, but if it is too high, the consistency and appearance might suffer. These are the main parameters to tune; we can tweak others, and with experimentation you may get better results than I do. Remember, though, that in the RAVE sampler you have to use the same number of steps as in the unsampling process, which is 25 steps. I also recommend using the same sampler and scheduler for the best results.

When running more than one frame, the latent from RAVE is passed to a KSampler Advanced that uses AnimateDiff and the ControlGIF ControlNet. The additional transformation is small: the sampling starts at step 20, so only five steps are applied. The idea is just to reduce the flickering and improve the consistency of the animation; this is not yet important when testing a single key frame, but it is for the complete workflow. We combine the previews of the 2D and 3D frames next to each other so you can check whether the results are good or whether you need to fine-tune the workflow parameters.

After you run the workflow the first time, you may want to adjust a few of the parameters. In the workflow I give some guidelines on how to do it, and some ranges where I found that different animations can work okay. Increase depth and CFG to get a more 3D-looking frame, but not too much. Also try to get enough detail in the image with line art, and stick to the original colors with the T2I adapter. Test several depth map preprocessors until your key frame looks good.

In step three we test the configuration determined in step two on a very limited number of frames; we only change the number of testing frames to a small value such as 10 or 12. In this step the switches in the image loading group remain the same; we only change the number of frames that are going to be rendered. When we complete rendering the animation, the result may not be as good as expected: some artifacts appear, the color is not as good as we want, and so on. We then need to iterate and adjust some of the parameters. For example, we can increase the level of detail by increasing the weight of the line art ControlNet, or we may want to reduce it because the image looks too flat. We can also change the depth model to LeReS if the image does not provide enough detail in the background, and so on. As in step two, you will need to experiment with the values until you get something you want. If the short animation looks good, you may also want to test the results with a different starting key frame, or increase the number of testing frames to 32. Thirty-two is long enough to cover the context length defined for AnimateDiff more than once, and it is also more than two times the context length of a 3x3 grid in RAVE, so it is a nice number of frames to make sure the final results will be good, as the sketch below illustrates. However, you will never know if everything is all right until you run the complete set of frames in step four.
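A quick sanity check of the frame-count arithmetic; this is only a sketch, assuming the common AnimateDiff context length of 16 and a 3x3 RAVE grid (9 frames per grid).

```python
# Why 32 testing frames is a reasonable length for step three.
# Assumed values: AnimateDiff context length 16 (its common default), 3x3 RAVE grid.
def covers_enough_context(n_frames, animatediff_context=16, rave_grid_side=3):
    frames_per_grid = rave_grid_side * rave_grid_side   # 3 x 3 = 9 frames per grid
    return n_frames > animatediff_context and n_frames > 2 * frames_per_grid

print(covers_enough_context(32))   # True: 32 > 16 and 32 > 2 * 9
print(covers_enough_context(12))   # False: fine for a quick check, too short for a reliable test
```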
In step four, the final step, we run all the frames set by the load cap value of the Load Video node and create the complete 3D animation. We set "process all frames" as active, but unlike in step one, we keep the render group switches active. Then press the Queue Prompt button and render the complete 3D animation. If you like the final result, you only need to save it and share your results with your community.

Sometimes the results will not be good. You can make small parameter changes and then rerun the workflow for all the frames; in most cases, however, you will want to iterate back to step two or step three. Think first about which strategy to follow. Normally, issues or deviations appear at a different key frame than the one tested. You can add an image preview node to the final animation and determine a new key frame, set the switches as in step three, and fine-tune again with the new key frame. The new parameters you find will be good for the range of frames tested, but I advise you to also test these new settings with the old key frame. If the changes are not dramatic, they will also work with the old key frame, and then you can move again to step four and run all the frames. Sometimes you will need to find a compromise between settings that are good for both key frame sets. Getting good quality with this workflow still takes some work and talent on your side, but hopefully it can be a good starting point to recreate plain 2D cartoons as amazing 3D animations.

If you are like me and do not have a powerful machine, I recommend using a GPU cloud service like RunPod. This is the service I use to develop my workflows and later share them with you. While I do this for fun, running RunPod to prepare tutorials and materials costs me some money, so if you want to use this service I would greatly appreciate it if you use the referral link in the description. If setting up a virtual cloud GPU is too complex, I can also advise you to use ThinkDiffusion or RunDiffusion; promotion links are also in the description and will help me a lot to keep making YouTube tutorials. If you do not want to use these services but still like the tutorials and want to support me, you can also contribute by buying me a coffee. Any help is welcome. And that is everything. I hope you've liked the tutorial; check out my other videos on AnimateDiff, subscribe if you enjoy them, and thanks for watching.
Info
Channel: Koala Nation
Views: 991
Keywords: stable diffusion animation, ComfyUI, Comfyui animation, Animatediff, comfyui Animatediff, comfyui ip adapter, IP adapter, controlnet, OpenPose, DWPose, Zoe depth maps, comfyui animation, comfyui video, comfyui vid2vid, Animatediff comfyui, Animatediff controlnet, Animatediff IP adapter, lora, Animatediff evolved, controlnet animation, comfyui Animatediff controlnet, comfyui controlnet preprocessor, Unsampler, KSampler, KSampler Advanced, Noise diffusion, DepthAnything, lineart, rave
Id: bkb9fCAT55I
Length: 14min 27sec (867 seconds)
Published: Fri Apr 05 2024