ComfyUI Comparison Grids - AnimateDiff v3

Captions
I don't know, hopefully I'm live. Hello everybody, welcome to YouTube. I don't know if I'm on; let me know if you can hear me. YouTube's been acting really weird tonight, specifically my connection between OBS and YouTube, but things look like they're working. Hard to say. All right, we'll just get started and see what happens.

If you were here last night, you'll know that AnimateDiff version 3 came out. If you weren't, then I'm telling you: AnimateDiff version 3 came out. I've been using it all day and all night, testing it with different ControlNets and seeing what happens. Then it occurred to me that I should do an in-depth stream on how to build the video comparison grids I've been making to test the different settings. It's a good skill to have in ComfyUI: being able to align your outputs and make comparisons. You can do that with image grids too, where you want to try 10 different prompts across 10 different seeds, XY plots and so on. We can do all of that with ComfyUI, but I'm specifically going to talk about building comparison grids like these manually, so you can test your settings and see what you're doing. It's good to compare things side by side to see what effect a particular ControlNet is having with a particular piece of input footage.

This will be a nerdier-than-normal stream. I'm going to save this grid workflow, clear it, and build it from scratch. We'll start by building an image grid, and then we'll use that concept to extrapolate into video grids.

So let's start with a KSampler. Out of the KSampler we need a VAE Decode, and out of the VAE Decode we can just use a Save Image for now, because we're only doing images. Hey, Simone.

Let's add our checkpoint; we'll just use Photon. Out of the CLIP output we need a CLIP Text Encode. Actually we need two of them: you guessed it, our positive and negative prompts. The negative is "NSFW, nude".

What's an easy thing we can compare, so we can show how these grids work with the text masks? Let's do two different prompts. This will be prompt number one, and we'll add a prompt number two. We're going to use the same negative prompt for both. We need one more KSampler. Oops, let's make sure that's negative and that's positive. Plug the model in. For the latent image, we're going to use an Empty Latent Image that's the size of our image, and we'll use it for both samplers.

When I'm done, I'm going to rebuild all this with Get and Set nodes so you can see how that works with variables; it cleans up the workflow quite a bit. For now I'm just plugging things in directly so you get a more basic idea of the rundown before we go crazy. The reason we only have one negative prompt node is that we can reuse the "NSFW, nude" negative prompt on both samplers, and then these are our first and second prompts. We'll title this one "positive prompt 1" and that one "positive prompt 2".

We're going to need another VAE Decode and another Save Image. I'm going to make a folder for this because we're testing. I'll call it "test", though you don't have to. This one goes in under test as prompt one and this one as prompt two, so that creates files named prompt one plus a number and prompt two plus a number, and we can see them together. We'll build on this later, but for now it shows what we're doing.

So we'll go "apple" and "orange". Oops, what did I forget? Our VAE needs to go into the decoder. All right, there's an apple and there's an orange.

We want to take these two, put them side by side, and put the prompt above them, so we need a couple of node packs. We need KJNodes for ComfyUI; that's for the text mask and some of the other nodes we need for moving images around. I thought the other one was called image combine; let me go find it. I know I have a lot of packs. It's possibly image selector. There it is; I just have to type it in 75 different ways. We'll also need Video Helper Suite. I tell you to download this every single video, but if you haven't yet, do it. I think that'll do it for now.

We want to combine these images, which is called image concatenation. The word is "concatenate", but the node name is actually spelled wrong, which is hilarious: we need the image concatenate node (spelled "Image Concanate" in the pack), and that's in KJNodes. Think of it as a node that takes two images and sticks them together based on the direction you give it. Plug in image 1, plug in image 2, and direction "right" means image 1, then image 2 to the right of it. It's dead simple. We can use this in combination with other things: we could put text on top of an image and then have another one next to it, by concatenating this one to the left, this one to the right, this one on top, this one down. I'll show you how it works.

We have our two images here and we want them side by side, so this image output goes to image 1 and this image output goes to image 2. We're putting image 1 on the left and image 2 on the right. We're concatenating these into a new image, so we add a Save Image and call it test grid. Do another run, and now we have a comparison grid. It really is that easy.

The only really hard part is when you have text masks: you have to determine the width of the text mask for it all to work. Everything has to be equal; everything has to even out. I'll show you how to do that; it's not that bad. There's a node called Get Image Size that helps, and there's a node called Image Crop+ (I think), which is in the ComfyUI Essentials pack. Let's drop that in. The other helpful node is Get Image Size, which is in... no, what is this? This isn't it. I just found this one today, bear with me. There it is: same node pack, ComfyUI Essentials, Get Image Size and Image Crop. These two nodes are so helpful when you're trying to push stuff around. I'll show you why later, but imagine you're bringing a video in and you just want to crop to the center 512 x 512 to do all of your ControlNet preprocessing. You can do all that with these. You can do it with the Load Video node from VHS as well, but these two let you get images around your system at the right sizes. It's awesome. We'll be using both of these nodes today.

We want to add text masks on these so we know what the hell they're saying. Let me find the right one here. The same way we're concatenating videos, we can concatenate text as well. This is a little confusing, but let's go. We want the prompt from both of these, but we don't want to have to write it over and over again.
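The center crop mentioned above (taking a 512 x 512 window out of the middle of a larger frame) comes down to a little arithmetic. Here is a minimal sketch in plain Python. It is not the code of any ComfyUI node; the function name `center_crop_box` and the 910 x 512 example frame are made up for illustration.

```python
def center_crop_box(width, height, crop_w=512, crop_h=512):
    """Return (left, top, right, bottom) for a crop window centered in the frame.

    Hypothetical helper for illustration only, not a ComfyUI API.
    """
    if crop_w > width or crop_h > height:
        raise ValueError(
            f"crop {crop_w}x{crop_h} does not fit inside frame {width}x{height}"
        )
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


if __name__ == "__main__":
    # A wide frame that has already been resized to a height of 512.
    print(center_crop_box(910, 512))  # (199, 0, 711, 512)
```

Whatever the input size, the cropped result has the same 512 x 512 shape, which is what lets every clip in a grid line up later.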
I'm trying to think of the best way to do this. Trying to explain while I think is a bad combo. We've got a CLIP Text Encode; can we convert the text to an input? We can. So let's change the text to inputs on these.

Now we need a string node. There's Simple String, which is in Use Everywhere; there are a bunch of string nodes. Find a string node you like. Text String from WAS Node Suite, actually, this one's perfect. I like WAS Node Suite and I recommend it. So we'll use this. The first text will be our title, "prompt"; text B is going to be our first prompt, "apple"; and the next one is going to be our second prompt, "orange". The second one goes here, the first one goes here. Then we add a string concatenate node and make A and B inputs, so we're concatenating "prompt" with "apple", and that goes into our first text. We do the same thing here: concatenate the title with the second prompt. I'll show you why this is important, but it's for text labeling.

One thing I've noticed when doing this: just start naming all of these things what they are. This is our prompt A, this is our prompt B. Minimize all of these.

We need to feed these into text nodes now, so let's get a Create Text Mask node. After you do this once, you can copy and paste all of it. It's going to be annoying and infuriating the first few times, but I've been through all the pitfalls, so hopefully this will be faster. The node needs to know two things: the width and the height. They matter because when you're concatenating, everything has to be equal, so the width of the text mask you create has to be the same as the width of the image you're putting it together with. In this case it's easy because everything's 512 x 512, but if we wanted variable sizes, we'd set this up so it can expand and shrink to what we need.

So we've got our text mask, and in it we change the text widget to an input and grab string one, so that's going to be "prompt apple". Same thing down here: 512 by 40, prompt B, "prompt orange". This one is taking string two (text B), which is apple; this one is taking string three, which is orange. If we want multi-level or long-worded prompts, we can convert these text widgets to inputs and plug longer strings in, or we can swap this node out. We don't have to use this node; we're just using it because it's easy and I'm only doing short prompts for this demo.

Okay, we've got our prompts and everything looks good. What do we want to do? We want to put these text masks above the images, but we still want this concatenation to go to the right. Let's pull this one out, because it's our final one. We just need to put text on top and image below, so we need another image concatenate: VAE Decode into image 2, and the same deal here, VAE Decode into image 2. What goes into image 1? Our text masks: text mask A and text mask B.

If I did this right, and I probably didn't... 32, 32, yes, I forgot. This will work for now; there are two more things I want to cover. What did I do wrong? I didn't connect these two to this one, so it just passed straight through. Let's try that again. "Sizes of tensors must match... Expected size 40 but got size 512." Let's look at our setup: yes, I've got them going to the right and I want them to go down. Text on top, image on bottom, direction down, on both. Then these two: image 1 on the left, image 2 on the right. So down, down, right. Hey, look what we did. Now you can make comparison grids of anything you want, however you like.

Let's make this 48 in height. The other thing: our width and height are being determined here in our Empty Latent Image. If we were using input footage, we could get the image size from that footage instead. So let's walk back to what I was doing with AnimateDiff now that we understand this. Hopefully this makes sense to everybody; if anybody's confused, feel free to drop questions in the chat. This is the most simplistic way to do comparison grids of whatever you want to compare. Text mask 1 and text mask 2, cool. You can rearrange it however you like; I'm trying to get it all on the same screen so you can take a screenshot. Will that work? Come on. That's pretty good. There are always better ways to do it, I guess, but that's the basic idea: text, image, text, image, side by side.

If you wanted to stack them all going down, change this to down and it should... whoa, what did I break? This is cool. Let's just leave "right" alone. I think I might have unplugged something when I was moving stuff around, so here's our chance to rebuild the whole workflow. This is my only gripe with these concatenate nodes: it's very confusing to figure out the routing. What's it mad about? Did I mess up a size somewhere? Oh yeah, look at that: 552 by 512. I've been having this bug all day with ComfyUI where I just click somewhere and the numbers change. It's incredibly annoying. I love how it keeps making the Apple logo. Okay, down, down, down should work now. Yeah. That apple looks a little busted.
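The sizing rule behind these grids can be sketched in a few lines of plain Python. This only models the shapes, not the pixels, and it is not the KJNodes implementation; `concat_shape` is a hypothetical helper, and shapes are written as (height, width).

```python
def concat_shape(a, b, direction):
    """Shape of image `a` with image `b` attached in `direction`.

    Shapes are (height, width). Hypothetical helper modelling the rule from
    the stream: side-by-side needs equal heights, stacking needs equal widths.
    """
    h_a, w_a = a
    h_b, w_b = b
    if direction in ("left", "right"):
        if h_a != h_b:
            raise ValueError(f"heights must match, got {h_a} and {h_b}")
        return (h_a, w_a + w_b)
    if direction in ("up", "down"):
        if w_a != w_b:
            raise ValueError(f"widths must match, got {w_a} and {w_b}")
        return (h_a + h_b, w_a)
    raise ValueError(f"unknown direction: {direction!r}")


if __name__ == "__main__":
    label = (48, 512)    # text mask: 48 high, 512 wide
    image = (512, 512)   # decoded image

    cell = concat_shape(label, image, "down")   # text on top of the image
    print(cell)                                 # (560, 512)
    print(concat_shape(cell, cell, "right"))    # (560, 1024)
```

Asking for `"right"` with the label and the image fails because 48 and 512 are different heights, which is the same kind of mismatch the tensor error on stream was reporting. A width that has drifted from 512 to 552 fails the `"down"` case in the same way.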
What that tensor error means is that there's a size mismatch somewhere, and that was because the image sizes were wrong.

What I was doing was making these grids with videos, to see what the different ControlNets, the different ControlNet preprocessors, and the different strengths were doing. So let's delete this stuff and start over. VHS Video Combine, 24 frames per second, MP4. Let's change the name to today's date, 2023-12-16, 24 FPS.

Now we add AnimateDiff to the party: the AnimateDiff Loader with context. We'll use version 3 and pass the model all the way through, so the model goes through AnimateDiff. Add context options: that's this node here, AnimateDiff Uniform Context Options. For our batch size we'll go 16, and this should make an animation of an apple, if all is right with the world.

Nice. Okay, that doesn't look like an apple, but it's definitely an animation. I'm not sure why it's so terrifying. Let's try "tropical sunset" and see if it's a scary head as well. I know my nodes are close together, but what is going on here? It's not following my prompts. Very strange. I might delete these and try again; maybe changing the inputs in and out was breaking it. What the hell. One sec. There we go: I think I broke it by switching those inputs, so maybe don't do the input switching, and we'll find another way to pull the prompts out if you need that for something. That is more like it.

Okay, AnimateDiff is working and everything is hunky-dory, so we should be able to do a 32-frame run and have it jump context windows. Let's find out. Nice. Version 3 is really nice; I'm super impressed.

So we've made concatenated images, but to make concatenated videos we need to do a little bit more calculation. We need widths and heights. Well, we don't really need them if they're all the same, but if you want to make this modular so it takes any input, you can do that as well. Let's just keep it all 512 x 512: when you're making these grids you're going to be generating a ton of clips, so it's probably best to keep it 512 by 512 anyway. We can do that by cropping the input footage, so your input footage can be whatever the hell you want.

Let's say we want to test two different AnimateDiff motion models on the same seed. We'll set our seed to 42069, because it's the best number in the world. We duplicate this KSampler and drop another one down here, then duplicate the latent samples, the decode, the Video Combine, and the rest of this stuff, plus another AnimateDiff loader, and we'll try the TemporalDiff model. Model goes to model, model goes to model. We can use the same prompts and the same latent image, just a different AnimateDiff model, and actually the same context options too. Okay, the negative prompt didn't come through because I missed it. Let's make sure the positive is in there. Beautiful: positive, negative. The VAE goes into the decoder, which is down here. So this is now going to generate two different videos. Let's do 16-frame animations because they'll be faster.

Yeah, DWPose is really good. I was also playing with MediaPipe Face today, which is really good. There's a ControlNet for that as well; you use it exactly the same way as DWPose, and there's a preprocessor for it. Just type "mediapipe" in here: MediaPipe Face Mesh preprocessor. Sick.

So we've got our fixed seed. We basically want to do the same run, but this one with V3 and this one with TemporalDiff.
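The two-run setup described above can be sketched as a fragment in the style of ComfyUI's API-format workflow JSON, where each node has a `class_type` and `inputs`, and a link is written as `[source_node_id, output_index]`. This is a rough sketch under assumptions, not an export of the workflow on stream: the readable node IDs are made up (real exports use numeric string IDs), and the `steps`, `cfg`, `sampler_name`, and `scheduler` values are placeholders that were never stated on stream.

```python
SEED = 42069  # the fixed seed used on stream


def ksampler(model_node_id):
    """Build one KSampler entry. Only the model link differs between runs."""
    return {
        "class_type": "KSampler",
        "inputs": {
            "model": [model_node_id, 0],           # varies per run
            "positive": ["positive_prompt", 0],    # shared (hypothetical node ID)
            "negative": ["negative_prompt", 0],    # shared (hypothetical node ID)
            "latent_image": ["empty_latent", 0],   # shared (hypothetical node ID)
            "seed": SEED,
            "steps": 20,                # placeholder value
            "cfg": 7.0,                 # placeholder value
            "sampler_name": "euler",    # placeholder value
            "scheduler": "normal",      # placeholder value
            "denoise": 1.0,
        },
    }


def differing_inputs(node_a, node_b):
    """List the input names whose values differ between two nodes."""
    a, b = node_a["inputs"], node_b["inputs"]
    return sorted(key for key in a if a[key] != b.get(key))


# Hypothetical IDs standing in for the two AnimateDiff loader nodes.
sampler_v3 = ksampler("animatediff_v3")
sampler_temporaldiff = ksampler("animatediff_temporaldiff")

if __name__ == "__main__":
    print(differing_inputs(sampler_v3, sampler_temporaldiff))  # ['model']
```

Keeping everything except the model link identical is what makes the side-by-side grid a fair comparison: any visible difference between the two clips comes from the motion model.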
That should do it; let's have it run. Because it's ComfyUI, everything is nonlinear in the sense that we can share data streams between the two runs. We only have to set our prompt, our empty latent, our uniform context options, and our checkpoint in one place, and then we route it all to the two different KSamplers for the two different runs. It's really sick. Remember, with video you always have to have the same number of frames and the same size for the videos if you're going to staple them together with the concatenate node.

All right, there are our two videos: this one is with V3, this one's with TemporalDiff. How do we stick them together? Image concatenate. VAE Decode to image 1, VAE Decode to image 2. Grab another Video Combine node to spit that out, 24 FPS, and this folder we'll call "grid". We'll call this one 24 FPS V3 and this one 24 FPS TemporalDiff, so we've got it right in the file names, and the grid is the 24 FPS grid. This should just work unless I missed something. Let's try it.

So now you have a video where you can see the two clips looping together. Obviously we want to add titles, because we're going to forget which side is which as time goes on. We've got our clips going to the right, but we need to add text on top of each clip, so we need an image concatenate before this image concatenate. Hold on, let's think this through. First we need our text masks: Create Text Mask, which is in KJNodes. All we have to do is change the text: this one is "AnimateDiff V3", and we need another text mask node, so copy and paste, and this one is "TemporalDiff". That's all we need.

The videos are 512 x 512, so 512 wide by 48 high will give us enough room to put the text on top. We need two of these image concatenate nodes. You can make these things called reroutes in ComfyUI, which is nice: drag out the thing you need and choose reroute. So we'll drag our VAE Decode out to a reroute, and then we have our two VAE Decode outputs for our two videos. This helps when we wire up the masks. Move this up. This is the first one: this is the mask, this is the video. Text mask, video; text mask, video; down, down. Now we connect them to this one: you go here and you go here.

Yes, it's mad. You see what that error said? Queue Prompt: "Sizes of tensors must match except in dimension 1. Expected size 1 but got size 16 for tensor number 1." What does that mean? How many frames are in the video? There are 16, and it got 1 because the text mask has frames set to 1. So I need to set that first. Normally we'd get this from the loader that feeds ControlNet, and we'll end up doing that, but for now I'm just going to use an INT node, the INT Constant node from KJNodes. It lets you set a value, and that's all we're doing: a value of 16. We'll rename this to "frame count". Now we convert batch size to an input and plug that in, then grab this over here and convert frames to an input on the text masks, and this goes there as well. Now it knows to make 16 frames of text masks and 16 frames for the video. This is our master frame count; we can use it anywhere we need the number of frames. This should technically work. Oh, it's doing another one for some reason. Did I change the prompt?
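The master frame count idea can be sketched in plain Python: set the number once, feed it everywhere a frame count is needed, and check that every batch agrees before stitching. The widget names `batch_size` and `frames` follow the ones mentioned on stream; the dictionary keys and both helper functions are hypothetical and only illustrate the wiring.

```python
FRAME_COUNT = 16  # the single "frame count" value, set in one place


def shared_frame_settings(frame_count):
    """Fan one frame count out to every widget that needs it."""
    return {
        "empty_latent_image": {"batch_size": frame_count},
        "text_mask_v3": {"frames": frame_count},
        "text_mask_temporaldiff": {"frames": frame_count},
    }


def check_frame_counts(**batches):
    """Return the common frame count, or raise if any batch disagrees."""
    counts = set(batches.values())
    if len(counts) != 1:
        raise ValueError(f"frame counts differ: {batches}")
    return counts.pop()


if __name__ == "__main__":
    settings = shared_frame_settings(FRAME_COUNT)
    print(check_frame_counts(
        video=settings["empty_latent_image"]["batch_size"],
        label=settings["text_mask_v3"]["frames"],
    ))  # 16
```

A text mask left at 1 frame next to a 16-frame video fails the check, which corresponds to the "expected size 1 but got size 16" error above.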
Technically, I changed the value. Okay, let's move this stuff down here. It sucks that you can't see the title on the reroutes. There we go: now we have titles on our videos, so we can see at a glance what we're doing. These are great for Twitter or anything else where you're trying to compare things or show people the difference between them. This is aces for that.

Another thing we can do: on the text mask, text X and text Y are the number of pixels it puts on the left or on the top of the text. If you noticed, the text was right up against the side, so I'm going to give it a 15-pixel buffer on the left, just a little indentation.

So that is the most basic setup for a video grid. If your videos are different sizes, we can capture the width and feed that into the text mask width. These things, the sliders and selectors, are called widgets. Any of them can be converted to inputs, and then you can plug things into them. So if you have something you need to set in multiple places, you can do it like this in one place; this is setting the value in three different places. This will save you a lot of time, because you can put all your master values at the very front of your project, over here, and then you only have to select your settings, load your video, and hit go every time. Your prompt, your size, everything's right here, and everything else can just exist.

We'll move this stuff up here and put it in a group, and put this over here. We'll call this "grid assembly". We'll make this crazier in a minute, because we're going to add ControlNet, and then we'll be able to show the input footage, the ControlNet footage, and what it's doing, side by side by side. Sorry, I've got the hiccups.

If anyone wants to screen grab the way this works, this is probably the best place to do it; I'll leave it up for a sec. This is our text mask node and this is our VAE Decode reroute; text mask 2, VAE Decode 2. Concatenate them down, so text above image on the left, text above image on the right. That's how we're building it out: that is this, that is this, that is this, that is this, and that's how we're connecting them all.

You can take this and go crazy with it. The only thing you have to remember is that the text width has to match the entire width of the row it's in. I'll show you how this works in a minute when we get to the other stuff, but if I had three videos side by side and wanted another row of videos underneath, the text mask isn't going to be long enough to go across three images. So we'll have to get the final width of the three videos smashed together and then put that into the width of the text mask. It's a little nonlinear and a little hard to think about that way, but it'll make sense when we do it, because it's going to error out and then I'll explain why. For just doing side-by-side stuff, though, this is super simple. You could add the prompt here, or whatever you want, or just save the prompt in a text file. That's simple comparison for video. Now I'm going to add ControlNet and all the fixing it will make you do.

Cool stuff, yes. Am I going to get the LoRA for V3? I have it, but it's not set up yet, I don't think. AnimateDiff Evolved... yeah, I don't think the motion LoRA support works with the new motion LoRAs for V3 yet. It's a whole new framework for the motion stuff in V3, so we'll see.

Okay, where were we? Let's add some ControlNet. Sorry, I don't mean to keep yawning. Let's move this out. We could delete this, but we don't really need to; we can rebuild it later. We don't need two AnimateDiff loaders now, because we're going to use the same AnimateDiff model for both. What we're going to do differently is use two different ControlNets. So let's move this over here, and then we'll have a ControlNet here and a ControlNet here. Apply ControlNet (Advanced); let's add another one. Conditioning: positive to positive and negative to negative, on both. ControlNet sits in the conditioning path: we take the conditioning from the CLIP Text Encode, go through ControlNet, and out. We need an advanced ControlNet loader, so I'll pop that on there and put this one up here. Let's do the face stuff I was doing before: we'll do MediaPipe and OpenPose. Easy peasy. Plug that ControlNet into control net, ControlNet to control net.

So we're loading ControlNet, but we don't have images for ControlNet yet. We can fix that with the VHS nodes: we want Load Video (Path). I already downloaded some videos for this from Pexels, so I'll just use one of those. We're going to extract the face from this one and then dream with it. You paste the path to the video in here and force a frame rate of 24 frames per second. This video is wide, so we're going to use force size: hold the height at 512 and let the width be
whatever it is uh we're going to do that because we're going to then uh image crop 512 by 5 12 okay in the center all right so now no matter what the we can use this and this in conjunction to make sure that we load the video in the correct size and crop it where we want it uh in this case we want to get the face the face in this video is in the center so we're going to load the entire video into the into the context and then we're going to crop into the center so we get the exact size that we want want to feed into control net this it does two things for us it um it gives us the right size makes things faster and control net is trained at these resolutions so giving control net um footage that's way too big like the pre-processors uh can make it tape like exponentially longer if not break or make you run out of vram so it's generally good practice to give the control net the resolutions they're trained on in this case it's 5 12 x 512 so we're going to do that so we have these and we have these but we need to pre-process this footage for both of these we need the open pose footage and we need the media pipe footage so before like an automatic you'd have to go do that in the control net tab but for here all we have to do is just add a pre-processor so we want a media pipe face pre-processor media pipe face mesh pre-processor and we want a DW open pose or DW DW pre-processor all right so this one's our media pipe this one's our open pose uh we take our image crop out go in image crop out go into our pre-processor once the pre-processor is finished we're going to spit that into our control net preprocessor finished we're going to spit that into our control net the other thing we're going to do is we're going to save these as videos so that we can see what's happening so we're going to add a video combine node just like these ones in fact you can just copy and paste these because they already have the the right settings right so we drag this DW POS estimation into a 
video combined node and then uh make sure you change the file names for these things so they make sense okay open pose Source yep and we're going to need another one of these video combined nodes up here for the uh the other one one okay and we want the media uh pipe face mesh output into here same thing 24 frames per second and we're going to call this one media pipe media pipe and the other thing we're going to do is just save the little clip that we're making here just for our own sanity so we can see what we're actually doing there is method to this madness I promise it's not just complete overload overload and we call that source okay got our source we got our open post Source we have our media pipe Source we have uh 24 FPS uh what's this media pipe and this is 24 FPS open pose okay loading the video cropping it combining it sending it to our our uh media pipe in DW pose estimation sending it to control net after that combining it into a video setting it a control net combining it into a video sending these two into the K Samplers one by one we're going to render both ones using the different control nuts we're going to leave the strength at one for both of them and we're going to keep the same fixed seed so we're going to see what having this input with the same everything but with the only thing being different being these two control Nets um then once we do that I'll help you we we'll build this grid out to actually assemble this stuff into uh usable uh you know footage um so what are these this is open pose and this one is Media pipe all right let's try it oh we should also change the prompt because uh the input footage is uh somebody smiling so let's say a person smiling in a rainy New York City alley uh photography 35 mm Fuji film uh cinematic lighting let's do it actually let's do uh a 35y old woman try and get something cool happening V3 geez all right cool hit it oh I made a mistake view Q cancel okay we were setting our frame count here but we don't 
need to do that anymore. We can actually use this video loader to do all that, so let's delete that. Our batch size is now determined by our frame count, and our frame count also determines how many frames are in each text mask. All right, I think that's everything. Now we can set our frame load cap here to 16, so we're going to do the first 16 frames just for sanity's sake. Queue. So there's our DW... that's our OpenPose estimation, sorry, MediaPipe estimation; there's our OpenPose estimation; there's our input footage. See how it's cropped into the center, 512 x 512? Easy peasy. We didn't squish, we didn't squeeze, we didn't mash her face up. Everything's chill. Very cool. So this is cool, but it's not enough information, right? We want to get the source video, the pose estimation videos, and the final results all side by side so we can compare them. It's cool to compare what two things do, but it'd be really cool to see them all synced up in harmony. So now we need to pull all of these into the grid system. This will be fun, and by fun I mean not so fun, but we're going to do it anyway. Okay, I'm going to delete all this and we're going to start over. So this is our final combine; let's make this a little bigger. We have our text, but we don't actually need it yet; the text we're going to do later. All right, what are we going to need here? We're going to need video one and video two. Please stay there. So this is our final result one and final result two. We also want to have our input footage and our OpenPose or MediaPipe video. So what do we do? Well, we have our output, right? So let's just add reroutes for all this stuff. Let's go out from our DWPose estimation: reroute. Okay, so these ones are our finals, and these are the OpenPose, the ControlNet stuff. And what's the other thing we need? We need the source footage. We actually only need
that once because we can duplicate it, but what we need is this output, so let's drag this up here and make that a reroute as well. This should help keep us a little less insane. Okay, these are our outputs, these are our OpenPose information, our ControlNet information, this is our source footage, and then these are our text boxes. So how are we going to smash all this stuff together? What makes the most sense? Let's do Image Concatenate. So to the right... or down, to the right... let's go first image and the ControlNet input, and then we'll do another Image Concatenate to the right and plug this in here. And we're going to do the same thing here. So if I'm right, this should go image one to the left, the ControlNet source in the middle, and the source to the right. Same deal here: image two, ControlNet two, image. The text is currently disabled, so let's see what happens. Holy crap. So maybe it makes more sense to have the result in the middle and this stuff on the left. Hard to say. Yeah, it feels like it's backwards, so let's rewire it. Okay: source, then the ControlNet source, yes, and then the result. Yeah, I think that makes more sense: that's the input footage, that's what we did to it, and then that's the final result. All right, so this is all fine when everything's 512 x 512, but if we had to do some responsive stuff it might be a bit of a nightmare. The other thing we need to do is add the text. So how do we add the text when we don't know how wide it's going to be? We actually have to calculate the width of this. I think: Get Image Size. Okay, this width is going to be our text box width, so we're going to change width to an input. The width goes in here, and it goes in here. It's going to be the same for all of them because they're all
the same size, and if I am correct that should add the text on top. No, it won't, because we haven't actually added the text yet. The text needs one more of these. The text needs to go on the top of these rows, but it needs to be the width of all three. So one, two, three, down; so text, up. So confusing. For the first run, let me just try this. Okay, wait, what is this? Oh, I'm just stupid. Unfortunately, I think that works. I wish it was smarter, but it is what it is. Right, right, up, down. Yeah, that worked. Okay, cool. MediaPipe, OpenPose, there we go. So here's our whole workflow. Where are these guys going? Right here. Okay, so let's walk through this whole workflow again, just because, holy [ __ ]. But yeah, that's it. Again, I'm going to pop this on my Discord later so you guys can just plug it in and play with it. You can rebuild it if you want, and I recommend you do, because it helps your brain get around all this stuff, but I get it, you don't have to. Okay, let me add some groups here so this makes a lot more sense. Okay, doesn't help much, but let's rename this. This is AnimateDiff, this is our settings, prompt... just "prompt" is fine, because we'll probably end up taking this from our cropped... yeah, from this size. Let's do that: Empty Latent Image, we're going to change width and height to inputs. There we go. And we need to get image size from this crop here; that's going to go up here. All right, so now I guess you can kind of go; we don't really need you anymore, you're just sort of dead weight. Okay, cool. And this is input footage. This is ControlNet... two on that one. Diffusion one. Oh my God, worst Comfy bug ever. Okay, title will be Diffusion two. I think that's it. The groups didn't help much, I'll
admit that. Oh, that just sucks. I very much hate that bug. Purple. Oh boy. Purple, purple. Okay, cool. All right, let's walk through it again. We're loading our checkpoint into the prompt; the prompt goes into diffusion... well, it goes into both ControlNets. We're doing two different ControlNets: we're preprocessing MediaPipe and we're preprocessing DWPose, and we're saving them. We're cropping everything to 512 x 512; we're getting all that from this input footage. We're applying both of them at a strength of one, and then we're using this grid assembly stuff here to create this final grid of videos so we can compare everything together. The cool part is that with the final result you can then test your stuff and see what it actually does, see if it makes a difference. If you want to add more stuff to the grid, you just keep adding these little packs of nodes and assemble more into the grid, just like I did. I'll make this workflow available on my Discord so you can just plug it in and start playing. But yeah, let's try adding some more frames to the input and see what happens. To control our frame count: the frame count of our input footage is pushed everywhere in this workflow, so let's do 96 frames, and it's going to do them all. This prompt is cool, so let's let it run. So now it's going to extract 96 frames from the footage, it's going to do all of the ControlNetting, it's going to save all the stuff, then it's going to diffuse two runs, and then it's going to staple all that together here and show us what MediaPipe does versus what OpenPose does. Once we have an idea of what both of those do at one, we can determine how much of each one we want to use. So maybe we really like the way MediaPipe does the eyes and teeth and the movements, but Open
Pose is providing some more structure for the thing, like the person or the composition or their mood or whatever. So then you could set the strength of MediaPipe to 0.8 and the strength of OpenPose to 0.4 or something, run them in conjunction, and you get a good idea of what you can do. Now you can just plug in other ControlNets, use different preprocessors, change the titles in these text masks, and make all the comparison grids you want until the end of time. Uh, it wants 512 frames. Can somebody explain to me why that is happening? Is it possible the width got plugged into the batch size? Oh my God. All right, I'm going to have to kill ComfyUI. So what happened there is our frame count got disconnected from our batch size and our height got put into our batch size. The reason I knew that is because it asked me to generate 512 frames, and that just seems like a lot. So let's plug our batch size in and make sure that works. Derp. All right, all cool. I might have to kill Comfy. Yeah, so don't do that. I know exactly when that happened, too: when I was trying to plug in the nodes, and Comfy likes to decide, "Oh, I'll just do what I want." All right, cool, let's hit it again. It should do 96 frames this time. This is as good a time as any to stand up and move your knees. I'm going to do the same and grab a little water while this runs. Take a break, stand up, or don't; I'm not your mom. All right, looking pretty cool. You get a much better idea of what these ControlNets are doing this way, how specifically they're influencing the motion. Come on, one minute, we got this. What's everybody else working on tonight? What have you been diffusing lately? 30 seconds. Here we go, two more. 23 seconds, maybe less. All right, VAE decoding 96 frames, boom. Now combining it all into one big video. Nice, there we go. But if you need that file, I mean, this stuff's all just in
your ComfyUI output folder there, right? Mine's huge, give me a sec. My poor computer. Maybe I can just get to it here. Today's date, yeah. It'd be the grid, 24 FPS grid; that's the little one. There it is. So you've got all this stuff separated, and if you need to use these MediaPipe things, you can always keep them around and reuse them; you don't have to keep generating them every time. But yeah, that's it, that's the whole workflow. If you wanted to swap out different things, like try different preprocessors, we can do that. We can do depth and line art; let's see what that does. Actually, let's add more. Let's just add two more... let's add one more; that'll be confusing enough. All right, so we need to copy these groups: ControlNet 2, Diffusion 2, and Video Combine. We need to do this stuff again. Okay, we need all these nodes. Here's another setup. So we're going to do depth; let's do Zoe depth. The image comes from our load video node... actually no, it comes from an image crop node. Sorry, I know this is far away. Image crop, it's an image. And we need our positive and negative prompt from here; just move this up here for now. Positive, negative, and then this positive and negative goes to our next KSampler. It's already connected somehow. All right, cool. Zoe depth image goes into here. Sorry, I plugged the wrong thing into this. We want the Zoe map; I need to plug the image crop into the preprocessor to make the depth maps. Okay, so we have these now. So what do we have to do? We have to smack another one together and add another row. So copy and paste our text mask, change this one to depth; all the settings will be the same. So what do we need? We need the frames and width. Where do we get the frames and width? The same place we get these from. So where does this width come from? It comes from here, and
the frames, they come from the load video, that frame count there. So I'm going to drag this up to frames. Okay, frames and width, text mask. That's our second text mask, this is our third text mask, this is the depth. Why is this connected to anything? Oh right, because this is to the input. This is always confusing, because when the cables come around the front I always think they're coming out of the back, but nothing's plugged into the output yet; it's just because this one's coming across it and in. I kind of wish these would route underneath so you could see how they're connected. I wonder if that's something we could do. Anyway, there's our third text mask, and I need to add another... no, I need another set of these. Okay, so we need another reroute: our third VAE decoder needs to reroute out there. So if we remember, these are our main outputs, these are our ControlNet outs, and this is our source. All right, so we need to add another one of these. That's the reroute I just made; that's our third output. I'm going to make a reroute for our third ControlNet, which is this, whoop, and we drag that up here. And then our source. I'll just keep these gray for now so we know they're the third one. We need basically another one of these, but for the third row, and I have to think about this. I think that's right. I'm going to unplug these because these are wrong. Okay: source goes first, then the first result... or the third result. Oh right, I don't need two, I only need one more of these. That's so crazy. Okay, you go to here and you go here, right? No, then what goes here? Oh right, sorry, I'm confusing myself. Source, this is our ControlNet, and then this is our result, so just like the grid. So I have my text mask, and I need one more up and one more down, right? I think it's going to hurt my brain for a little bit, but we're going to figure it out. So up: we need the text into the second one because it's the one
going above, and this is going to be... what does this come from normally? This is going to be from here. Okay, and then you go in here, you go in here. Stop that. God, that's annoying. I'm going to be surprised as all hell if this works. If it works, I'll walk this back, but I think that might work. It sucks, we've got to render this one just to find out, so I'm going to do 16 because it'll take a lot less time. We'll do 16, 16, and 16, and then hopefully combine them. Oh no, we won't; we didn't finish setting it up. Terrible. Okay: model comes from AnimateDiff, our latent image comes from our Empty Latent Image, which is right here, and our VAE comes from our load checkpoint, which is this red one up here. Red to red, and that should get rid of all the red circles. Let that run. Whoops, let's try that again. So now we're doing three diffusion runs: MediaPipe, OpenPose, and depth. This is just to show you how we can expand these. This routing is a little crazy, so I don't know if I'd go much further than this. If your brain's less soft than mine, or less hard than mine, you might be able to do this. No, not yet; I'll check that out, though. Oh, I can't believe that worked. Ridiculous. So one other thing we could add, because I'm feeling particularly masochistic: I really like the Color Match node that's out, so I'm going to add another row of color-matched outputs to match the color of the original output, because why not. That's not even going to be that hard to do, wink. All right, let's do this. I just want a color-matched output of this, so I need to basically copy this. CM, CM, beautiful. All right, cool. Then we're going to add a Color Match node. The Color Match node is in KJNodes; it is awesome. So we're going to take our VAE decoder output from the first one and make that the target, and the reference image is going to be our video,
like our input video, so that is going to be the image-cropped version, and then that will go out to here. And if I am not wrong, which is not usually the case, this should color match this one, and it should do it quick because everything else is done. Yeah. Okay, so that's the workflow. Add another one here: VAE decoder... no. Yep, reference image comes from here. Boop, boop. And we need one more; we've got MediaPipe, OpenPose, and depth CM. Yep. No: target, the reference is the input video from the crop node, and image out. Okay, Color Match, let's put it above. You all can come down a little bit too. Okay: CM depth, CM OpenPose, CM MediaPipe. Boop, doop. And we want these color match outputs up there, so let's make reroutes for them: one, two, and three. A 3050 with four gigs? Probably not, unfortunately; it's going to be a tough one. Uh, RunDiffusion. RunDiffusion is like 50 cents an hour. That's probably a better use of your time, because you'll be able to render a lot faster on that. Okay, so these are my channels one, two, and three of the color match ones. So: color match, ControlNet... oh, you know what, let's make the ControlNet ones yellow, let's make the color match ones cyan, let's make these blue because they're output images, and let's make this red because it's the input footage. That might help my brain. All right, I need these reroutes to be on the next set of Image Concatenate nodes, right? Oh no, I need one more on each row. Oh, that's confusing as hell. All right, let's do it. I think that's right. Nope: size mismatch, expected 1536 but got 2048, so one of these is silly. Let's disconnect you, and you, and you. I think I know what the problem actually is, and I think I just [ __ ] up a whole bunch of stuff. Shoot, can I undo what I did? Beauty. Okay, the problem is I'm taking the width from the wrong place, I think. So now image is free. I want to get the image size
from this actual final one, I think. Yeah, because I was grabbing the width one step too early. That's pretty cool. And if you wanted to, you could put text labels on each thing; you could say source, the type, result, and color match or whatever. You would just have to hack the way that text works. That's pretty cool. Well, it's getting late here, and that's basically all I wanted to cover tonight. So I'll package this workflow up and pop it on the Discord. There's already some comparison stuff on there if you want to grab another one. But I just wanted to do it from scratch and build it up so you can see how the whole thing comes together. This is a nightmare. I want to grab all this stuff and move it up here, and then I want to grab this thing and get rid of it. Remove. All right, cool. Let's get this down here, and then this whole shebang, let's make it a group. We'll call this grid assembly. I know this is like chicken scratch and confusing, but hopefully just kinetically watching it go through the process will knock some of these concepts into place in your brain. This stuff didn't make sense to me until I really started dissecting some of the workflows on Banodoco that Kijai was doing. So like I said last night, this community is pretty much designed to help you learn. If you want to learn all this stuff, go to Banodoco, go to my server. All my links are on purz.xyz. If you want to join the Discord or subscribe on YouTube, all that stuff is at purz.
xyz; all the socials are right at the very top. There we go. And as always, if you love the channel, you love what we do here, you get into this stuff and you dig it, you can support by hitting patreon.com/purz. You get access to a special channel in the Discord when you become a supporter, and I've got a mask pack coming of all the masks I've made on my streams, so you can just plug those into different ControlNets and get to work. Right now I'm just packaging them up and labeling them with their aspect ratios and the ControlNets they're potentially best used for, because there's a lot of crap there, so it'd be good to be able to see what is for what at a glance. But yeah, enjoy, and there'll be more cool stuff for subscribers in the future. I'm going to try and figure out some kind of cool exclusive content, and I'm thinking about starting a supporters-only hangout, like every couple of weeks on the Discord, where everybody makes stuff and drops it in the channel and we just hang out and share prompts and ideas and workflows and all that. So a more interactive way to hang out, as opposed to me just streaming at you all the time. So come join the community, and as always, join Banodoco, because that server is insane and they've got all the stuff you're going to need to do all the workflows. Thanks, everybody, for hanging out. I'll be back at it, I think, Tuesday night. I've got an idea for a stream to try and put some of this stuff together into something interesting and fun. Thank you so much, thank you everyone, and have a great night. Go to sleep, or don't; again, I'm not your mom.
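For anyone rebuilding the grid assembly from the stream, the row logic can be sketched outside ComfyUI in plain Python. This is only an illustration under some loud assumptions: frames are nested lists instead of image tensors, and `blank`, `size`, `concat_right`, `concat_down`, and `build_row` are hypothetical helper names, not ComfyUI or KJNodes APIs. It shows each row being built as source, then ControlNet preview, then result, with the label banner sized from the finished strip rather than from a single tile.

```python
# Frames are plain nested lists here: frame[y][x] is one pixel value.
# This is a stand-in for ComfyUI image tensors, not the real node code.

def blank(width, height, value=0):
    """Make a width x height frame filled with one value."""
    return [[value] * width for _ in range(height)]

def size(frame):
    """Return (width, height), similar in spirit to a 'get image size' step."""
    return len(frame[0]), len(frame)

def concat_right(left, right):
    """Place `right` beside `left`; heights must match."""
    if len(left) != len(right):
        raise ValueError(f"height mismatch: {len(left)} vs {len(right)}")
    return [a + b for a, b in zip(left, right)]

def concat_down(top, bottom):
    """Place `bottom` under `top`; widths must match."""
    if len(top[0]) != len(bottom[0]):
        raise ValueError(f"expected width {len(top[0])} but got {len(bottom[0])}")
    return top + bottom

def build_row(source, control, result, label_height=64):
    """One grid row: a label banner above [source | control | result]."""
    strip = concat_right(concat_right(source, control), result)
    width, _ = size(strip)  # measure the finished strip, not one tile
    banner = blank(width, label_height, value=255)
    return concat_down(banner, strip)

tile = blank(512, 512)               # stand-in for one 512 x 512 frame
row_a = build_row(tile, tile, tile)  # e.g. the MediaPipe row
row_b = build_row(tile, tile, tile)  # e.g. the OpenPose row
grid = concat_down(row_a, row_b)
print(size(grid))  # (1536, 1152)
```

If a fourth tile (say, a color-matched result) is added to a strip while the banner is still sized for three tiles, `concat_down` raises an "expected width 1536 but got 2048" error, which mirrors the mismatch hit on stream when the width was read one step too early.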
Info
Channel: Purz
Views: 2,168
Id: eDKd1I9Qjx4
Length: 110min 35sec (6635 seconds)
Published: Sun Dec 17 2023