ComfyUI - ReVision! Combine Multiple Images into something new with ReVision!

Video Statistics and Information

Captions
Alright, I'm really excited to share this with you. This is a new technique you can do with SDXL that lets you take two completely different images and combine them into a third image. This is not a Photoshop-style merge but a concept merge, combining the souls of both images, if you will, into a new third image. Comfy, Emily, and I covered this during the live stream yesterday on Stability AI's Discord, and I'll put a link in the description in case you want to hang out there; we do live streams every week, and Comfy walked us through how to do this, so I wanted to share it with you. Once this is set up, you can rebuild it any time by dragging one of the images you create here into your workspace, and it will load the whole graph right up. I'd also like to thank all the people supporting this channel; I'll put the image we create here into the channel posts for the sponsor level and higher, so you can grab that image, drop it into ComfyUI, and have the graph ready to go. Thanks again to everybody supporting the channel.

Let's start by loading in a checkpoint; we're going to use the base SDXL model here. Remember that you have to use an SDXL model, or you're going to need a special unCLIP model. SDXL has the capability to function as an unCLIP model, unlike 1.5 or 2.0, so realize this is a special situation; normally you would need a dedicated unCLIP model for this. Change the loader to your SDXL model. Once you have that in place, we're also going to load a CLIP Vision model, and I'll put a link below to the one we're using. Go to the loaders menu again and add a CLIP Vision loader. You simply drop the file into the clip_vision folder inside the models folder, so that's models/clip_vision and then the file itself. I have two in here, one big and one small; I'll put links to the official releases below so you can grab them, since it's something you'll use from time to time.

The CLIP Vision model is interesting: it takes what it sees in the image and classifies it numerically. It's pretty darn good at most things. It's not great at, say, specific car brands or flower species, things that look very similar to one another, and it's only about 88% accurate with handwriting, but everything else in the image it should at least be able to identify and fold into the result. So we're going to use the image as the prompt instead of a text prompt.

Here's what we're going to do: drag out the CLIP output and start, just like we normally would, with a CLIP Text Encode node. This is where you would put a prompt for the image you're creating, but in this case we're going to leave it blank; in fact, we want to make sure it contributes nothing. To do that, drag out the conditioning, search for "zero", and you'll see there's a ConditioningZeroOut node. It takes anything that might be in the conditioning, from padding or whatever happens to be in there, and makes it effectively null. We're going to do this for both the positive and the negative. If you copy the pair with Ctrl+C and hold Shift when you paste, you'll get both nodes with their connections intact, which saves time. Again, we're not going to do much prompting here, at least not initially; we might come back to this.
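If you'd rather script this than wire it up by hand, here is a minimal sketch of this first part of the graph in ComfyUI's API ("prompt") format, written as a Python dict. The node class names and input names are the ones I believe ComfyUI uses for these nodes, but double-check them against a workflow exported from your own install; the checkpoint and CLIP Vision filenames are placeholders for whatever you actually have in models/checkpoints and models/clip_vision.

```python
# Sketch of the loader + zeroed-out text conditioning portion of the graph,
# in ComfyUI's API "prompt" format: node id -> {class_type, inputs}.
# Links are written as [source_node_id, output_index].
workflow = {
    # Base SDXL checkpoint: outputs are MODEL (0), CLIP (1), VAE (2).
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "sd_xl_base_1.0.safetensors"}},  # placeholder filename
    # CLIP Vision model dropped into models/clip_vision.
    "2": {"class_type": "CLIPVisionLoader",
          "inputs": {"clip_name": "clip_vision_g.safetensors"}},   # placeholder filename
    # Empty text prompts, then zeroed out so the text contributes nothing.
    "3": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "", "clip": ["1", 1]}},
    "4": {"class_type": "ConditioningZeroOut",
          "inputs": {"conditioning": ["3", 0]}},   # positive, nulled
    "5": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "", "clip": ["1", 1]}},
    "6": {"class_type": "ConditioningZeroOut",
          "inputs": {"conditioning": ["5", 0]}},   # negative, nulled
}
```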
Okay, so now let's go down and load in two different images. Go to image, then Load Image, and pick one; I'm using a mermaid here, and for the other I'll use some flowers. It doesn't matter where these images come from: they don't have to be AI generated, they could be photos, or from Midjourney, or wherever you want to get them. Just go ahead and load them in. The size doesn't matter much either, because the CLIP Vision model is basically going to look at what is in the image and use that as the prompt.

To do that, we need to encode each image with the CLIP Vision model. If you drag the image output out, you'll see the only real option is the CLIP Vision Encode node; when in doubt, drag out from a node and it should give you good hints about what you can do with it. We need the same node for the second image, so grab that image and give it the same encoder.

Now that we have that, we want to be able to adjust the strength of each of these image "prompts" as conditioning. If you drag out the encoder output, you'll see there's an unCLIPConditioning node, which lets us adjust how strong the CLIP Vision encoding is. There's also a noise augmentation setting below that, which acts like a variability control. In this case, because looking at the image and guessing what it is already introduces enough variability, I don't think we're going to use it at all. If you wanted a lot more variability, almost like departing from the image you uploaded, then yes, you could add noise augmentation, but here it doesn't make much sense, so we'll skip it.

We're just going to duplicate this node and use it for the second image as well, so the first conditioning chains into the second one. The order doesn't matter, because they're being combined, but realize that you do have to chain them together. We're not going to do this on the negative side, only the positive side, so the zeroed-out positive goes down into the first unCLIPConditioning node. This is roughly what it should look like. Again, the flow is: the zeroed conditioning makes sure the empty text prompt doesn't add any noise, more or less, to what we're doing down here. If we do add a text prompt, it will help guide the result, so if there's a certain aspect you want to see, you would keep the same layout but put a prompt in the text encoder. Remember, though, that the ConditioningZeroOut node will zero it out, so if you do put in a prompt you'll have to connect the text encoder directly into the unCLIPConditioning and get rid of the zero-out node; otherwise it will throw away your work and you'll wonder why your prompt isn't doing anything. Well, it's because you zeroed it out.

All we're going to do now is take this straight into our KSampler, just like we normally would. So here's our KSampler: we have our positive, and the negative comes straight from the zeroed-out side; we are not doing any CLIP Vision conditioning or encoding on the negative side at all. Then grab the model and drag that over, and we're going to need our VAE eventually as well.
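Continuing that sketch, the image side would look roughly like this: each image goes through a CLIP Vision Encode node, and the two embeddings are chained through unCLIPConditioning nodes on the positive side only. The node ids "2" and "4" refer to the CLIP Vision loader and the zeroed positive conditioning from the earlier fragment; the filenames and strength values are just example placeholders.

```python
# Image side of the graph: two source images encoded with CLIP Vision and
# chained into the positive conditioning via unCLIPConditioning.
image_side = {
    "7":  {"class_type": "LoadImage",
           "inputs": {"image": "mermaid.png"}},   # first source image (placeholder)
    "8":  {"class_type": "LoadImage",
           "inputs": {"image": "flowers.png"}},   # second source image (placeholder)
    "9":  {"class_type": "CLIPVisionEncode",
           "inputs": {"clip_vision": ["2", 0], "image": ["7", 0]}},
    "10": {"class_type": "CLIPVisionEncode",
           "inputs": {"clip_vision": ["2", 0], "image": ["8", 0]}},
    # First unCLIPConditioning takes the zeroed positive conditioning...
    "11": {"class_type": "unCLIPConditioning",
           "inputs": {"conditioning": ["4", 0],
                      "clip_vision_output": ["9", 0],
                      "strength": 0.9, "noise_augmentation": 0.0}},
    # ...and the second chains off the first, so both images contribute.
    "12": {"class_type": "unCLIPConditioning",
           "inputs": {"conditioning": ["11", 0],
                      "clip_vision_output": ["10", 0],
                      "strength": 0.7, "noise_augmentation": 0.0}},
}

# Merge into the dict from the first fragment: workflow.update(image_side)
```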
To keep this tidy, and I think I've shown this before, if you let go of a dragged connection you get a Reroute option. Reroute lets you route connections so you don't have lines strung through everything. I don't use it very often; I use another thing called a pipe, which we'll cover in a future episode. Let's indulge a little OCD and tidy that up.

From here we obviously need our standard empty latent, so drag that out and grab an Empty Latent Image. This is an SDXL model, so 1024 is optimal here, at least to start; I like to have at least one dimension be 1024 or longer. Then we can decode the result using the VAE, add a Save Image node, and that should do it. It should be able to take these two images, combine them in spirit, and produce an image over here. I make my graph kind of wide, but it's simple; you can cram it all together if you like, but I think it's clearer this way.

Let's see what it looks like. We are going to change our KSampler settings, but let's run an initial pass so you can see the result. It's okay, but I think we can do better by adjusting some of the sampling, so let's dig into this. What I would like is more steps; more steps means more detail up to a point, and then you hit diminishing returns, so 30 to 35 is probably plenty. I like my CFG at 6.5 to 8, so we'll leave that as is. For the sampler, instead of Euler I'm going to use dpmpp_3m_sde_gpu, the DPM++ 3M SDE (stochastic differential equation) sampler made for the GPU, and for the scheduler we'll use exponential, just to mix it up a bit. Let's try this again.

What you can do, if you're not too OCD, is bring the meaningful pieces of this together so we can control everything from one spot: these are the controls for the bottom image, those are the controls for the top image, and if we want less of the top image or more of the bottom image, we adjust the strength values. That's more in the spirit of what we're doing. We could drag these over here as well. I know it's a bit ugly from a UI perspective, but I think Comfy said it best: he didn't design this to be a front end, he designed the back end and then threw a front end on it. So if it's not the UI masterpiece you want it to be, realize that's not its intention; its intention is to be a back end with a UI tossed on top.

Let's change these a little: we'll leave her at, say, 0.9, and put the background at 0.7 just for giggles, and run with that. There we go, look at the concept here. Now, if we want to, we can put a prompt in. Expand the text encoder by clicking the gray dot, and get rid of the zero-out node if we're going to use a prompt; just hit the Delete key, connect the encoder directly, and now we can guide it. In my mind we're interjecting a third variable here: we have the conditioning from both of these images plus the conditioning from whatever this prompt is, so it's almost an equal-thirds situation if we leave them all at the same weight. Okay, here's "cyborg mermaid" added as the prompt. Let's try another one: "steampunk flower girl". Here we go. It's using a bit of both images: obviously we're getting some flowers from one and a little bit of the woman from the other, and the tones are kind of a combination of the two values; not the most interesting result.
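Here is a sketch of the final portion of the API-format workflow: the 1024x1024 empty latent, the KSampler with the settings discussed above, the VAE decode, and the save node, plus a small helper that submits the assembled workflow to a running ComfyUI instance over its HTTP API (assuming the default server address of 127.0.0.1:8188). The node ids "1", "6", and "12" refer back to the earlier fragments, and the seed is arbitrary.

```python
import json
import urllib.request

# Sampler, empty SDXL-sized latent, decode, and save.
# "1" = checkpoint, "6" = zeroed negative, "12" = chained positive unCLIP conditioning.
sampler_side = {
    "13": {"class_type": "EmptyLatentImage",
           "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    "14": {"class_type": "KSampler",
           "inputs": {"model": ["1", 0], "seed": 42,           # seed is arbitrary
                      "steps": 35, "cfg": 7.0,
                      "sampler_name": "dpmpp_3m_sde_gpu",
                      "scheduler": "exponential",
                      "positive": ["12", 0], "negative": ["6", 0],
                      "latent_image": ["13", 0], "denoise": 1.0}},
    "15": {"class_type": "VAEDecode",
           "inputs": {"samples": ["14", 0], "vae": ["1", 2]}},
    "16": {"class_type": "SaveImage",
           "inputs": {"images": ["15", 0], "filename_prefix": "revision"}},
}

def queue_prompt(workflow: dict, host: str = "127.0.0.1:8188") -> None:
    """Submit a complete API-format workflow to a running ComfyUI server."""
    data = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(f"http://{host}/prompt", data=data)
    urllib.request.urlopen(req)

# Usage (after merging the three fragments into one dict):
#   workflow.update(image_side); workflow.update(sampler_side)
#   queue_prompt(workflow)
```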
Still, I really think this is a neat concept and one I'll explore more. You can chain as many of these images together as your computer can handle, so that might be a fun bit of adventure too, and adjusting the weights makes for a very interesting combination of the technologies SDXL is offering us. Let me know what you think in the comments, try it out, and see if you can create some really amazing things. Everybody take care, stay safe, and I'll catch you all next time.
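Since unCLIPConditioning nodes simply chain, adding more source images is mechanical. A hypothetical helper, under the same assumptions as the sketches above, might look like this:

```python
def add_unclip_image(workflow: dict, image_file: str, prev_cond_id: str,
                     clip_vision_id: str = "2", strength: float = 0.8) -> str:
    """Append a LoadImage -> CLIPVisionEncode -> unCLIPConditioning chain to an
    API-format workflow dict and return the id of the new conditioning node.
    Assumes the dict's keys are numeric node-id strings, as in the sketches above."""
    next_id = max(int(k) for k in workflow) + 1
    load_id, enc_id, cond_id = str(next_id), str(next_id + 1), str(next_id + 2)
    workflow[load_id] = {"class_type": "LoadImage",
                         "inputs": {"image": image_file}}
    workflow[enc_id] = {"class_type": "CLIPVisionEncode",
                        "inputs": {"clip_vision": [clip_vision_id, 0],
                                   "image": [load_id, 0]}}
    workflow[cond_id] = {"class_type": "unCLIPConditioning",
                         "inputs": {"conditioning": [prev_cond_id, 0],
                                    "clip_vision_output": [enc_id, 0],
                                    "strength": strength,
                                    "noise_augmentation": 0.0}}
    return cond_id

# e.g. chain a third image off the existing conditioning node "12":
#   new_cond = add_unclip_image(workflow, "castle.png", "12", strength=0.6)
# then point the KSampler's "positive" input at [new_cond, 0].
```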
Info
Channel: Scott Detweiler
Views: 16,069
Keywords: midjourney alternative, sdxl, stable difussion, ai art, comfyUI, automatic1111 alternative, stable diffusion, stable diffusion tutorial, comfyui tutorial, sdxl workflow, stable diffusion sdxl, img2img, image 2 image, ReVision
Id: 1SwDCqgXZ-M
Length: 10min 7sec (607 seconds)
Published: Sat Aug 19 2023