From real to anime (with IPAdapter and ComfyUI)

Captions
Hello everyone, this is Matteo, and today I'm going to play a little with ComfyUI. The goal is to create this kind of anime-like realistic illustration without the need for a LoRA or a checkpoint specifically trained for that, and the cool thing is that you can set the realism factor anywhere from cartoonish to very realistic.

Everything starts from a good checkpoint. You need a generic model, and I find this one to be a good one for what we are going to do today. The prompt is very simple. I like to split it into three parts: first the main subject, then the setting of the scene, and finally keywords like "cinematic", "highly detailed" and so on. All of this only serves as scaffolding for the illustration that we are going to do later, so I'm not worried about the fine details; I just want the composition to be fine. Let's see what we've got.

Okay, this is a good start already. The image is a little bit burnt, but that is not important because we are going for an anime style, so high contrast is good. What I don't like is that there's a lot of bleeding between the tokens I've given in the positive prompt. You can see that in the prompt I have "perfect piercing blue eyes", and the scene became all blue: not only the eyes but also the armor and the background. To fix that, we can split our positive prompt into two vectors and merge them with a Conditioning (Concat) node, like so; the order is not important. In the first prompt I will only have the main subject: "portrait photo of a muscular woman wearing a sci-fi worn mech armor". There are a couple of important keywords here: one is "muscular" and the other is "sci-fi". "Muscular" helps bring the attention back to the body of the woman; otherwise we will very likely get only the face. Since we used "armor", "sci-fi" keeps the attention away from fantasy: we only want futuristic armor, not fantasy armor. Now we can remove the main subject from the second prompt and keep everything else. This way
we have two separate vectors that do not bleed too much into each other. Let's see the difference. As you can see, the scene is now less blue; the armor is still bluish but better than before, and overall we have more brown tones, which is nice. I think this is a good start for our illustration. The next step is adding a touch of IPAdapter.

Just for curiosity's sake, let me try to remove "muscular" and write something like "strong" instead, and see if we get a portrait instead of a body shot. As you can see, the attention is not on the body anymore but only on the face, because "portrait" is the first token in our prompt. So let's put "muscular" back and go to the next step.

The easiest way to add the IPAdapter is to start with the Apply IPAdapter node. Then you can simply drag out each input and select the matching node: the IPAdapter Model Loader, then the CLIP Vision Loader, and we have to load an image. Then we connect the model and route it back to the KSampler. We have essentially three options for the IPAdapter model: the base model, the plus model and the light model. You have to experiment a little, but if you want just a hint of the reference image, pick either the light model or the base model; the plus would be too strong. So I'm starting with the base model. As for the reference image, I have this very nice scene. You can actually pick whatever you want, but it has to be very simple and very cartoonish, without too many elements; otherwise the scene is going to be contaminated too much. Of course we want to lower the weight; I'm starting with 0.35. It's also always good practice to prepare the image for CLIP Vision, so I'm adding a Prep Image For ClipVision node with the crop position in the center. Let's see the result.

Now we have a strong comic book, graphic novel kind of illustration. It's not exactly what I was aiming for, and the eye is also a little bit crooked. So first of all I'm going to
lower the weight, and I'm also adding a bit of noise, which often helps with the generation. Now it's closer to what I want, but I think I'm going to add a little weight back to recover some of the anime style. This is pretty good.

The face is a little vanilla, though; I want it to have a stronger character. So I'm going to try adding a famous actress to the positive prompt. We need someone who has played strong characters, so we can try Michelle Rodriguez. Now it is certainly more interesting. We can also try Lucy Lawless, Zoe Saldana, Miranda Otto or Rosario Dawson. Oh, this is a good one.

Now let's try changing the reference image and see if we get a more interesting result. As you can see, the reference image doesn't need to be of the same kind as the image we want to generate; in this case it is not a sci-fi scene. Of course, if it is close, it is better, so let me see what I have here. Okay, this is closer to what we want; let's see what happens. Since the reference is now closer to our subject, we can set a higher weight; let's try 0.40. Of course this is a portrait, so we lost part of the body, but it's fine. It's a very nice, very well lit illustration and a good candidate for the next step, which is upscaling.

There are many ways we can upscale an image (a.k.a. hires fix). If the upscaled image can be very different from the original, we can upscale the latent directly, and probably one of the best methods is NNLatentUpscale. But we want to use a low denoise, so I'm not going to use it. At the other end of the spectrum we have upscaling with a Tile ControlNet, which gives an upscaled image that is very close to the original. But I want a certain degree of freedom when upscaling, so what I'm actually going to use is model upscaling in pixel space. I add an Upscale Image (using Model) node, then I drag out its inputs and select Load Upscale Model. I like the 4x foolhardy Remacri model, but you can use whatever you want. We connect the image. This is a
4x upscaler, and that is a little too much, so I'm going to scale the image down by 0.5 with an ImageScaleBy node; the resulting image will be two times the original. Then we need to convert the pixel image into latent space with VAE Encode, which needs the VAE model, and finally we can connect a second KSampler. I can copy this one and paste it with Ctrl+Shift+V, which keeps all the input connections. Of course I need the latent from the upscaled image. For the model, I don't want the one that comes from the IPAdapter. I'm connecting the original model directly, because I want to get back some realism out of this image when upscaling. I like to use DPM++ 2M with the Karras scheduler, and we change the seed. We don't need many steps; 20 are usually more than enough. Of course we need to lower the denoise: the lower the denoise, the closer we stay to the original image. Let's stay pretty close, so 0.25 for now. Then I need a VAE Decode, and finally we can preview the image.

This is very nice. We lost just a little bit of vibrancy because we no longer have the influence of the reference image, so the upscaled image is slowly moving towards realism. Let's increase the denoise a little more; let's try 0.35. As you can see, the more I increase the denoise, the more realistic the image gets, and at this point it's just a matter of personal preference. I think this is a good midway between realism and illustration. Since the denoise is so high, we probably want a customized upscaling prompt, so let me grab a new positive prompt and see if we can get a slightly better [Music] result.

I think we have pretty much reached our goal. Let's try some other seeds. Ah, this is very cool, really nice. [Music] Let's try with another reference, something completely different like this one. I have to set the crop to the top. I want to try increasing the CFG scale. I got some face tattoos; we can remove them. Let me try with something
completely different, which is a more Western superhero comic book style. Let me remove some weight. [Music] Yeah, this is a completely different style, but still interesting. I have an "anime sexy" reference here; I want to see what it does. Okay, now she is a little less dressed, and the result is still very nice.

Okay, I tidied up the workflow a little. I have the IPAdapter on top, then the main generation, the model upscaling, and finally the second pass with a dedicated negative and positive prompt so we can fine-tune the final result. Let's see what we get. This is a pretty decent generation.

There's one more thing I want to try, and that is adding a little bit of sharpening. To do that I'm using a CAS sharpening node; CAS stands for Contrast Adaptive Sharpening. It is a very smart way to apply sharpening that only affects the areas that are in focus, while blurred sections like the background or the right shoulder won't be affected that much. I'm applying it as the very last step before the VAE Encode, and a small amount is enough. Let's see the difference: as you can see, the face is now slightly more detailed, but the areas that were blurred stay blurred.

So far we've only used the base IPAdapter model, and I want to try the light one, ip-adapter_sd15_light. This one gives a lot more importance to our text prompt, so we can increase the weight. Let's see how it goes. Yes, in fact now we have a full body, like the original picture without IPAdapter, and the end result is still pretty cool. We can try another seed. Let's set the base model back and lower the weight; we should get a close-up. Now this is nice, but the eyes are not very detailed. To fix that we could try some inpainting, but instead I want to give SDXL a try. I'm loading the SDXL base model and changing the VAE. I'm disabling the upscaling for now (we'll get back to it later); with Ctrl+M I can mute the nodes. Then we have to select the IPAdapter SDXL model, and of course we have to increase the resolution. For SDXL we need
a better scheduler, so we are going to use exponential; this is true for the DPM++ 2M SDE sampler. I think I've done everything; let's see. This is probably too tall, so let me remove a little bit of height. Now it's much better. The IPAdapter is too strong now, so let's lower it a little. This is not too bad. [Music] I'll try another reference, this one. Let me add my magic keywords to the negative, "horror" and "zombie", and in fact now it is more pleasant. I want a close-up so we can see the eyes better, so I'm removing "muscular".

Now let's do a second pass over this image. I'm not going to upscale it because it is already at a high resolution, so I'm bypassing the upscale with Ctrl+B, but I'm still applying the CAS sharpening. Now we have better eyes and finely detailed [Music] skin. We can also try a community SDXL model; I should have Juggernaut XL. Yes. Now the IPAdapter is not strong enough and the composition is not great, but we have very nice eyes, and it's overall very pleasant. I want to try another seed. Yeah, much better: the armor is very nice, and the eyes are still very nice.

Okay, one last thing, because it's not anime if it's not animated. I generated this image with the previous workflow, and I'm passing it through the IPAdapter with a very strong weight. Then I simplified the text prompt, because all the heavy lifting is done by the IPAdapter. I'm only adding "short hair flowing in the wind" just to be sure that there will be some kind of movement. Then I'm replicating the image 32 times with Repeat Latent Batch, so our animation will be 32 frames long. AnimateDiff models are trained on 16 frames, so we use a Uniform Context Options node that will split our animation into two chunks, and finally we go through the AnimateDiff Loader. It's pretty much all default settings; I'm only using a higher motion scale. The default is 1 and I'm using 1.125, because I'm basically doing image-to-image on a still frame and I don't have any ControlNet, so to
ensure that there will actually be some movement, I'm setting the motion scale a little higher. Since this is going to add some noise, I'm lowering the denoise in the KSampler so that the original image will be a little stronger. It is a rather basic workflow, but something should come out of it. Let's see. Well, it's nothing special, but it's pretty impressive if you consider that only three months ago we couldn't do something like this.

That's all for today. I hope you found this video interesting; let me know if you would like to see more videos like this, and see you next time. Ciao!
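
The prompt split from the first step can be pictured in terms of tensor shapes. This is a toy sketch in plain Python. It assumes that SD 1.5's text encoder returns 77 tokens × 768 features for a short prompt, and `fake_encode` is a hypothetical stand-in for CLIP Text Encode that only returns zeros of the right shape. The sketch shows that a concat joins two separately encoded prompts along the token axis. Each half is therefore produced without the text encoder ever seeing the other prompt.

```python
# Toy sketch (assumptions): conditioning is modelled as nested lists shaped
# [batch][tokens][features]; SD 1.5's CLIP text encoder is assumed to return
# 77 tokens x 768 features for a short prompt.

TOKENS, FEATURES = 77, 768


def fake_encode(prompt):
    """Hypothetical stand-in for CLIP Text Encode: zeros of the right shape."""
    return [[[0.0] * FEATURES for _ in range(TOKENS)]]


def concat_tokens(cond_to, cond_from):
    """Join two conditionings along the token axis (dim 1), batch by batch."""
    return [to_tokens + from_tokens for to_tokens, from_tokens in zip(cond_to, cond_from)]


def shape(cond):
    """Return (batch, tokens, features) for a nested-list conditioning."""
    return (len(cond), len(cond[0]), len(cond[0][0]))


# Each prompt is encoded on its own, so tokens of one prompt cannot
# influence the text encoder's output for the other prompt.
subject = fake_encode("portrait photo of a muscular woman wearing a sci-fi worn mech armor")
scene = fake_encode("perfect piercing blue eyes, cinematic, highly detailed")
merged = concat_tokens(subject, scene)

print(shape(subject), shape(merged))  # (1, 77, 768) (1, 154, 768)
```

The diffusion model still attends to all 154 tokens together. That would be consistent with the split reducing the bleeding rather than removing it entirely.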
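
The weight on the Apply IPAdapter node can be understood through the decoupled cross-attention described in the IP-Adapter paper. There, the image embedding gets its own attention branch, and that branch's output is scaled and added to the text branch. The sketch below assumes the node's weight corresponds to that scale. It ignores everything else the node does (noise, weight types, multiple image tokens), and all names and vectors are made up for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    probs = softmax(scores)
    return [sum(p * value[j] for p, value in zip(probs, values)) for j in range(len(values[0]))]


def decoupled_cross_attention(query, text_kv, image_kv, weight):
    """Text attention plus the image attention branch scaled by `weight`."""
    text_out = attention(query, *text_kv)
    image_out = attention(query, *image_kv)
    return [t + weight * i for t, i in zip(text_out, image_out)]


# Made-up 2-D vectors, purely for illustration: (keys, values) pairs.
query = [1.0, 0.0]
text_kv = ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
image_kv = ([[1.0, 1.0]], [[0.0, 2.0]])

text_only = attention(query, *text_kv)
for w in (0.0, 0.35, 0.7):
    out = decoupled_cross_attention(query, text_kv, image_kv, w)
    # Distance between the result and the text-only output.
    shift = math.dist(out, text_only)
    print(f"weight={w:.2f} shift={shift:.2f}")
```

At weight 0 the reference image has no effect, and the pull towards it grows linearly with the weight. This matches the intuition behind starting low (0.35) when you only want a hint of the reference.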
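
The Prep Image For ClipVision step matters because the CLIP vision encoder works on a small square input (commonly 224×224), so a non-square reference has to be cropped. The helper below is a hypothetical sketch of how a "crop position" option can pick that square. It is not the node's actual code, and the final resize is left out.

```python
def square_crop_box(width, height, position="center"):
    """Return (left, top, right, bottom) of the largest square at `position`."""
    side = min(width, height)
    # Default: centre the square on both axes.
    left = (width - side) // 2
    top = (height - side) // 2
    if position == "top":
        top = 0
    elif position == "bottom":
        top = height - side
    elif position == "left":
        left = 0
    elif position == "right":
        left = width - side
    elif position != "center":
        raise ValueError(f"unknown crop position: {position!r}")
    return (left, top, left + side, top + side)


# A portrait-shaped 512x768 reference (an assumed size): "center" keeps the
# middle band, while "top" keeps the upper part of the image.
print(square_crop_box(512, 768, "center"))  # (0, 128, 512, 640)
print(square_crop_box(512, 768, "top"))     # (0, 0, 512, 512)
```

This is why the crop position matters for tall references: with "center", whatever sits at the top of the picture is simply cut away before the encoder ever sees it.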
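
The resolution bookkeeping of the model upscale is easy to check by hand: a 4x upscale model followed by a 0.5 rescale gives a 2x image overall. This sketch assumes a 512×768 first pass, since the video does not state the size. It also assumes the usual SD VAE, which has 4 latent channels and a spatial downsampling factor of 8. Together these show what the second KSampler actually works on.

```python
def second_pass_sizes(width, height, model_factor=4, scale_by=0.5, vae_factor=8):
    """Pixel sizes after the upscale model and the rescale, plus the latent shape."""
    upscaled = (width * model_factor, height * model_factor)
    # Rescale by a factor, rounded to whole pixels.
    rescaled = (round(upscaled[0] * scale_by), round(upscaled[1] * scale_by))
    # SD-style VAE (assumed): 4 channels, spatial size divided by 8, height first.
    latent = (4, rescaled[1] // vae_factor, rescaled[0] // vae_factor)
    return upscaled, rescaled, latent


upscaled, rescaled, latent = second_pass_sizes(512, 768)
print(upscaled)  # (2048, 3072)
print(rescaled)  # (1024, 1536)
print(latent)    # (4, 192, 128)
```

Under these assumptions, the second pass samples a latent with four times as many positions as the first pass (192×128 versus 96×64).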
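
Raising the second pass's denoise from 0.25 to 0.35 makes sampling start from a noisier version of the upscaled image. As a result, the second KSampler, which is fed the original model without IPAdapter, redraws more of it. The sketch below assumes that ComfyUI's KSampler handles denoise below 1 in a particular way: it builds a Karras schedule for int(steps / denoise) steps and keeps only the last steps + 1 sigmas. The sigma range used here is an approximate SD 1.5 value and is also an assumption.

```python
def karras_sigmas(n, sigma_min, sigma_max, rho=7.0):
    """Karras et al. noise schedule with n steps, followed by a final 0."""
    min_inv = sigma_min ** (1.0 / rho)
    max_inv = sigma_max ** (1.0 / rho)
    ramp = [i / (n - 1) for i in range(n)] if n > 1 else [0.0]
    return [(max_inv + r * (min_inv - max_inv)) ** rho for r in ramp] + [0.0]


def img2img_sigmas(steps, denoise, sigma_min=0.0292, sigma_max=14.6146):
    """Sigmas actually sampled for a given denoise (assumed ComfyUI behaviour)."""
    if denoise >= 0.9999:
        return karras_sigmas(steps, sigma_min, sigma_max)
    total = int(steps / denoise)
    # Keep only the low-noise tail of a longer schedule.
    return karras_sigmas(total, sigma_min, sigma_max)[-(steps + 1):]


for denoise in (0.25, 0.35, 1.0):
    sigmas = img2img_sigmas(20, denoise)
    print(f"denoise={denoise:.2f}: {len(sigmas) - 1} steps, starting sigma {sigmas[0]:.2f}")
```

All three settings run 20 steps, but a lower denoise starts much further down the noise schedule. That is why 0.25 stays close to the original image.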
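
Contrast Adaptive Sharpening originates from AMD FidelityFX. Below is a simplified, grayscale, cross-neighbourhood-only sketch of the idea. It is not the code of the CAS node used in the video, and mapping `sharpness` through AMD's lerp(8, 5, sharpness) convention is an assumption. The change the filter makes to a pixel is proportional to the difference between that pixel and its neighbours. Smooth, already blurred regions therefore barely move, which is consistent with the blurred areas staying blurred in the video.

```python
import math


def cas_sharpen(img, sharpness=0.5):
    """Simplified contrast adaptive sharpening of a 2-D grayscale image in [0, 1]."""
    if not 0.0 <= sharpness <= 1.0:
        raise ValueError("sharpness must be in [0, 1]")
    rows, cols = len(img), len(img[0])

    def px(y, x):
        # Clamp coordinates so border pixels reuse their nearest neighbour.
        return img[min(max(y, 0), rows - 1)][min(max(x, 0), cols - 1)]

    # Negative neighbour weight: -1 / lerp(8, 5, sharpness), AMD's convention.
    peak = -1.0 / (8.0 - 3.0 * sharpness)
    out = []
    for y in range(rows):
        row = []
        for x in range(cols):
            c = px(y, x)
            up, down, left, right = px(y - 1, x), px(y + 1, x), px(y, x - 1), px(y, x + 1)
            lo = min(c, up, down, left, right)
            hi = max(c, up, down, left, right)
            # Scale the effect down where the local range nears 0 or 1.
            amp = 0.0 if hi <= 0.0 else math.sqrt(min(max(min(lo, 1.0 - hi) / hi, 0.0), 1.0))
            wt = amp * peak
            # Weighted blend of the centre and its four neighbours, renormalised.
            value = (wt * (up + down + left + right) + c) / (1.0 + 4.0 * wt)
            row.append(min(max(value, 0.0), 1.0))
        out.append(row)
    return out


flat = [[0.5] * 3 for _ in range(3)]
spot = [[0.4, 0.4, 0.4], [0.4, 0.6, 0.4], [0.4, 0.4, 0.4]]
print(cas_sharpen(flat)[1][1])       # a flat area is left (numerically) unchanged
print(cas_sharpen(spot, 1.0)[1][1])  # a small bright detail gets brighter
```

With sharpness 1.0, the centre of `spot` goes from 0.6 to roughly 0.98, while its direct neighbours darken slightly; this is the usual sharpening halo. That strength is also why a small amount is usually enough.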
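
The Uniform Context Options node is needed because AnimateDiff motion modules are trained on 16-frame clips, so a 32-frame batch is processed in 16-frame windows. The function below is a simplified sliding-window sketch, not AnimateDiff-Evolved's actual uniform schedule, which also has stride and closed-loop options. The `overlap` parameter is an illustrative assumption showing how overlapping windows would share frames between chunks.

```python
def context_windows(num_frames, context_length=16, overlap=0):
    """Split frame indices into windows of `context_length`, optionally overlapping."""
    if not 0 <= overlap < context_length:
        raise ValueError("overlap must be in [0, context_length)")
    if num_frames <= context_length:
        return [list(range(num_frames))]
    step = context_length - overlap
    windows = []
    start = 0
    while start + context_length < num_frames:
        windows.append(list(range(start, start + context_length)))
        start += step
    # The final window is aligned to the end so it still holds context_length frames.
    windows.append(list(range(num_frames - context_length, num_frames)))
    return windows


for overlap in (0, 4):
    windows = context_windows(32, 16, overlap)
    print(overlap, [(w[0], w[-1]) for w in windows])
```

With no overlap, this gives the two chunks mentioned in the video: frames 0 to 15 and frames 16 to 31. With an overlap of 4, a third window is needed, and neighbouring windows share frames, which can help keep the motion continuous.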
Info
Channel: Latent Vision
Views: 11,498
Id: vp_5Sm4V-Ds
Length: 19min 18sec (1158 seconds)
Published: Sun Nov 05 2023