Magnific/Krea in ComfyUI - upscale anything to real life!

Captions
We're going to learn how to create a Krea/Magnific clone using free and open-source tools, so you can go from this image to this image. For those that want to use it, the workflow is in the description alongside all the files and the prompts I used here. It's entirely free of course, as it should be. Let's go.

So after a brief meeting with my alien overlords, I remembered that a Krea clone was the first thing I tried with ComfyUI, because the centralization, which also equates to censorship in some cases, and the high cost of cloud services were not to my liking. I gave up afterwards because I didn't really see the need for it, and I thought videos were much cooler anyway. Still, I don't want you to just copy and paste a file, guys, so I'd like to teach you how it works, but more importantly why it works, so I'm going to build this thing together with you in this video. A fairly moderate to high level of skill is required for most of my videos, just so you know, so if there's something you don't understand, please remember you can always ask the community on my Discord; the link is in the description.

First, let's look at the tools themselves. We have two major players right now: Krea and Magnific.
And to be absolutely crystal clear, I have nothing against these tools; in fact I bought a license of Krea. I'm not reverse engineering anything, it's just that I would prefer to run most of what they do locally. So guys, please call back your lawyers, it's just for fun, okay?

So let's try to understand what this does. I've loaded a picture of my wife from about 20 years ago. It was taken on the cheapest digital camera I could buy at the time, and I would like to upscale it because it's a fond memory. So if we look at what it does... yeah, exactly as I expected. I mean, I'm not too bothered about the noise on the ground that leads to this weird pattern; I am much more bothered about the face, because, well, um, that's not my wife. I know there are settings that you can use. You can prompt it; usually it auto-prompts, so we need to remember that they have an auto-prompt feature. They also have what's called an "AI strength" feature, which seems to be directly linked to what the denoise setting does in a KSampler. The "resemblance" setting probably has something to do with IPAdapters, FaceDetailer and ControlNets, and "clarity", I don't know, probably a sharpening filter of some sort.

In general I found that this tool struggled with faces but did a very good job on things like clothing. Also, it doesn't understand firearms, but that's okay, because that's a limitation of the technology; they probably don't have gun LoRAs, and, uh, guns are bad in general, don't have guns, I never touched a gun personally. Look, that's not even me, so what are you talking about? As I went to test more and more, I just found the same thing over and over again: really good at objects and things like that, but really bad at faces. Okay, fine.

Another thing that was really interesting: if you upload an image that's already high quality, for example this one that I created in a previous tutorial and had already upscaled, I found that the output was really pleasant. It's really good fun to see that you can change the eye color.
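Those guesses about how the sliders map onto local settings can be jotted down in code. To be clear, everything here is speculation from the video, not the services' published internals, and `ai_strength_to_denoise` is a hypothetical helper, just a toy conversion from a 0-100 slider to a 0.0-1.0 KSampler denoise:

```python
# Speculative mapping (per the video) from Krea/Magnific-style UI controls
# to local ComfyUI equivalents. These are guesses, not confirmed internals.
UI_TO_COMFY = {
    "auto prompt": "vision-model interrogator (e.g. Moondream) feeding the positive prompt",
    "AI strength": "KSampler denoise",
    "resemblance": "IPAdapter weight / FaceDetailer / ControlNet strength",
    "clarity":     "a sharpening filter of some sort",
}

def ai_strength_to_denoise(strength_pct: float) -> float:
    """Toy conversion: clamp a 0-100 'AI strength' slider to a 0.0-1.0 denoise."""
    return round(max(0.0, min(100.0, strength_pct)) / 100.0, 2)
```

So an "AI strength" of 70 would roughly correspond to the 0.7 denoise used later in the tutorial.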
We can do this in ComfyUI, I'm pretty sure of that. But where it really shines is the amount of detail it added to the face, like the freckles, in a way that's consistent with what you would expect, and I'm going to tell you up front, spoiler alert: I don't think we can do that with ComfyUI, and that's okay. As I said, these tools have their reason to exist, but I don't want you to have the wrong expectations. These are commercial platforms with millions of dollars; we're not going to be able to rebuild their clusters of A100s in your bedroom. We can try to get close. If you feel that these results are quote-unquote terrible, please understand we're not going to be able to build something better, but we can try our best anyway, and the best way to learn, I've found, especially in these early days when we don't have much documentation, is by building, building, building.

Another thing that I found is this PDF file by user Alen Knight from the Banodoco server, who published his findings on these tools, and I have to say he's done an amazing job at explaining what they did and in what order. It also highlights the fact that it's likely very difficult to reproduce this in ComfyUI, but I'm the kind of guy who likes to hear "no", so I went ahead anyway.

So to get started, you want to first make sure that your version of ComfyUI is up to date. Please, please, please go and update it right now; it needs to be at least from the 25th of February, and make sure that all your custom nodes have been updated to the current date too.

All right, so we're going to need two groups. The first one is going to be our diffusion step, there you go, and I have the strong feeling that they use a two-step process on this website. The second is going to be the upscale process; it's always optional, so I might as well copy what they do. There you go: upscale. Then we need to load our image. We're going to use Load Image of course, and we'll pick any old image; this one is 20 years old, I'm not getting any younger. There you go. Next we need to clean
it up a bit. So, cool tip here: you can use Load Upscale Model and pick a model called 1xDeJPG_OmniSR. That's going to basically remove the JPEG artifacts from your image without upscaling it. Kind of a cool tip there. Let's connect those noodles and preview it. Now we want to compare it, but first let's queue it, there you go, make sure it works. Yeah, works. We probably want to add a comparer here; we're going to use rgthree's Image Comparer, which is excellent, definitely download that if you haven't. Let's click the noodles, there you go. Oops, oh, it's a bit difficult to record and noodle at the same time, but I did it. There you go, queue, and now we can compare. Is it good? I mean, it works. A lot of things in ComfyUI are a bit superstitious; I didn't want to have a noisy image. It's a cool tip anyway.

All right, so now we need to load our checkpoint. We're going to pick Load Checkpoint. I'm partial to Copax TimeLess, if you watch these videos regularly, but be careful, because SDXL behaves very differently from 1.5; we'll use both anyway. Next we're going to need to encode our image, because that's going to form the latent that we're going to pass to the KSampler. Let's connect the noodles, there you go. And now we need a prompt, positive and negative, so I'm going to use the standard CLIPTextEncodeSDXL that comes with ComfyUI, connect the clip, and create a second one for the negative. There you go. There's a difference between the G and L inputs, if you're not familiar; I will have a tutorial on this. For now we're going to use the same text in both. I'll put "highly detailed", there you go, "breathtaking", breathtaking, and "photograph", and I'll pass that to L, there you go. And now here, for the negative, we're going to pretend that there is no copyrighted material in those models, I'm going to get in trouble again, and put "blurry", "text" and "watermark", and pass that to both G and L. There you go, done, right?

Now I need a lot more space, guys, a lot more space. I can't work in these conditions, I cannot work! Right, so let's drag this over there, all the
way there, make this way bigger, there you go. Um, if I can click it, that would be better, there you go. Right, so now, let's see, let's add a KSampler. I'm going to connect my model to the sampler, my conditioning positive, my conditioning negative, and now I'm going to pass it that prompt. We need to have a latent; that's the image. I'm going to change the settings. I'm not a big fan of this seed built into the samplers, so I'm going to convert it to an input, and I'm going to pass it an rgthree Seed node. I love that thing by the way, really useful to keep all your seeds in one place, really nice. Drag this over to seed. And I know the settings by heart by now, I mean, I've done this so many times: just put 50 in there. CFG is usually more realistic if you put it at 3 to 4; after that it becomes a little bit plasticky, if you see what I mean. It's one of the rare models that supports DPM++ 3M, so I'm going to use this with Karras, I prefer that scheduler, and the denoise value is going to be 0.7. We're going to play with that to see the results; that's how much it's going to influence the image, if you will. So now we search for our VAE Decode and we're done. Let's drag this over there, make sure we're not forgetting anything. Probably over here I want to compare it, oops, I want to compare it to the image. Oh yeah, you can rename these things by the way, just click the dot, right, that's the trick, just click the little dot, don't click the text. "Cool image", because it's cool. Rename: "reference image", there you go. And because I lost the noodle, let's drag it again, there you go. Oh, you can hit the space bar to move around, just use that, it's much easier, and yeah, there you go, that way you don't mess up your workflow. Okay.

All right, so I spent five days recording this tutorial, believe it or not, so it's very important to me that you learn stuff and not just follow, or God forbid, just download the thing and press the button, okay? Let's hit New Fixed Random here; that's because we want to leverage the cache of ComfyUI,
which is excellent at caching things. Next, what we can do is organize in tabs. So what I'm going to do here is use the Fast Groups Bypasser by rgthree, and that's going to allow us to enable or disable the upscale so we can iterate. That's the name of the game, right? I have a video on that; it's all numbers, it's a numbers game, so you want to iterate as many times as you can and as fast as you can, especially if you don't have a 4090. The other thing you can do to be more productive is install useful extensions, like this one for example: I use ComfyUI Workspace Manager to go and organize my work into folders with tags. You can do a lot with this tool, and for people like me who make educational videos it's extraordinarily useful.

Let's see if I made any mistakes. Looks about right. Let's just hit Queue Prompt, see what it does. Okay, yeah, so the denoise is way too high. What's happening, for those that don't know, is if you have a denoise that's too high, it just reinvents the entire image to whatever it thinks should fit the mold, right? So let's change this to something far more reasonable, like 0.2 maybe, at most. Queue, and let's see what it gives us.

All right, well, you know, it's something. We're making progress, we're starting somewhere, we've got to start somewhere, but it's not going to turn heads, that's for sure. So how do we fix this? Well, first of all, I think it's my model, and in fact I know so, because I'm technically from the future, I did this tutorial five days ago, you see. So I know that playing with a different model might be a good place to start, before we get into more complicated things like IPAdapter, ControlNets, Zoe Depth, etc. Let's go with the simple stuff.

So I'm going to do a number of things. First, I'm going to change the model to something like, say, Realistic Vision, which is really good, so let's go with that. And now I obviously need to change my CLIP text encodes, because these are for SDXL; we weren't using them properly anyway. So let's ditch these. This is a good time to tell you that when you work with ComfyUI, it's all about iterating and going fast, so at first you do something quick and dirty, like what I'm doing now, and then you clean it up. All right, so let's change this to green and this to red, because I know a lot of you guys enjoy this little color thing, and now we drag our clip and our conditioning into the right location. There you go, click, click. There's a faster way to do this, I'll show you later. Then we copy-paste our prompts, and now we're going to adjust our steps, because obviously this is a 1.5 model; I know it takes about 30 steps.

I'm sick and tired of changing the CFG, so let's put Auto CFG, shall we? Auto CFG is an extension, a node if you will, that allows you to automatically calculate the CFG. Once you start using it, you lose control over the CFG parameter, evidently; it has limitations. You can bypass it, and one reason you may want to bypass it is because you're going for a specific artistic effect. In this case I want to use it; again, it's very much about iterating fast. Switch the sampler to the correct one; the scheduler stays on Karras,
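By the way, a rough intuition for why that earlier 0.7 denoise "reinvented" the whole image: in img2img, the denoise value roughly controls what fraction of the noise schedule is actually sampled, so at 0.7 most of the picture gets re-generated, while at 0.2 it's barely touched. A toy sketch of that intuition (`effective_steps` is an illustrative approximation, not KSampler's exact code):

```python
def effective_steps(total_steps: int, denoise: float) -> int:
    """Roughly how many scheduled steps actually run in img2img:
    with denoise d, about the last d fraction of the schedule is sampled.
    (An approximation of KSampler behavior, not its exact implementation.)"""
    d = max(0.0, min(1.0, denoise))
    return max(0, round(total_steps * d))

# With 30 steps: denoise 0.7 re-samples ~21 steps (most of the image),
# while denoise 0.2 re-samples only ~6, leaving the photo mostly intact.
```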
I'm happy with that. Let's see, I'm going to get rid of this box, I don't need it anymore, let's delete. Let's organize this a little bit, and now I would like to queue my prompt. So let's look at the results. No, that's too low, so let's change it again and flip it to, say, 0.4. Yeah, that should do; in fact I know it does. So look at that. Oh, my wife became white, that's weird. Well, I would say the quality is way better; it's definitely going in the direction I want, and I know I can use Photoshop for the rest, to, say, change the brightness, change the contrast, etc.

Right, so now I'm going to load a different picture, because it will be a different set of settings, a different set of everything, every time you change the image. Let's use the one where I'm holding a banana, and, right, there you go, this was pretty fast. So yeah, the vegetation has been corrected the way I saw Magnific and Krea do it, but the face, of course, the face and the banana look very different. I think I won't have a fix for the banana, but we can use FaceDetailer to fix the face later.

Okay, so I think here we can add a few quick wins before we go any further. Basically, what I don't like is having to retype this prompt every single time, so what we're going to do is move it over somewhere where we can see it. We're going to have to clean up the noodles, but I have a video on that, so I won't bother you with this right now. And we're going to use the Moondream interrogator. Now, there's something really important written here, and that's "no commercial use", and they mean it, so please, please, please be careful with that, okay? No commercial use, pinky promise. So what we're going to do is concatenate the text that we had with the output of this vision model, and we use pythongosssss's extensions here, they're really good: the Show Text one comes with another one where you can play an MP3 file of your own voice when your queue finishes. Maybe I should make that a tip, it's good fun. In any case, let's add our
multiline text node. This one is going to come from the WAS Node Suite; we're going to use a lot of different extensions here, so you might do a bit of downloading. And we're going to make a second one, and that's going to be the prompt override, to override a prompt so we don't have to go through Moondream every single time. And for choosing, we use the Impact switch; that's from the Impact Pack, the same guy who did the ComfyUI Manager, really cool, you also need to check this out. I'm going to go and change the title so we know what we're looking at: that's the override. And here I'm going to put a Concatenate; this one comes from the Comfyroll suite. So I concatenate the "highly detailed" blah blah blah with the content of Moondream, but I want it to be on a switch, so I go through my switch, there you go, oh, there you go, and put my second input. Might as well rename the slots again: right-click on the dot, not on the text, on the dot, and here we're going to call it, um, "override", there you go. Right, so now we select two; that's going to be our override. And I'm going to make a little bit of space here on the screen, because I'm running out of space again.

Right, so let's transfer this into an input so I can pass my string, and yeah, looks good, but, oh, I'm missing my image. Yeah, big problem. All right, let's drag this over there. Okay, now here, the text: "tell me everything you know about this image, my life depends on it". Yeah, I know it's funny, but it's true, it works; you can sort of blackmail the models. It works, I'm serious. So now, switch this to GPU and queue. All right. Uh oh, well, nothing happens, and that's because I didn't turn on remote code execution. This is dangerous by the way, we're trusting developers with the data, so be mindful of that, but it works, look. We got this really good text; I'm quite happy with the outcome here. In fact, I couldn't have typed all this every single time I change pictures, so we're going to save a lot of time. But we want to copy this into our backup, or I should
say override node, switch this to one, press go, and let's look at the output. Actually, that looks a little bit more structured, and that makes sense, because we're using a prompt to tell the engine what to draw. So we're getting really close to the Krea output, and we haven't even done that much work; it was literally 15 minutes.

Now we're missing a big part, and that's the upscaler, so let's get to it. You probably know already that there are multiple types of upscalers: some of them work in pixel space, some of them work in latent space, and depending on which one you choose, one will modify the image, that's the latent upscaler, and the ones that work in pixel space won't modify the image as much. In this case, however, we don't care if it modifies the image; in fact, we encourage it. So I think it would be a good time to try something new, and that's CCSR. CCSR is an upscaler developed, well, technically it's more of a wrapper node by Kijai for something called a super-resolution upscaler that uses Stable Diffusion without a prompt, and if you think about it, it's perfect for our situation. So here I downloaded the real-world checkpoint, which is available on the site that hosts this node, and it's been trained on real-world images. We live in the real world, perfect, I'll pick that. You pass it to the actual upscaler, and you obviously need to give it an image in and an image out; that goes without saying. Next we're going to pick a resize method; I'm going to go with Lanczos, because I find that it gives the best image in my tests, but you might want to play with that. Next we pick a scale of two. Next, I'm going to leave you a little love note, because the steps that you choose are going to work in conjunction with your scale, so, for you not to waste GPU cycles, I'm going to leave the recommended scale-to-steps proportion. You can also choose the start and stop steps, and it's by playing with the start step that you'll be able to erase the seams created by the tiling system, which is also optional. You have three modes: CCSR, CCSR tiled mixdiff, and CCSR tiled VAE. Mixdiff introduces seams, hence the need to play with the steps, and the VAE one introduces noise but doesn't have the seams, and that kind of makes me think I should probably make a video about CCSR, because there are too many options and I need to move fast. Anyway, I'll choose AdaIN instead of wavelet, because my image quality doesn't warrant using wavelet, and then finally I'll put a seed, 777, there you go, lucky seed, and I'll put it on fixed so it doesn't rerun every time I hit queue. Right, good.

Now I'll go grab my image, which is sitting in the other group, let's go grab that. I'm going to show you how to clean up the noodles as well; let's do everything in this tutorial, why not, this is going to be a "how to ComfyUI" tutorial after all. All right, so let's save our image, because it's going to take so long to process CCSR, and I don't want you to lose your images; that implies that you should only run the upscale if you're absolutely sure that your image is good. Okay, let's hit queue, and we're going to have time to look at other things while this runs.

So I'm going to start by adding a watermark, that's from Comfyroll, it's brilliant by the way. Just drag my image to that, send it to the save, and we're going to put "ccsr", so we remember what we did, because when we browse images it's a lot nicer to see what the image is rather than read a complicated file name; it's going to save you time as well. Let's put 70 in the font size, because this is going to be a big image, and we're pretty much done.

Now let's go back to this part of the workflow, and I'm going to try to clean it up a little bit. I'll use a lot of reroutes, so let's put one in from rgthree. They're usually controlled by keyboard shortcuts, but unfortunately there's a bug right now that prevents me from doing that, so I have to right-click to do the resizing. And I'm going to put a title that represents my image output: "output image", there you go. And I'm going to ask it to show the label; again, the shortcuts for some
reason are not working, I don't know why. Let's drag this over there, and let's put this as an output of this group; that's how I usually organize myself. And we're going to drag this into the input of the upscaler, and voila.

So, this one has finished running, let's have a look. Yeah, it looks really nice, really clean. A little bit too sharp for my taste, but that's not really important. I don't see any tiles or anything like that, so I don't have to mess with the settings; I'm very happy with this outcome. The only problem is, you see, my base image is still not great. Let's try to change the scale to four and hit queue. So, the downside of CCSR, as you probably guessed, is that it's extraordinarily slow. I'm going to pause the video, because this is going to take about 7 minutes to run on my 4090.

So we need an alternative, I think. In this case, since speed matters more than anything else, because we're still in the exploratory stage, it's Load Upscale Model, and obviously we apply it by doing an Upscale Image Using Model. We drag this to that, and we select a model. Which one are we going to pick? Siax gives good results, but since speed matters the most here, RealESRGAN is going to be the correct one to pick.

So, I want to organize myself a little bit better here, and I want to show you something. Let's reduce the size of this group and let's move this all the way over there. Okay, so let's make this a little bit smaller. And this is what takes the most time, honestly, organizing your nodes, it's what takes forever, but I wanted to show you how it's done so that you can organize your workflow a little bit better. We're going to put this one at the back of the CCSR, we're going to change the title to something like "CCSR final upscale", and we're going to grab these two nodes and paste them over here, after we move that a little bit more to the right. There you go. And now we can resize this a little bit more, add another group; this is going to be, obviously, the "upscale via pixel space" group. I'm going to make this a little
bit larger, and we're going to drag the nodes in so we understand what we're doing. This will also help you understand my workflow when you see all the groups, etc. And this is just one of the ways you can organize yourself, by the way; you don't have to do this, it's just nicer in my opinion. Let's drag it over here, let's bring that back, and now we can drag this over there. Now everything is connected, so, if you will, it's like running in serial, if this was an electrical circuit. If I hit one on my bookmarks, I go back to this, and I'm presented with a way to switch between groups, so I think it's quite convenient: now I've disabled my CCSR, I hit queue, and before I know it, it's already going to be done. Now, I could also do it the other way around, and I could do both, but that would not be a very good idea. Let's try to resize this, because it's way too big, and we're going to switch this to 4x. Queue, and this is blazing fast. There you go, done. This took about 1.5 seconds, so, I mean, 7 minutes versus a few seconds, I know which one I would pick.

All right, so we still need to insert the IPAdapters, two ControlNets if not more, and we need to standardize all this to fit a meaningful workflow, with FreeU v2, a LoRA loader, Deep Shrink, a FaceDetailer, and, joy of all joys, we need a two-pass system using custom samplers. So let me make you a deal: so you don't have to watch 10 hours of me moving nodes around, I'm going to organize everything into empty groups, and in exchange, I'm going to teach you how to properly organize your nodes. I'll see you in a second.

And we're back. Oh wait, no, that's... yeah, that's the wrong workflow. This is the correct workflow. Now, before you scream, two things. Number one: everything, and I mean everything, including the intermediary workflows that we've been looking at, is going to be in a zip file hosted on the site where I upload these things. Second: I did not add too much here, believe it or not. What I've done is, everything that's new I've put into Fast Bypasser groups, and I've
organized the nodes into meaningful boxes like this one. There's a box just for the seed, there's a box just for the model group, there's a group just for the prompt, and so on and so forth. But if you pay attention, you'll see it's the exact same prompt that we had before, the exact same technique; it just looks better.

The one thing to remember is how I organize my workflows. I usually think like a programmer, because I come from a programming background, and if you're familiar with programming, you know that you have variables going into a function and variables coming out of a function. It's the same thing here: I have a variable coming into my prompt group, that's what I call the base image, and that's the only thing I need, because my prompt is generated through Moondream; and I have variables out, as a prompt returns conditioning, so I've put "con+" and "con-", and that's obviously the output. And likewise for the unCLIP section, which we're not going to use yet but are going to look into in a second: I have two conditionings going in, "con+" and "con-", and they're linked. Really easy, but I think it's cleaner than doing something like this, for example. If I did this... yeah, that looks ugly. This is a little bit cleaner.

Okay, so let's go look at the beginning of our workflow. We initially load a model like we used to; it's just that I added some useful things that I know a lot of you enjoy, and I want to take you through what they do and how they work, if you're not familiar with these items. First we load the checkpoint, nothing new here. What you can do is build a little bus like this with all these functions and place them into subgroups like this, colored a specific color by choosing Edit Group, Color, and then select, in the muters, which I've linked using bookmarks, the color that you want to filter for. I have a tutorial on this as well, but for the sake of this one, just go into properties and choose the matching colors, or, in the case of the model options, I've chosen yellow, of course. There it is. So,
going back to the model. First, I want to give you an option to load LoRAs. It can be useful, because ultimately what we're doing is Stable Diffusion, so if I right-click this and choose bypass, I can enable it, and there are some LoRAs in there that I enjoy using. I'll have tutorials on those; I'm a new channel, so please bear with me while I build my content.

Next up we have CLIP Set Last Layer. This can be useful for some models that require a clip skip, meaning that it's skipping, if you will, a step in the layers of text embedding of your model. If you're not familiar with this, just remember that some models require a clip skip of one, usually that's the realistic models in SD 1.5, and the anime models usually take a minus two. But again, this is not a hard and fast rule; you'll have to check on Civitai per model. Talking about models: again, using this extension that I have here, you can click models and look for them. If you click one, it's going to take you straight to the Civitai page, and it's going to point out the clip skip right here. So I hope this is helpful and that you've learned a little something here.

Next up we have SAG. "What is SAG?", you ask. Well, what, you don't know what SAG is? What do you mean you don't know this? Everyone knows this! Now, I'm just teasing you, this is just for the commenters that feel a little bit nitpicky about the job I do. Essentially, it's a form of combining the idea of CFG, if you're familiar with that, with self-attention. It's complicated math, and a full explanation wouldn't fit this tutorial, so I recommend you go and check out this page if you're interested in the math of it. Some people argue that SAG can help add detail to the image, but in my tests, what I found is that it would mess things up more than anything else. Sometimes it works, sometimes it doesn't, it's complex; I included it for the sake of being complete. And let's close our model box.

Next is FreeU v2, very popular, because, well, as the name indicates, it's free, and by free I don't mean just free
as in beer, I mean also free as in speech, but it's also free in terms of computational resources. Now, because again, FreeU v2 is outside the scope of this tutorial, I've included a little box of recommended settings that you can play with. Just remember these are general guidelines; if you want details, user cyberian dock on Reddit made a very good post explaining the finer details of it, and I found it mostly accurate, so it's a really good read. We're going to leave it disabled.

Next we have Deep Shrink. Again, here we're doing diffusion, so Deep Shrink is useful to a certain extent. "What is Deep Shrink?", you ask. Well, essentially it's a simple-to-use node; you just implement it in your model pipeline, and what it will do is create a better composition, maybe a better background, maybe a sharper image, maybe even faster render times in some cases when you do latent upscaling, which is the complicated name, if you will, for what is known as hires fix. It is useful, it has its usage, but in this case I'm not going to use it, so: bypass.

And then finally we have Auto CFG. Auto CFG does exactly what it says on the tin: it sets your CFG for you, so you don't have to. Why would you want to use something like this? I mean, you're going to lose control, right? Yes, you will, but it's useful, because in this case all we're interested in is quickly iterating between models, quickly choosing a model, and not having to worry about switching the CFG. We already have to change the steps, we already have to change the sampler; it just makes life a lot easier to use Auto CFG. So that one is something I'm going to use when I switch models.

And all of these are controlled with the switches, so I have a bookmark for the switches, which is two. You hit two and you're presented with the switches, and here you can enable and disable them, then you click the little arrow and it takes you there, and as you can see, it's been enabled. We don't need them for now, so let's keep them off. I want to keep everything as
it was and demonstrate to you that this is the exact same workflow, just better organized, with more options.

It's kind of fun: I always organize my tutorials the same way, over and over again, so it's easy for you to keep track of, and I try to reuse the same structure, so you're going to see this used a lot. Think of it a bit like a book: if I was to write a book about ComfyUI, I would have a preface, sub-chapters, and so on, and it's the same here. Think of the chapters as groups and the nodes as paragraphs of text, and the tree structure is maintained using Context Big from rgthree. This saves a lot of noodles on the screen, and then I simply drop a context, which I can extend, every time I need something in a group. I drop it twice, usually, to keep it clean, and I continue the chain like this, and so on and so forth, ad vitam aeternam.

And one last thing: the samplers have been changed to custom samplers. They work exactly like your standard KSampler, okay, it's the exact same thing, except you can pass a model, and specifically you can pass the sigmas, you can pass the latent image, the seed and the CFG using parameters, and that's all linked to a KSampler box. And why did I do that? Well, because it's easier to sort of eyeball an image and say, "well, that's clearly from a video game, so I'm going to need this, I'm going to need that". It's simply easier to do it this way, from, like, a little control panel. If you're really into control panels, I recommend Trung0246's node pack, 0246, which allows you to build what is essentially a type of GUI for ComfyUI using a little control panel. But again, this is outside the scope of this tutorial, so now let's get back to the tutorial itself.

The other change that I've made is that I've put the upscaler in front of the KSampler, and I've given you options: you can use the CCSR upscale, or you can use a model upscale if you don't have enough VRAM or your machine can't take it. In any case, I found that upscaling first had much better results than upscaling last. And for those of you that
And for those that have an eagle eye, you probably noticed that it says "first pass" here — that's because, yes, we're going to add a second pass; it's in the final tutorial. And since we've only used one KSampler, I think this is pretty damn good on its own, honestly. The reason I point this out is that I know it's tempting to reach for unCLIP, tempting to use IP-Adapter, tempting to use ControlNets of course — and we're going to implement all of this — but hey, this isn't bad for a first try, and I know we're not going to fully reproduce Magnific in the browser anyway. One of the reasons is that the tiling system in Magnific is extraordinarily complex. I want to give credit to a user on the Banodoco server who has replicated the entirety of their tiling system, and I think we can all agree this is maybe not something we want to get into right now, because it slows down even my 4090 — and I haven't even queued it yet. It's complex, and if you're into this I'll link you up so you can give it a try and replicate it all at home, but I found it was outside the scope of this tutorial: we're here for good results from the get-go in a few simple nodes, and I think this is actually really good. But the million-dollar question is: can we make it better? The answer is yes. So I've loaded the market image again. I'm quite happy with the outcome, really — this is such a big improvement, and if we look at the comparison there's no doubt this is so much better. Of course the face is not great, but we knew that was going to be a problem. The first problem I think we need to address is the colors — I don't think they look very good — so we're going to add a Color Match node. That's basically going to take a reference from the first image, the smaller one, and pass it to the larger image we've just upscaled, to match the colors; it does a pretty decent job. It's not always necessary to run this, but it's nice to have, and you also have the Joy Matrix node if you want fancier editing-type nodes.
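The idea of matching colors against a reference is easy to illustrate. The classic per-channel mean/std (Reinhard-style) transfer below is just a sketch of the concept — the actual Color Match node may well use a different algorithm internally.

```python
# Sketch of per-channel mean/std color transfer (Reinhard-style).
# Illustrates the concept only; the real Color Match node may differ.
def color_match(target, reference):
    """target/reference: flat lists of (r, g, b) tuples, values 0-255."""
    def stats(pixels, ch):
        vals = [p[ch] for p in pixels]
        mean = sum(vals) / len(vals)
        std = (sum((v - mean) ** 2 for v in vals) / len(vals)) ** 0.5
        return mean, max(std, 1e-6)  # avoid division by zero

    t_stats = [stats(target, c) for c in range(3)]
    r_stats = [stats(reference, c) for c in range(3)]
    out = []
    for px in target:
        out.append(tuple(
            min(255.0, max(0.0,
                (px[c] - t_stats[c][0]) / t_stats[c][1]
                * r_stats[c][1] + r_stats[c][0]))
            for c in range(3)))
    return out

# Shift the upscaled image's statistics toward the small reference image.
matched = color_match([(0, 0, 0), (200, 200, 200)],
                      [(50, 50, 50), (150, 150, 150)])
```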
Next, we're going to need a face detailer, because by definition we don't have a reference image for the face — it's an old picture. So I'm going to drag a context, copy-paste it, and pass the correct image (I've made that mistake many times). I'm going to need an UltralyticsDetectorProvider, so let's go find that — it's in the Impact Pack; make sure you get the one from the Impact Pack if you have multiple node packs. We also need a SAM Loader. Both take models: for the detector I'm going to use the yolov8m one, because I find the results to be very good; as for the SAM Loader, I didn't download the extra models, so I'll just make sure it's set to GPU, and off we go. Next we need a Simple Detector, which is going to — well — detect the face, or faces, and we're going to hook it up to our BBOX detector and our SAM loader, then drag an image to it: the image that has been passed through the KSampler. Then we add a DetailerDebug (SEGS). Again, be very careful here, because if you pick the wrong node it's not going to work, so make sure you get the right one. After that, we drag our image here and connect everything to it by dragging the context while holding Ctrl and releasing — that's kind of magical, but hey, like all other forms of magic, be careful, because sometimes it gets confused between the positive and the negative. Next I'm going to change the settings, because ultimately this is just another KSampler. The seed needs to be lucky, so that's 777, fixed, because I don't want to rerun it every time I queue. Steps need to be in line with our other KSamplers, so 70 in this case; CFG at 1, because we use automatic CFG; the sampler needs to be — I can never pronounce that one — the 3M version, okay, I think you know what I mean; Karras for the scheduler. The denoise is too high at 0.5, but I'll leave it there so you can see the outcome. And we need a preview, so let's go get that. We're also going to
get the preview for the individual faces, because I find that convenient — it helps me understand how many faces were found, etc. Next, we're going to queue this; there's no reason to wait, we want to see results. This is going to take as long as your graphics card takes to process and re-render each face, because it tries to detail every face it finds — I'll show you in a second how to fix that. And yes, it picked up two faces, but this one is really blurry and I don't like that it sometimes tries to detail those. So let's put in a filter — a SEGS Filter (ordered). You drag the SEGS to the filter, then the filtered SEGS: the big faces will go first into that one, and the smaller faces will go here (Ctrl+Shift+V if you want to paste with every connection kept). So that's done. Here we're going to pick one face, starting at zero — arrays are always addressed from zero onwards. We drag a preview here, there you go, and I'm going to cheat: I'll copy these two, because I'm tired of selecting things in the list, and there you go, done. I'm going to press Q and we'll come back when it's finished, so you don't waste time. And it's finished. It's important to always maximize your picture a bit to see the results. I think this is a little too strong — sometimes, especially with a bad-quality picture, it gets really obvious that the face was changed; it's basically inpainting, right? So we're going to lower the denoise to 0.3. The other thing is, usually what I would do here is type "(faceless:3.0)" and run it, and what that does is try to blur faces out rather than reconstruct them. You see how it tried to reconstruct this one? It's definitely improved it — it worked this time, but usually it doesn't, and sometimes it can pick up people who are facing away from the camera.
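The ordered-filter idea — sort the detections, then slice from a zero-based index — is easy to picture in plain Python. This is only an illustration of the logic, not the Impact Pack's actual implementation, and the bounding boxes are made up.

```python
# Illustration of the "ordered filter" logic: sort detected faces by
# bounding-box area (largest first), then slice from a zero-based index.
# Not the Impact Pack's real code; boxes are hypothetical (x1, y1, x2, y2).
def ordered_filter(segs, take_start=0, take_count=1):
    def area(box):
        x1, y1, x2, y2 = box
        return (x2 - x1) * (y2 - y1)
    ordered = sorted(segs, key=area, reverse=True)
    picked = ordered[take_start:take_start + take_count]
    rest = ordered[take_start + take_count:]
    return picked, rest

faces = [(10, 10, 30, 30), (0, 0, 200, 200), (50, 50, 90, 90)]
big, small = ordered_filter(faces)  # big face goes on to the detailer
```

This is why "starting at zero" matters: index 0 is the largest face after sorting, and everything else falls through to the second output.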
Unfortunately, there's a little bug right now in this node suite, so "faceless" doesn't work, and that's a shame — it triggers a tokenizer error. No big deal, but it's useful for you to know for the future. So let's go check out our image. You can see that it has actually re-thought the image and placed this lady over there, and it has completely reinvented my wife — great, she's going to be so pleased when she sees this video. We're still saving the image with a little watermark here. I'm quite happy with the results. The thing is, again, this is a numbers game, guys: on this image it works, so let's try another one. Okay, so I literally snipped a picture from a website, and that picture was really bad quality — you can see all the pixels, etc. A little word of advice: the first KSampler is always going to do most of the heavy lifting. Usually the second KSampler is there to, for example, remove the seams from the passes of the CCSR using Tiled Diffusion — we'll touch on that a little later. So when you do trials like this, don't enable any upscale or anything like that at first, just to see the results. Now I've enabled the face detailer — hopefully it didn't damage the picture. Yeah, it looks really good after the first KSampler, and here's the work done by the face detailer. It definitely did something; whether or not you like the output is a matter of taste — again, change the settings if you don't. This is going to be done on a per-picture basis: that's the plus of using ComfyUI for this kind of thing, and also the minus, because it's not a one-click thing. So let's look at the output. Oh, I'm very happy with this, very, very happy. You can really see the difference between the original image and the output, and that's just a 2x upscale — I could have done a 4x, and then a second 4x on the final result. I really like the job it did on the beard. It reinvented the face, but yeah, we have the denoise set really high right
now. Now, the problem with that is that we're only handling the face. If it were a full body, how do we make sure the clothing is transferred? It works 95% of the time — what about the 5% when it doesn't? So we're going to need something called IP-Adapter, to transfer a style — the clothing, the hair color, etc. — onto the resulting image. Let's implement this now. Okay, so I've prepared a little group for IP-Adapter right over here, and it takes two parameters in: a model and an original image. It wants the original image so it can copy its style, and the way it works is it takes the model that's going to go to the KSampler, modifies it, and returns a single parameter, which is the updated model. I kind of like talking about it as if it were a function, because that's pretty much what it is. There are a couple of things to note about IP-Adapter before we move on, so let's have a look at the original paper. First of all, IP-Adapter is not one thing: it's multiple models, and the IP-Adapter nodes encapsulate them and allow you to use all of them as part of your project. That's important to understand, and it means — and I left you a little love note here so you can remember it — that when you use SD 1.5, you need different models than if you were using SDXL. My recommendation: do all your 1.5 modifications first, switch to the SDXL model by clicking the little drop-down menu, and follow this table if you're getting confused. It's pretty straightforward, and you'll end up remembering them by heart after a while. So let's implement it. First, I'll grab my Prepare Image For ClipVision node, because IP-Adapter is not going to like you pushing a giant image in. Second, I need to load the IP-Adapter model itself — that's my IP-Adapter SD1.5, SDXL, etc. Next I need to apply the IP-Adapter — if I can type this properly, where is it... there — and next I need to load my CLIP Vision model.
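That base-model/IP-Adapter/CLIP Vision pairing is easy to encode as a small lookup table. The filenames below are illustrative examples, not an exhaustive or authoritative list — check the table in the workflow for the real pairings.

```python
# Illustrative pairing table: base model family -> (IP-Adapter model,
# CLIP Vision encoder). Filenames are examples, not an authoritative list.
PAIRINGS = {
    "sd15": {"ipadapter": "ip-adapter-plus_sd15", "clip_vision": "ViT-H"},
    "sdxl": {"ipadapter": "ip-adapter-plus_sdxl_vit-h", "clip_vision": "ViT-H"},
}

def pick_models(base_family):
    """Return the matching IP-Adapter/CLIP Vision combo for a base model."""
    if base_family not in PAIRINGS:
        raise ValueError(f"unknown base model family: {base_family}")
    return PAIRINGS[base_family]

combo = pick_models("sdxl")
```

The point of the "do all your 1.5 modifications first, then switch" advice is exactly this: every swap of the base model implies swapping the whole row of companions.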
That's my ViT-bigG or ViT-H — usually it's ViT-H, but refer to the table once again. Now I choose a resize method — Lanczos looks better, as you know — and a crop position of center. Usually I would go and create a mask and this and that, but this is a tutorial, so we're going to do it quick. I'm going to put in a little preview — oh, and a word of warning: these previews do take time, especially if you have a VAE decode before them, so use them sparingly. Next I choose my IP-Adapter model, making sure it matches my CLIP Vision model. I'm using SDXL, so I refer to the little table and choose IP-Adapter Plus SDXL (ViT-H); the reason is that it's a little stronger than the regular IP-Adapter — if it's too strong, switch back to the regular one; again, it depends entirely on your image. We need ViT-H here, so I'll go for that — again, refer to the table. So I drag this over there, drag this over here, drag my image to the Apply IPAdapter node, boom, and I need my model in to transform it, and the model goes out to the KSampler. Simple. Alrighty, a couple of comments on IP-Adapter. First of all, it has a weight, so you can change this: the maximum would be 1, and the minimum would be 0 — but then there would be no point in using it, just disable it instead. The middle ground would be something like 0.3 or 0.4, and it really depends on which model you use: IP-Adapter Plus is very strong. How strong? Let me show you. If you put it all the way up — and the effect is kind of interesting, actually — it's like the original image: it still looks like a video game, but you're missing all the blocky elements. Let me show you the original for comparison — yeah, it looks like this, so we go from this to that. I think that's kind of a cool effect, honestly; I haven't seen many people try it, and I think it's good fun. And as you decrease the IP-Adapter strength, instead of seeing the original image, what you're seeing is the effect of the high denoise we have set.
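Conceptually, the weight is just a scale factor on the extra image-conditioned contribution that IP-Adapter adds on top of the text conditioning. Here's a toy sketch of that idea — the real IP-Adapter operates on cross-attention tensors inside the UNet, not on little lists of scalars like this.

```python
# Toy illustration of the IP-Adapter weight: final conditioning is the
# text-driven part plus the image-driven part scaled by the weight.
# Real IP-Adapter does this on cross-attention tensors inside the UNet.
def blend(text_attn, image_attn, weight):
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight should stay between 0 and 1")
    return [t + weight * i for t, i in zip(text_attn, image_attn)]

out = blend([1.0, 2.0], [0.5, 0.5], 0.5)  # moderate style influence
```

At weight 0 the image term vanishes entirely (hence "just disable it"), and at weight 1 the reference image dominates the text prompt — which is exactly the "it still looks like a video game" effect described above.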
have set. So, as you bring it all the way down, it's going to get back to photorealistic and eventually lose that composition, but it kept the clothes — and I think that's really cool, actually. The clothes are still very much on par and well reimagined: the backpack, the belt, even the gloves are taken into account. I think that's really good fun. The background, because it was so pixelated, has been completely reinvented, but that's to be expected. How you set this is completely up to you, but personally I go for 0.4 if I use the strong model, and then I try a multiplicity of elements — maybe an XY grid, for example, to test it. There is no "right" setting, and I think that's what differentiates ComfyUI from tools like Krea and Magnific: those tools give you three little sliders, but ComfyUI gives you control over the IP-Adapter, over the denoise, over the ControlNets — control over everything. You have full control with ComfyUI; that's what's great about it. Next up we have the ControlNets. ControlNets are a completely different breed from IP-Adapter — let me show you. There's a series of nodes called ComfyUI ControlNet Aux by Fannovel16, another brilliant developer, who has built nodes that let you use almost any type of preprocessor. What's a preprocessor? Well, it can be depth estimation — like Marigold, for example, the new hot one — but there are plenty of others: there's MiDaS, there's Zoe Depth Anything; in fact there are more — LeReS depth map, BAE normal map. If you want, you can also use OpenPose: that's how those animations are made, if you've seen the videos where someone dances. I have a tutorial on that, and a workflow up as well — you can download it from the same website where I'll put this workflow. The only catch with these tools is that you have to use the right one for the job. So which one is the right one? Well, it depends on the image — there's no magic answer, despite
what some are saying. I find that Canny edges — this one — tends to work quite well, because it carries a lot of information about the image that humans can immediately recognize: in this case a face, someone wearing a hat, hair, etc. But it may not be very good at keeping the gender — that's why you have IP-Adapter, of course. Realistic line art is very similar to Canny. You also have a scribble one, which is far more vague — as you can see, the output looks completely different — so again, it could be good fun; it completely depends on what you need. Personally, I'm quite partial to the depth-related ones. I find MiDaS slightly better in some instances than Zoe Depth Anything, although Zoe Depth Anything does a better job at picking up things like hands — so it really depends what you want to do. This is also the node set that has MeshGraphormer, so I suppose you kind of need it anyway. It's got OpenPose, it's got an animal pose — which I'd love to try, because I really want to make virtual AI cats; I think that would be quite funny — and it's got another one, which I've put into the tutorial: the segmentor. Just so you're aware, what a segmentor does is essentially capture layers of the image in a parallax-type effect: the front usually tends to be red, and the back tends to be, I don't know, green, yellow, etc., and then you can create masks based on that if it's a single image. So I wouldn't stop at the tutorial — personally I would go even further, though I suppose you could argue that's when it becomes an image-by-image process, which is kind of what it is anyway. So let's get to it, let's implement it. For this one I chose a color ControlNet that works only with SDXL — because sometimes it's difficult to find SDXL ControlNets, and this one is super easy to implement. You need Load Advanced ControlNet, and then you need Apply ControlNet. That's it: two nodes, the end. It's really that simple. So
unlike IP-Adapter, which takes a model, this one takes conditioning: you drag the positive and negative conditioning in, and the positive and negative back out to your pipeline — it simply sits in the middle of your conditioning. Of course, you need to pass it the base image, and that's going to be used as the control. The base image, if you will, is analyzed for depth, or for colors in this case — it could be analyzed for anything, depending on your ControlNet — and then you choose the ControlNet you want. I've downloaded a bunch — and that's not all of them, there are far more, because this is my tutorial machine — but you get the idea. For this particular example we're going to pick the color one. I just wanted to point out something important: you can name them whatever you want, as long as you recognize them later on. I happen to like this notation; if you prefer another, just rename the file in your controlnet directory and you're done. I know that the recolor control-LoRA, rank 256, works well with SDXL, so I'm going to pick this one. Another thing about ControlNet I want to show you — because there's no point recreating these nodes one by one; they all work the same. You load a ControlNet, in this case a depth one, and you pass it to Apply ControlNet. The reason I use Advanced ControlNet is that this node set is compatible with AnimateDiff, and sometimes I copy-paste my nodes from one workflow to another — we've all done that; it's not a dirty thing to do or anything — it's just simpler. The other thing to note is that some ControlNets — not all, but some — use a preprocessor; that's what we were just looking at on the website, in this case Zoe Depth Anything. One important thing here is that it takes what's called a resolution. What's the resolution, you ask? Simple: it's the shortest side of the image you pass it. So for the original image — if it's, say, landscape,
obviously the horizontal resolution is going to be longer — wider, whatever the word is — than the vertical one, and vice versa: if it's a portrait, the height is going to be greater than the width. To calculate this, I've put in a little set of nodes that do a Boolean comparison and output the shortest side — it's that simple, and you can reuse this, by the way: copy-paste it, you can even put it in the clipboard in ComfyUI. Moving on to the next one, I've added the segmentor. The reason I highlight this one is that you'll notice it uses control_v11p_sd15_seg_fp16.safetensors — which is quite a mouthful, really — and in reality there is no counterpart in SDXL land. Now, I'm aware there are workarounds, but they're quite outside the scope of this tutorial. Just remember: if you're using the segmentor, you need an SD 1.5 model, and if you use the color one, you need an SDXL model — it's an "or", not an "and", here. In general, though, I really, really wouldn't want you to learn that way — like, "oh, we're stuck with this because he said that". No: first of all, you didn't pay anything for this, so you can do whatever you want, and second, there are tons and tons of tutorials on how to do this. For example, this is my bagel workflow — you can download it from the website. It's a little messy, but it has the advantage of an "I want to try everything" approach: I've put in all the ControlNets and their preprocessors — well, almost all — and you can see how it's done and how they work. For each ControlNet you can actually — it's kind of nice — see the conditioning being passed through here, and if you don't want to use one, just right-click, choose Bypass, and it's done: it skips that one, so you won't process the line art but you'll process all the others. What matters is that you're happy with the result — nothing else matters, guys, seriously.
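The shortest-side calculation described above is trivial to express in code. Here's a pure-Python stand-in for what that little Boolean-comparison node group computes:

```python
# Stand-in for the Boolean-comparison node group: the preprocessor
# "resolution" is simply the shorter side of the input image.
def preprocessor_resolution(width, height):
    return width if width < height else height  # the Boolean comparison

res = preprocessor_resolution(1920, 1080)  # landscape image
```

Feeding the shortest side keeps the preprocessor's aspect ratio consistent with the source image, whether it's landscape or portrait.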
Talking about results: before you start playing with the settings, I wanted to show you this, because it's really important, and you're going to need it if you have less than, say, 16 GB of VRAM — more or less, based on my tests. As you understand, this workflow takes your base image and upscales it using either a CCSR upscale or a model upscale — don't do both, obviously. If you choose the CCSR upscale, it can go up to 6x, even 8x. That's huge, and since we're going to get artifacts on that image, we need to pass it to a second KSampler — preferably we would even rerun the face detailer, but I've only done it once, because otherwise it gets too big for most machines. And speaking of most machines, that's what Tiled Diffusion does: it helps with this. If your latent is gigantic and you pass it to the KSampler in the second pass, it's going to break your machine — I guarantee it, even a 4090. What you want to do is break it into tiles, and Tiled Diffusion is great because it's a single node that takes a model in and a model out; it just sits in the middle and works automatically — that's my favorite way of working. The only thing to remember is that for the tile width and height, it's preferable to stick to the training resolution of the model, meaning 1024 for SDXL. That said, it's super useful, and at 1024 it works great with CCSR. CCSR, as you now know, can leave some seams, and this is a great way to pass a CCSR image through and remove them — I know at least one of you had that problem, so mate, I've got you covered: Tiled Diffusion, that's how you do it. I also wanted to cover a few potential issues you can encounter with this workflow. The first is that rgthree is a brilliant node pack, but it does have bugs sometimes, so if you find that you can't turn off certain ControlNet groups, that's probably because something broke, and you need to pass the workflow through the Link Fixer that's included with the rgthree nodes — very easy to use.
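To get a feel for why tiling saves VRAM, here's a rough sketch of how a big image breaks down into 1024-pixel tiles with some overlap. The 96-pixel overlap is purely illustrative; the real Tiled Diffusion node has its own defaults.

```python
import math

# Rough sketch: how many overlapping tiles cover one dimension of a big
# image. The 96 px overlap is illustrative, not Tiled Diffusion's default.
def tile_count(size, tile=1024, overlap=96):
    if size <= tile:
        return 1
    stride = tile - overlap
    return math.ceil((size - tile) / stride) + 1

# A 4096x4096 upscale becomes a grid of tiles, so peak memory stays
# close to that of a single 1024x1024 pass.
tiles = tile_count(4096) * tile_count(4096)
```

The overlap is what lets adjacent tiles blend together instead of leaving visible seams at the tile borders.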
The second thing you might encounter: I'm going to press Q, and I'm going to get "'ModuleList' object has no attribute '1'". If you get an error like this, it means you've used a 1.5 model with an SDXL ControlNet, or vice versa — and trust me, it's better that way. The worst thing that could happen is that it "works" but does nothing, and you think "oh, it worked" when it didn't. Here, for example, I used the depth control-LoRA rank 256, which, as you now know, is SDXL-only. Sure, the preprocessor will run, no doubt, but when you queue it, it fails at the KSampler step. To fix that, you go in here and choose a ControlNet that works with SD 1.5 — in this case control_v11f1p_sd15_depth; this is boring, okay, it's this one. I hit Q, and now it works again, and it looks like this. I think this is pretty damn cool, if you ask me — I can't believe how good it is at keeping the colors and the clothing consistent. Sure, the office background is kind of funny, but push come to shove you could always create a mask and mask it out. The other problem you can encounter is images that end up looking like this. That's because you set your ControlNet strength way too high. If we look at, for example, the segmentor on the Tomb Raider image, the segmentor basically says "hey, there's red, and then there's the rest, the end", so it keeps that weird video-game look and mixes it into everything. I think I also have the IP-Adapter turned all the way up here — that's a bad idea. So you eventually get bad results if you go too high with the settings. Keep in mind that IP-Adapter is there to transfer a style — think of it as an instant LoRA, really — and ControlNets are there to conform to a certain shape, a certain position if you use OpenPose, or a certain composition if you use depth. Therefore they have to be used lightly, and usually in conjunction with the denoise setting.
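You can catch that family mismatch before it ever reaches the sampler with a tiny guard. The family tags and the check itself are my own illustration — ComfyUI doesn't ship anything like this; it just fails later at the KSampler step.

```python
# Hypothetical guard against mixing model families. ComfyUI has no such
# built-in check; the mismatch only surfaces as an error at sampling time.
def check_family(base_family, controlnet_family):
    if base_family != controlnet_family:
        raise ValueError(
            f"ControlNet is {controlnet_family} but base model is "
            f"{base_family} - expect \"'ModuleList' object has no "
            f"attribute\" errors at the KSampler step"
        )

check_family("sdxl", "sdxl")  # matching families: fine
```

Failing loudly up front is exactly the behavior you want, given that the silent alternative — a ControlNet that "runs" but contributes nothing — is much harder to notice.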
The higher the denoise, the more you'll want to lean on the ControlNets — though then you might lose some coherence in your image; the lower the denoise, the less you need them, to the point where you don't need them at all. So I've run a bunch of tests for you guys to look at and get an idea of what this can and cannot do. Here I used the famous Lara Croft picture that Magnific used for their marketing materials. I think the outcome is excellent — better, in my opinion, than what we get through Krea, for example. The problem with that Krea image, in my opinion, is that the neck is completely deformed, and you can almost tell they're using some sort of depth ControlNet — that's why you have, let's say, pointy things in that image. It also seems to have confused the video-game elements with the realistic elements and not done a good job of integrating both, whereas the ComfyUI version — with a lot of tweaking, of course, and a lot of parameter changes — achieved a good balance between the two. Next, I uploaded this picture of three gentlemen, because I wanted to see how complex the image could get before the face detailer failed, and it coped very well: it picked up all three faces, no problem. The issue I had was the eyes, but it turns out that's actually this gentleman's eye orientation, so come to your own conclusion about what that means. As for the teeth, which look a little off here, I fixed them later by just switching the seed to a different number — I went to, I think, triple-8 instead of triple-7. A common problem highlighted here is that it smooths out the faces a lot; it doesn't add detail. It's the same problem on this person's face: there's no added detail, whereas tools like Magnific and Krea are excellent at adding detail. Now, I'm fully aware that we can try to inject
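The denoise/ControlNet rule of thumb above can be written down as a simple heuristic. To be clear, this formula is purely my own illustration of the relationship, not a setting from the video — tune by eye, ideally with an XY grid.

```python
# Purely illustrative heuristic for the rule of thumb: scale ControlNet
# strength with the denoise. Not a formula from the video; tune by eye.
def suggested_cn_strength(denoise, base=0.5):
    if not 0.0 <= denoise <= 1.0:
        raise ValueError("denoise must be in [0, 1]")
    return round(base * denoise, 2)  # low denoise -> barely any ControlNet

suggested_cn_strength(1.0)  # full re-noise: lean hardest on the ControlNet
```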
noise — in fact, I left the noise injector in the tutorial, but it didn't work well in my testing: it just blurred the image or left artifacts without injecting detail into the face. So if you really need to inject detail, go for the paid tools. Here you're looking at the official Magnific homepage and their cherry-picked test, and here you're looking at the workflow. I'm very surprised by this test: I expected to be considerably inferior to Magnific, but it turned out pretty good. It did not inject detail, as expected, but by playing with the ControlNet and the IP-Adapter I was able to recover a better representation of the original image, as opposed to a reinvented version of it. I was also pleased with this test on a cat I downloaded from the internet — I mean, it wouldn't be a test without a cat from the internet, I suppose. I used LEOSAM's HelloWorld XL v4 — not v5, though you can try v5 of course, nothing stops you — and I found it really, really good at animals. It basically completely reinvents the image, but here it's quite subtle, and it did a really good job on the fur and an excellent job on the little whiskers. Overall I'm very pleased, and I would say it's a completely valid solution. Another cat — this one is 20 years old; poor cat, it may not be with us anymore. Here it was obviously a terrible base picture with a lot of noise in the original, and it completely smoothed the noise out; it even picked up this little pal and reinvented it, which I think is brilliant. I'm very, very happy with this result, especially the textures on the bed — it really did a brilliant job of figuring out that this was indeed a duvet and picturing it properly. Okay, well, I think we've demonstrated that it's perfectly possible to get decent results with very little work in ComfyUI, but evidently there are limitations: specifically, details cannot be properly injected into the image at this point in time. Hopefully we'll see changes that
lead to that. Talking about changes: a new node dropped as I was finishing the editing on this video, literally a few hours ago — Kijai released a wrapper for SUPIR, which is another Stable Diffusion upscaler. It's excellent, like everything Kijai does, but I didn't get a chance to review it, so once it's a little more stable I'll of course have a video on it. In fact, if you enjoy this type of video, I have many more — they should appear on the screen right now. Thank you so much for watching. I'll see you all on Discord, and I really look forward to seeing what you've built with this and how you improve the workflow further. Cheers guys, see you later.
Info
Channel: Stephan Tual
Views: 8,859
Keywords: AI, comfyUI, svd, sdxl, sd15, ipadapter, controlnet, animatediff, loras, models, checkpoints, tutorial, stable diffusion, sora, open ai, ArtificialIntelligence, MachineLearning, #technology, magnific.ai, krea.ai, magic upscaler, stephan tual, stephantual, ai art
Id: TVCOasIZOyg
Length: 61min 1sec (3661 seconds)
Published: Thu Feb 29 2024