How to use ControlNet models with Stable Diffusion

Captions
Welcome to fofr AI, the channel where we explore the world of artificial intelligence and its latest tools and techniques. In this channel we'll dive deep into AI tools like Midjourney, Stable Diffusion and ChatGPT, giving you tips, tricks, prompts and workflows to get the most from them. In this first video we'll take a look at ControlNet, a neural network structure that lets you control diffusion models by adding extra conditions, and guide you through the different models available. We'll show you how to use these AI tools to create stunning AI-generated art. Let's get started.

There's been a lot of interest in this new thing called ControlNet, which is now available with Stable Diffusion and AUTOMATIC1111 and is showing up in all sorts of other tools. So I thought I'd create a video that shares a lot of what I've already learned about ControlNet, all of the different models that are available and how to use them. Let's go through each of the different preprocessors and models that you get with ControlNet.

I'm going to start with this room picture. It's a Midjourney generation, and I've got a prompt that should change the image quite substantially: we've got this sort of Japanese tea-room living room, and we're going to try to change it into a cyberpunk cafe. We're going to change the light to something dawn-like, with warm tones, a morning scene with windows. We're going to use the same seed for every render, and everything is going to be generated with the Deliberate model, which is based on Stable Diffusion 1.5. We're going to take two images through each of the ControlNet models. We've seen the first one, the Japanese living room; the second is this cyberpunk couple embracing. Again, we're going to choose a prompt that tries to change the image as much as possible: a portrait photo of a thirty-something couple (so an older couple), summer clothes, colour photo, 50mm prime, Canon, studio portrait. We're trying to move from a painting to a photo, and we'll use negative prompts to try to prevent the sort of arty, painting-type outputs that we don't want. Again, we're using a fixed seed for all the generations, and we're using the Deliberate model.

The first ControlNet model I'm going to take you through is the canny model and the canny preprocessor. This is probably the model I go to the most, which is why I'm starting with it: it's one of my favourites, it gives some of the best results, and essentially it is edge detection. So here is our Japanese living room; we put that through the preprocessor, and the output looks like this. If I flick back and forwards between the two, you can see all of the edges from the Japanese living room showing. What ControlNet does is this: when you click your generate button and start creating your image through the standard diffusion model, using the standard samplers with your prompt, it guides the generation process based on this pre-processed image. The output is guided by it and shares the same constraints. So if we look at the resulting image and flick back and forth, we've got those same edges. We've maintained the essence of the original image, but rather than a Japanese living room we've got this warm cyberpunk cafe morning scene: there's some nice light coming in, we've kept the plant in the corner, and we've got the window on the side and the coffee table in the middle. I'll flick back between these a couple of times just so you can see how the picture changes from the input, through the preprocessor, to the output.
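As an aside, if you want to see roughly what that canny preprocessing step is doing, here is a minimal sketch using OpenCV. The threshold values and file names are assumptions for illustration; the ControlNet extension exposes its own sliders for them.

```python
# Rough sketch of the canny preprocessing step, assuming OpenCV.
# Thresholds and file names are illustrative, not taken from the video.
import cv2
import numpy as np
from PIL import Image

def canny_control_image(path, low_threshold=100, high_threshold=200):
    """Turn an input picture into the black-and-white edge map that guides generation."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)             # load the source image as grayscale
    edges = cv2.Canny(gray, low_threshold, high_threshold)    # detect edges
    edges = np.stack([edges] * 3, axis=-1)                    # expand to 3 channels for the control image
    return Image.fromarray(edges)

canny_control_image("japanese_living_room.png").save("canny_edges.png")
```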
Now let's take a look at our second image, the cyberpunk couple, which we're trying to change into a photo. Here we've got the cyberpunk couple on the left and the edge detection in the middle, where you can see it's captured the profiles of both characters, and on the right we've got our new generation. It's done what we asked for: we've got the photo, they're thirty-something, they're older and they're in summer clothes. The essence of the original image still comes through in the new output, but the context has completely changed.

We're going to move on now to HED, which is another form of edge detection. It's a kind of fuzzy edge detection, and it's really useful for keeping more of the detail of an image. Let's start again with our Japanese living room; this is what the pre-processed image looks like. If you remember, with canny edge detection the edges were really sharp; here we maintain more detail by having fuzzier edges. When we go to the output we get something a bit more like this, so let's flick between these a couple of times. I think the effect is much clearer on the cyberpunk couple, so let's have a look at them: here they are on the left, the fuzzy edge detection is in the middle, and on the right we've got this really detailed photo where a lot of the elements we lost with the canny edge detection are still there in the HED version. You can see there are some parts of the detail it's struggling with: there's some strange hair going on here, where it's trying to compensate for this colour. It hasn't given the guy colour, but it still needs this line coming down here; the strange headpiece has become flowers, and the tattoos have become hair. So you can see how it's creatively taken those constraints from HED and turned them into an image. Something worth saying about HED is that it often produces really good paintings; if you want photos, you might have to work a bit harder to get the prompt to do what you need. All of those details that come out in the edge detection in the pre-processed version lend themselves really well to paintings and art.

I will typically try the canny edge detection model first, and if I'm not getting the results I need, if the edges it's picking out aren't great or I'm losing too much detail, I'll switch over to HED and give that one a try.
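If you'd rather script this workflow than drive it through the web UI, here is a minimal sketch using the diffusers library. The checkpoint names, prompt and settings are assumptions for illustration; the video itself uses the Deliberate checkpoint inside AUTOMATIC1111.

```python
# Rough sketch of the edge-map-to-image workflow with diffusers.
# Model IDs, prompt and seed are illustrative assumptions, not the exact ones from the video.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # stand-in for any Stable Diffusion 1.5 checkpoint such as Deliberate
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="cyberpunk cafe, warm tones, dawn, morning scene, windows",
    negative_prompt="painting, illustration, drawing",
    image=load_image("canny_edges.png"),                    # the pre-processed control image
    num_inference_steps=30,
    generator=torch.Generator("cuda").manual_seed(1234),    # fixed seed for like-for-like comparisons
).images[0]
image.save("cyberpunk_cafe.png")
```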
Next up we've got M-LSD, which is another form of edge detection, but one that only picks out straight lines. Here's our Japanese living room, and if we look at the pre-processed image this is what we've got: the straight lines from the beams, the pictures, the coffee table and the window, but the plant has completely disappeared, because the plant doesn't have any straight lines in it. So our output is going to be something without a plant on the right-hand side; it'll probably put something else there, and we need to see what it does. Here's the output, flicking between these a couple of times: there's the original, here's the pre-processed image, and here's the cyberpunk output. We've got those two posters, the beams, some downlights and the window on the side, and it looks like it's filled that gap by putting in some sort of vending machine and a chair. If we take a look at the cyberpunk couple, the results are a little bit odd. Essentially the input image doesn't have any straight lines, except for the collar and maybe a little on the arm, so this is not a good model for this image: there aren't enough straight lines, we're really not guiding the output in any way, it's free-form, and this would be a bad choice.

Let's move on to the next model and preprocessor, which is depth. You've probably already seen that when Stability AI released Stable Diffusion 2 it came with a depth-to-image model; this is very similar to that. Essentially we create a depth map from the input image and use it to guide the diffusion process. Like before, here's our Japanese living room and here's the depth map. Currently there are two types of depth preprocessor you can choose from, depth and depth_leres, and this is the depth_leres version, which gives a bit more detail in the depth map. Looking at the input image, we've got the shape of the plant and the shape of the coffee table; the frames behind come through much less strongly than in the other models, so I'm not sure what effect that will have on the output; and the coffee table and the cushions are all showing. Let's look at the output. Well, it's still kept those frames and made them neon; we've got our plant, which has changed completely from a green plant to a red one; we've got the cushions and the table, and it's turned those panels behind into a window. So it's really kept the essence, and we've got the same structure in the image, except now it's more of a cyberpunk cafe. Let's have a look at the cyberpunk couple. Here we go: the couple on the left, the depth map in the middle, and on the right the new generation. Again, with that high collar on the couple it's really struggling to work out what to do: it's given him a shirt without a high collar, and it's put a piece of grass or some sort of light there to compensate. It kind of works, but this generation also has some strange hands and artifacts going on. If I were redoing this I'd probably change the prompt to put "no hands" in the negative prompt and try to perfect it, but I've kept the prompt the same for all generations so we can make an easier direct comparison.

So canny edge detection, HED edge detection and depth maps are my three go-to models. I tend to try each of them, see what sort of output I'm getting, and see if I can get the image I'm after.
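For completeness, here is a minimal sketch of how you might build a depth control image yourself with a generic monocular depth estimator. The ControlNet extension ships its own depth and depth_leres preprocessors, so treat the model choice and normalisation here as assumptions.

```python
# Rough sketch of building a depth-map control image with a monocular depth estimator.
# The DPT model choice and normalisation are assumptions; the extension's own
# "depth" / "depth_leres" preprocessors will differ in detail.
import numpy as np
from PIL import Image
from transformers import pipeline

depth_estimator = pipeline("depth-estimation", model="Intel/dpt-large")

def depth_control_image(path):
    """Estimate per-pixel depth and normalise it to an 8-bit grayscale control image."""
    depth = np.array(depth_estimator(Image.open(path))["depth"], dtype=np.float32)
    depth = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)   # scale to 0..1
    return Image.fromarray((depth * 255.0).astype(np.uint8))

depth_control_image("japanese_living_room.png").save("depth_map.png")
```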
The next preprocessor and model I'm going to cover is OpenPose, which essentially lets you take a human pose, or a hand pose, from an image and reuse it in another image. You could have someone standing with their arms crossed, take that pose, and create a new image of someone completely different, but standing in the same position with their arms crossed. Obviously our first example, the Japanese living room, doesn't have any people in it, so we're only going to look at this one from the perspective of the cyberpunk couple. So here we are: the couple on the left, the OpenPose annotated result in the middle, and our generation on the right. Those coloured bars in the middle are the representation of the pose of the cyberpunk couple, and if I overlay them on top of the image you can see how it's arriving at the generation: the green lines represent arms, the red ones shoulders, and the purple lines at the top faces. It takes one pose and reuses it to create a new image. If you look at the image on the right, all of the styling and all of the detail is completely gone, but what we've got is two people standing in roughly the same pose. This probably isn't the best example, but you can see their arms are crossed and her head is on his chest; they're not forehead to forehead like in the original image, but the gist is there.

The next model is scribble, which essentially allows you to take a little drawing and turn it into a real thing. You can also start with a real image and pre-process it, and there are two ways to do that. One is to assume the image you're putting in is already a scribble, a doodle, and just convert it into binary channels, which I'll show you. The alternative is fake scribbles: it takes your input image, tries to reimagine it as if it had been scribbled, and then uses that to guide the diffusion process. So the first one: here's our Japanese living room, and this is what happens if we go with the straight-up scribble preprocessor. You can see it's just converted it into black and white, and if we use this to generate an image we get something looking a little like this. It's not necessarily a bad input: we've got this diagonal line going across the top right, which is only really there in the original image because of a shadow, but it's creating some new elements in our picture. We've still kind of got a central coffee table, some panels to the left, and the shape of one of the frames, so some elements are preserved whilst others disappear. Fake scribble, as I said, takes your input image, which is our Japanese living room, and imagines that you had just sketched it out. This is what the preprocessor creates: a low-fidelity sketch of your input. We've got some detail there from the beams, this detail which is in the frame, not really much of the plant, a cushion and a bit of the table, and as you can imagine, this is probably going to create an output that is quite different from the original. Let's have a look. Here's the new output of our cyberpunk cafe: the three beams in the middle have stayed, we've lost the frames but kept this sort of squiggle, which has become what looks like a floating drone. Interestingly, the plant has also stayed; it's changed form and it's a bit bigger now, but it's still there, and we've still got the panels on the left as well. Here's the cyberpunk couple. I didn't show you the real scribble version because, when converted to straight black and white, it was pretty much all black except the top-left corner. Here's the fake scribble: you can see we've got a rough outline of the two characters, and again we've kept that collar detail, this straight line here, which is the thing the image generator always struggles with. On the right we've got our new thirty-something couple, with some plants obscuring this guy, which I think is quite a clever way of building that detail in; you can see the ear detail has come through, and the sort of headpiece detail is coming through as well.
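If you're curious what the plain scribble preprocessing amounts to, here is a minimal sketch that simply thresholds the input to black and white. The threshold value is an assumption, and the fake-scribble mode is a different, HED-based step that this doesn't reproduce.

```python
# Rough sketch of the plain "scribble" preprocessing: collapse the input to black and white.
# The threshold is an assumption; fake scribble is a separate, HED-based preprocessor.
import cv2
import numpy as np
from PIL import Image

def scribble_control_image(path, threshold=127):
    """Convert an image (or a hand-drawn doodle) into a binary scribble map."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, binary = cv2.threshold(gray, threshold, 255, cv2.THRESH_BINARY)
    return Image.fromarray(np.stack([binary] * 3, axis=-1))   # back to 3 channels

scribble_control_image("japanese_living_room.png").save("scribble.png")
```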
So with scribble and fake scribble you're somewhat at the whim of how the preprocessor interprets your input, and that decides which details get kept. It's a good one to use if you want to maintain some sort of loose structure, but also if you just want to be sketching, doing real sketches and turning them into images; you can get something that roughly matches your sketch.

The next model is segmentation. When you put your image through the segmentation preprocessor it breaks the image down into segments, or areas. Here is our Japanese living room and here is how the segmentation preprocessor has broken it down: the red areas are the frames, the light green highlight colour is the plant, the cushions are in orange, the floor area is brown, and the panels on the left are white. When generating, it uses these to create the different segments in the output. And here's the output. Something interesting here is that we've completely lost the three vertical beams in the middle, which had been present in most of the outputs until now. Something else that's changed is the feel of the image: it's a bit more wide-angle and it's more distorted, so there's a slightly different feel going on. We've maintained the plant in the corner, which is actually quite similar, and the frames are still there. So the details that come through are different, and the overall feel of the image is quite different. Here we've got the cyberpunk couple: not nearly as many segments, so not as interesting, and probably not the best model to choose for this type of image. Some interesting points to highlight, though: there's this gap between the couple that comes through in the segmentation, and if we look at the output image we've got the same gap appearing.

The last model I want to show you is the normal map model, which is a texture-mapping technique. I haven't had much success with this model; I suspect it's best for images where you want to preserve textures, but let's have a look at how it works with our inputs. Here's the Japanese living room. When we put it through the preprocessor we get a normal map that looks a bit like this, and as you can see we've lost a lot of detail: all of that purple space has no mapping at all, and we've only really captured the immediate foreground, which is the cushions and a bit of the table. So when we produce an output we get something very, very different, and we're really only getting the detail from the very front of the input image. That said, I really like this output: it's very grainy, and it feels like a still from a Blade Runner-style cyberpunk movie. Let's move on quickly to the cyberpunk couple, because the normal map here is a lot more detailed. You can see we've got a really good map showing the couple embracing, and when we use it to generate an image we're actually getting some really good results. The profile of both of them comes through, the pose is the same, and there's a lot of good detail being maintained. Again we've got the issue with the collar, there are some weird gaps here, and, strangely, is this a t-shirt or is it a long-sleeved shirt? That said, for this image the normal map seems to have done a good job. So with the normal map, if you haven't been getting the results you wanted from the other models, it's occasionally worth giving it a try: see what it does, see what the preprocessor output looks like, and if it's giving you a lot of detail then keep using it, changing your prompt and seeing what you can get.
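To make that kind of trial and error less tedious, here is a sketch of scripting the comparison: run the same prompt and seed through several ControlNet variants, one per preprocessor output. The model IDs, file names and settings are assumptions for illustration.

```python
# Rough sketch of comparing several ControlNet variants with the same prompt and seed.
# Model IDs, file names and settings are illustrative assumptions.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

variants = {
    "canny":  ("lllyasviel/sd-controlnet-canny",  "canny_edges.png"),
    "hed":    ("lllyasviel/sd-controlnet-hed",    "hed_edges.png"),
    "depth":  ("lllyasviel/sd-controlnet-depth",  "depth_map.png"),
    "normal": ("lllyasviel/sd-controlnet-normal", "normal_map.png"),
}

for name, (controlnet_id, control_path) in variants.items():
    controlnet = ControlNetModel.from_pretrained(controlnet_id, torch_dtype=torch.float16)
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
    ).to("cuda")
    image = pipe(
        "cyberpunk cafe, warm tones, dawn, morning scene, windows",
        image=load_image(control_path),
        generator=torch.Generator("cuda").manual_seed(1234),   # same seed for a like-for-like comparison
    ).images[0]
    image.save(f"compare_{name}.png")
```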
So those are all of the current models and the preprocessors that come with them. It's worth saying that anyone can train their own ControlNet model, so in the future we're probably going to see a lot more of these, and new ways to control the diffusion process to create images with a higher degree of control. It's going to be really exciting to see where ControlNet goes over the next couple of months. I hope you've enjoyed this video; we've been through all of the models, and hopefully now you've got a good understanding of what each of them does and when to use them. If you've enjoyed it, please like and subscribe, and there'll be more videos coming soon.
Info
Channel: fofr ai art and experiments
Views: 33,308
Id: GVCZHCLWON8
Length: 22min 48sec (1368 seconds)
Published: Wed Mar 01 2023