Stable Diffusion for Professional Creatives - Lesson 2: Photoshop with SD generating in real time

Video Statistics and Information

Captions
Welcome to another Stable Diffusion tutorial. Today we'll be looking at ways to integrate Stable Diffusion into Photoshop, using ComfyUI. If you don't know how to use ComfyUI, or haven't seen the first video about how to install it and run your first generations, please watch that before attempting this one. In this tutorial we'll first build the node workflow, and then we'll look at some of the use cases such a workflow can have in a production environment. As always, you'll find the workflow files in the description, as well as any model you might need. Remember that if you're missing something from the workflow, you can install missing nodes or missing models from the Manager; and if you don't have the Manager installed, you can go back and watch the first video, where I explain how to do so. That's about it for the introduction, so let's go build some nodes.

First of all, we're going to set up the node responsible for communicating with Photoshop. Double click and search for "Photoshop to ComfyUI". This node has a password field, which we'll enter into Photoshop later; for now it's 12341234. It also has an image output, which is the currently active image in Photoshop. Since we might be working with larger images, we want to resize it to a resolution we can work with in Stable Diffusion, so drag out the connection and search for an "Image Resize" node. We'll input 1024 by 1024, which is big enough to keep plenty of detail but small enough to stay fast for real-time generations. If you don't have some of these nodes, don't worry: you can install them via the ComfyUI Manager by going to Manager > Install Missing Nodes.

The next thing we need is obviously a model. Search for "Load Checkpoint" and select your model of choice; in my case it's going to be epicRealism. Then search for the "Load VAE" node to load a VAE, if necessary.

Since we want to generate in real time while we're working in Photoshop, we'll use LCM, which offers faster generation times than regular sampling. Search for the "Load LoRA" node and select your LCM LoRA. You'll find links to all these models in the description below.

Next up is the "KSampler" node. Search for it and set it up, then add the "VAE Decode" node and the "Preview Image" node, which, although not strictly necessary in this workflow, offers some debugging insight if needed.

These are the main input and output nodes, but what goes on in the middle of the workflow? Let's start by connecting the "Load Checkpoint" MODEL output to the "Load LoRA" model input, and the "Load LoRA" MODEL output to the "KSampler" model input. This makes sure the model is loaded, passes through the LCM LoRA, and is then fed into the KSampler. Next, set up the KSampler settings. Since we're using LCM, we want a lower than normal number of steps and a lower CFG, so let's use 8 steps and a CFG of 1.4. Set the denoise to a starting value of 0.4, and the sampler_name to "lcm".

Then we connect the CLIP output from the "Load Checkpoint" node to the CLIP input on the "Load LoRA" node, and create two "CLIP Text Encode" nodes from the "Load LoRA" node's CLIP output. Connect the conditioning outputs from the "CLIP Text Encode" nodes to the positive and negative inputs on the KSampler. These will be our positive prompt text box and our negative prompt text box.
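If it helps to see this part of the graph as data rather than as nodes on the canvas, here is a minimal sketch of the core chain in ComfyUI's API (JSON) format, written as a Python dict. The node IDs are arbitrary and the checkpoint and LoRA file names are placeholders, so treat it as an illustration of the wiring rather than a drop-in file; the Photoshop and resize nodes are third-party, so they're left out here.

```python
# Sketch of the core chain in ComfyUI's API format. Node IDs are arbitrary and
# the file names are placeholders - use whatever checkpoint and LCM LoRA you downloaded.
core_chain = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "epicrealism.safetensors"}},
    "2": {"class_type": "LoraLoader",
          "inputs": {"model": ["1", 0], "clip": ["1", 1],
                     "lora_name": "lcm_sd15_lora.safetensors",
                     "strength_model": 1.0, "strength_clip": 1.0}},
    "3": {"class_type": "CLIPTextEncode",  # positive prompt
          "inputs": {"clip": ["2", 1], "text": "photo of a young woman on a beach"}},
    "4": {"class_type": "CLIPTextEncode",  # negative prompt
          "inputs": {"clip": ["2", 1], "text": "illustration, 3d, render"}},
    "5": {"class_type": "KSampler",
          "inputs": {"model": ["2", 0], "positive": ["3", 0], "negative": ["4", 0],
                     "latent_image": ["7", 0],  # the VAE Encode node we add further down
                     "seed": 0, "steps": 8, "cfg": 1.4,
                     "sampler_name": "lcm", "scheduler": "normal", "denoise": 0.4}},
}
```

A value like ["1", 0] simply means "output 0 of node 1", which is exactly what the noodles on the canvas represent.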
Then we connect the VAE output from the "Load VAE" node to the vae input on the "VAE Decode" node. On the same node, connect the LATENT output from the KSampler to the samples input, then connect the IMAGE output to the images input on the "Preview Image" node.

We're still missing a latent image. Since we want to work with pre-existing images sourced from Photoshop, this will be an image-to-image workflow. Search for the "VAE Encode" node, hook up the image output from the "Image Resize" node to its image input, and connect the same VAE we're using to decode to the vae input on the "VAE Encode" node. Then hook up its latent output to the latent_image input on the KSampler, and we're done. Or, well, almost done, since we need a way to display the resulting images somewhere other than the ComfyUI webpage. Search for the "PreviewPopUp" node and hook it up to the IMAGE output of the "VAE Decode" node. This node creates a floating window that we can place beside our Photoshop workspace, displaying our generated images in real time.

Now, for the prompts. For this first experiment we'll do stick figures, so we'll start with a generic prompt like "photo of a young woman standing on a beach in front of the ocean", plus some negative prompt words for things we don't want to see, like "illustration, 3d, render" and some not-safe-for-work terms.

Now that the node chain is complete, head over to the right-hand side menu, click on "Extra Options", and enable "Auto Queue". This keeps the node chain running once started. We could also set it to queue only when it detects a change, or configure the "Photoshop to ComfyUI" node the same way, but I'll keep both set to false, since I want it to generate images continuously.

Everything is ready, so let's head over to Photoshop. Click on "Edit" and navigate to "Remote Connections". Click on it and locate the "Enable Remote Connections" checkbox. Tick it, and input the same password we set in the "Photoshop to ComfyUI" node in ComfyUI, like 12341234. Set up your password of choice on the node, copy it, and paste it into the corresponding box in Photoshop. Once that's done, click OK and the connection should be established. To check it, we can just hit the "Queue Prompt" button inside ComfyUI, and the generation should start by sourcing from the blank image in Photoshop. Once the generation is done, a popup will, well, pop up, and that's our "PreviewPopUp" node working. We can place it where we want on our screen, then click the pin button and it will stay there even if we click somewhere else or switch programs and windows.

Now to actually draw something. We'll start by drawing a crude human figure, and as you can see the popup refreshes as soon as each generation is done, which for now is quite fast. Even if we keep adding details, though, the generated image stays very close to the one we're drawing, and that's because our denoise value is quite low. So let's try raising it to something like 0.8. As you can see, we're now generating images that aren't crude drawings anymore! But since the generated pictures are quite random, we might want to tune the KSampler settings a bit more, so let's tinker with the denoise value until we find the right setting for what we want. In this case the outline is there, but the picture is in black and white, so we can try filling in the white spaces with color: blue for the sea, brown for the sand, light blue for the sky. We can also tinker a bit more with the denoise value, like in this case, where we went down to 0.65 and then 0.7. As you can see, the generated pictures are now much closer to the original, and if we add more details, the generated pictures will be more detailed too: the face will be in the correct place, the clothes will be similar to what we drew, et cetera.
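As a side note, the "Queue Prompt" button and Auto Queue can also be driven from outside the browser: ComfyUI exposes an HTTP endpoint that accepts the same API-format graph we sketched earlier. Here's a minimal sketch, assuming ComfyUI is running on its default address (127.0.0.1:8188) and that `workflow` is a complete API-format graph (for example, one exported from the menu with "Save (API Format)"):

```python
import json
import urllib.request

def queue_prompt(workflow: dict, server: str = "127.0.0.1:8188") -> dict:
    """Queue one generation by POSTing an API-format graph to a local ComfyUI instance."""
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    request = urllib.request.Request(
        f"http://{server}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        # The response includes the prompt_id of the queued job.
        return json.loads(response.read())

# Example usage (with a hypothetical `workflow` dict):
# print(queue_prompt(workflow)["prompt_id"])
```

Inside the browser, Auto Queue does all of this for us, so this is only useful if you want to script the workflow from another tool.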
But this is just the beginning. Let's start with a picture of a model; in this case, we want to dress her up. First, though, as you can see, the generated picture is struggling with the concept of depth. To fix that, we can set up a ControlNet. Search for "Apply ControlNet (Advanced)", then hook up the conditioning nodes so that they pass through the "Apply ControlNet" node. Since generations are running right now, we'll finish linking them later. Then search for the "Load ControlNet Model" node, where we'll pick the depth model. You'll find it in the description below, or you can search for it in the ComfyUI Manager's "Install Models" tab. Hook up its CONTROL_NET output to the control_net input, and now we're only missing an image. We need a way to extract a depth map from an image, so search for the "MiDaS Depth Map" node. We don't want it to process a 1024 image, though, so duplicate the "Image Resize" node, set it to 512 by 512, hook up the resized image to the depth map node, and feed the resulting depth map into the "Apply ControlNet" node. We then close the circuit by hooking up the positive and negative outputs and inputs. And we get an error, because I forgot to connect the new "Image Resize" node to the "Photoshop to ComfyUI" node, so there's no source to resize. Let's fix that and hit "Queue Prompt" again. As you can see, the generation is a bit slower now, but the depth of the generated picture is much better.
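For reference, here's the depth branch sketched in the same API format as before. The ControlNet file name is a placeholder, and the MiDaS preprocessor is a third-party node (from the comfyui_controlnet_aux pack), so its exact class and input names are assumptions here and may differ in your install:

```python
# Sketch of the ControlNet depth branch (node IDs continue the earlier fragment;
# "6" stands for the extra 512x512 resize of the Photoshop image).
controlnet_branch = {
    "10": {"class_type": "ControlNetLoader",
           "inputs": {"control_net_name": "control_v11f1p_sd15_depth.pth"}},  # placeholder file
    "11": {"class_type": "MiDaS-DepthMapPreprocessor",  # assumed class/input names (aux pack)
           "inputs": {"image": ["6", 0], "a": 6.28, "bg_threshold": 0.1, "resolution": 512}},
    "12": {"class_type": "ControlNetApplyAdvanced",
           "inputs": {"positive": ["3", 0], "negative": ["4", 0],
                      "control_net": ["10", 0], "image": ["11", 0],
                      "strength": 1.0, "start_percent": 0.0, "end_percent": 1.0}},
    # The KSampler's positive and negative inputs now come from node "12"
    # (outputs 0 and 1) instead of directly from the CLIP Text Encode nodes.
}
```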
Going back to Photoshop, I'll pull up some clothes. The popup refreshes with an image of the model with a sweater on top, and it tries its best to make sense of it: the light is reasonably coherent, and it even warped the sweater to follow one of the arms. Since we'd like a better approximation, though, we can try masking the sweater and warping it so that it fits the body a bit more nicely. Keep in mind that while we're warping, the preview won't update, since the underlying image is still "unchanged"; but as soon as we confirm the warp, the resulting image is a lot better than what we have in Photoshop. We can do the same for a skirt now, and as you can see it's already quite good, even adding some sort of pockets on the side. If we mask and warp the skirt, we get an even better generation, and if we don't want a pocket, we can just mask in the corresponding hidden hand.

So the generated image is adding coherence to the rough editing we've done, all in real time. Quite useful in the pre-visualization stages of a production, if you ask me.

The clothes, though, are still a bit different from the reference. This is because we're using LCM. We can try switching it off and using a normal sampler, sacrificing some speed in the process, and see if that works too. To do that, right click on the "Load LoRA" node and select "Bypass". The chain will still work; it just won't be affected by the bypassed node at all. Then we can select a more conventional sampler, like "dpmpp_2m", and a scheduler like "karras". If we resume the generations now, though, the resulting image is quite bad. That's because while LCM can work well with a low number of steps, normal samplers just can't. So let's go up to 14 steps, set the CFG to 2, and generate again. A little better, but the denoise value is still too low to see any big difference. Let's raise the steps to 18, the CFG to 4, and the denoise to 0.4, and try again. The results are now much better, although the speed is not what it was before. It's a tradeoff we need to be aware of. From here we can keep tinkering with the values until we find the parameters that work best for the situation at hand.
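To make that tradeoff easier to reuse, here are the two KSampler presets from this section side by side, as plain Python dicts. The values mirror what we settled on above; treat them as starting points rather than fixed recipes:

```python
# KSampler preset with the LCM LoRA active: fast, good for continuous real-time feedback.
lcm_preset = {
    "steps": 8,
    "cfg": 1.4,
    "sampler_name": "lcm",
    "scheduler": "normal",
    "denoise": 0.4,   # raise towards 0.8 to depart further from the Photoshop image
}

# KSampler preset with the LCM LoRA bypassed: slower, but follows the reference more closely.
standard_preset = {
    "steps": 18,
    "cfg": 4.0,
    "sampler_name": "dpmpp_2m",
    "scheduler": "karras",
    "denoise": 0.4,
}
```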
But if we're doing previsualization, why stop at clothing? We can also drop in a background, like a forest or an interior. The generated image will be a lot more coherent than the mashup we set up in Photoshop, and with minimal effort. These are just some examples of what we can do in terms of previsualization, but it's a technique that can genuinely help in production environments. The only thing that's not really working is the shadow, and that can be taken care of with some post work.

Let's check out one more possible application: we're going to generate some fashion illustrations, starting from this picture. Set everything back to the LCM settings, and change the prompt to reflect what we now want. We'll write "fashion illustration of a young woman" as the positive prompt, remove "illustration" from the negative prompt, and add "paper" to it so the result doesn't look like it's drawn on paper. We'll then raise the denoise value to 0.8, since we want a very different style from the original image, and since we no longer care about depth but do want the edges to be consistent, we'll switch the ControlNet model from depth to canny. The depth map node isn't needed anymore, so delete it and, in its place, search for the "Canny Edge" node, which will preprocess the Photoshop image's outlines. Hook it up, and add "watercolor" to the positive prompt as a style, since we'd like to get some watercolor images.

Now let's hit "Queue Prompt", and going back to Photoshop we can see how we get fashion illustrations effortlessly. The colors and the overall style are different, since we have a very high denoise value, but this lets us influence the overall design by painting over the original image if we want to, and get different results very quickly. So on the one hand it's not precise, but on the other it lets us brainstorm new ideas faster; it really depends on what stage of preproduction you're at. If we want more precise results, we can always tinker with the denoise and the other parameters, but that will result in more or less realistic images. It's up to you to find the combination of parameters that works for you, and to apply the right workflow to your - or your clients' - needs.

That's it for today. As always - you know the drill - please like and subscribe, as it helps a lot since this is a new channel. I'm Andrea Baioni, and you can find me on Instagram at risunobushi or at andrebaioni.com. Also, please comment if you have any questions or suggestions for this or the next videos. See you next time with more workflows that can be applied in production environments for creatives.
Info
Channel: Andrea Baioni
Views: 5,661
Keywords: generative ai, stable diffusion, comfyui, civitai, python, text2image, txt2img, img2img, image2image, image generation, artificial intelligence, ai, generative artificial intelligence, sd, sd1.5, tutorial, Stable Diffusion for Professional Creatives - Lesson 1: Install ComfyUI and run your 1st generation, risunobushi, risunobushi_ai, risunobushi ai, stable diffusion for professional creatives, professional creatives, install comfyui, installing comfyui, andrea baioni
Id: h58B8ibjftw
Length: 14min 1sec (841 seconds)
Published: Sat Mar 09 2024