Welcome to another Stable Diffusion tutorial.
Today we'll be looking at ways to integrate Stable Diffusion into Photoshop. For this tutorial we'll
be using ComfyUI and Stable Diffusion. If you don't know how to use ComfyUI, or haven't seen the
first video about how to install it and run your first generations, please watch that before
attempting this one. In this tutorial we'll first build the node workflow, and then we'll look at
some of the use cases such a workflow can have in a production environment. As always, you'll
find the workflow files in the description, as well as any models you might need.
And remember that if you're missing something from the workflow you can just install missing
nodes or missing models from the manager. And, if you don't have the manager installed, you
can go back and watch the first video where I explain how to do so. And that's about it for
the introduction, so let's go build some nodes. First of all, we're going to set up the node
responsible for communicating with Photoshop. Double-click and search for “Photoshop to
ComfyUI”. This node has a password field, which we'll enter into Photoshop later; for now,
set it to 12341234. It also has an image output, which is the currently active image in Photoshop,
and since we might be working with larger images, we want to resize it to a resolution we can
work with in Stable Diffusion. So drag out a connection and search for a “Resize” node. We will
input 1024 by 1024, since that's big enough to retain detail and small enough to keep real-time
generations fast. If you don't have some of these nodes, don't worry: you can install them via
the ComfyUI Manager by going to Manager > Install Missing Nodes.
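By the way, if it helps to see the graph as text, here's a minimal sketch of what these first two nodes look like in ComfyUI's API-format JSON export, written out as a Python dict. Treat it as illustration only: the class names of the Photoshop bridge node and of the resize node are placeholders, since both come from custom node packs and may be named differently in your install.

```python
# Hypothetical sketch of the front end of the workflow, in ComfyUI API-export style.
# Node keys ("1", "2") and both custom-node class names are placeholders.
frontend = {
    "1": {  # "Photoshop to ComfyUI": streams the active Photoshop canvas
        "class_type": "PhotoshopToComfyUI",   # placeholder name, check your install
        "inputs": {"password": "12341234"},   # must match Photoshop's remote connection password
    },
    "2": {  # resize the (possibly huge) Photoshop canvas to something SD-friendly
        "class_type": "ImageResize",          # placeholder name for your resize node of choice
        "inputs": {
            "image": ["1", 0],                # image output of the Photoshop node
            "width": 1024,
            "height": 1024,
        },
    },
}
```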
The next thing we need is obviously a model. Search for “Load Checkpoint” and select your
model of choice, in my case it’s going to be epicRealism. Then search for the “Load
VAE” node to load up a VAE if necessary. Since we want to be able to generate in
real time while we're working in Photoshop, we'll use an LCM LoRA, which allows far fewer
sampling steps and therefore much faster generations than a regular setup. Search for the
“Load LORA” node, and select your LCM LoRA.
By the way, you will be able to find the links to all these models in the description below.
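For reference, here's a hedged sketch of how these three loader nodes would appear in the same API-format export. The class names are the standard built-in ones, but every filename is just an example, so swap in whatever checkpoint, VAE, and LCM LoRA you actually downloaded. The CLIP link into the LoRA loader is the connection we'll wire up in a moment.

```python
# Core loader nodes (built-in ComfyUI class names); filenames are examples only.
loaders = {
    "3": {
        "class_type": "CheckpointLoaderSimple",
        "inputs": {"ckpt_name": "epicrealism.safetensors"},   # your SD 1.5 model of choice
    },
    "4": {
        "class_type": "VAELoader",
        "inputs": {"vae_name": "vae-ft-mse-840000-ema.safetensors"},  # optional external VAE
    },
    "5": {  # the "Load LORA" node
        "class_type": "LoraLoader",
        "inputs": {
            "lora_name": "lcm-lora-sd15.safetensors",   # the LCM LoRA
            "strength_model": 1.0,
            "strength_clip": 1.0,
            "model": ["3", 0],                          # MODEL from Load Checkpoint
            "clip":  ["3", 1],                          # CLIP from Load Checkpoint
        },
    },
}
```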
Next up is the “KSampler” node. Search for it and set it up, then search for the “VAE
Decode” node and the “Preview Image” node which, although not strictly necessary in this workflow,
offers some debugging insights if needed. These are the main input and output nodes, but
what goes on in the middle of the workflow? Well, let's start by connecting the “Load Checkpoint”
model output to the “Load LORA” model input, and the “Load LORA” model output to the
“KSampler” model input. This makes sure the model is loaded, passes through the LCM
LoRA, and is then fed into the KSampler. Next, set up the KSampler settings. Since we're
using an LCM LoRA, we want a lower-than-normal number of steps and CFG, so let's set 8
steps and a CFG of 1.4. Set the denoise to a starting value of 0.4, and the sampler_name to “lcm”.
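In the same API-export terms, the KSampler ends up looking roughly like this. The node references follow the IDs used across these sketches: “5” is the Load LORA node above, while “7”, “8”, and “11” are the prompt and VAE Encode nodes we're about to add. The scheduler value is my own assumption, since the video doesn't mention it explicitly.

```python
# KSampler with LCM-friendly settings (low steps, low CFG).
ksampler = {
    "6": {
        "class_type": "KSampler",
        "inputs": {
            "model": ["5", 0],           # MODEL coming out of the Load LORA node
            "positive": ["7", 0],        # positive CLIP Text Encode (wired up below)
            "negative": ["8", 0],        # negative CLIP Text Encode (wired up below)
            "latent_image": ["11", 0],   # latent from the VAE Encode node (added later)
            "seed": 0,
            "steps": 8,                  # lower than usual, thanks to LCM
            "cfg": 1.4,                  # likewise low for LCM
            "sampler_name": "lcm",
            "scheduler": "sgm_uniform",  # not specified in the video; pick what works for you
            "denoise": 0.4,              # starting value, we'll raise it later
        },
    },
}
```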
Then, we connect the CLIP output from the “Load Checkpoint” node to the CLIP input in the “Load
LORA” node, and create our two “CLIP Text Encode” nodes from the “Load LORA” node's CLIP output.
Connect the conditioning outputs from the “CLIP Text Encode” nodes to the positive and negative
inputs on the KSampler. These will be our positive and negative prompt text boxes.
Then we connect the VAE output from the “Load VAE” node to the VAE input in the
“VAE Decode” node. On the same node, connect the Latent output from the KSampler to
the samples input. Then connect the image output to the images input in the “Preview Image” node.
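Sketched the same way, the conditioning and decoding part of the graph looks roughly like this. The prompt text is left as a placeholder for now, since we'll write the actual prompts in a moment.

```python
# Conditioning and decoding, in the same API-export style as the other sketches.
conditioning_and_decode = {
    "7": {  # positive prompt box
        "class_type": "CLIPTextEncode",
        "inputs": {"text": "positive prompt goes here", "clip": ["5", 1]},  # CLIP from Load LORA
    },
    "8": {  # negative prompt box
        "class_type": "CLIPTextEncode",
        "inputs": {"text": "negative prompt goes here", "clip": ["5", 1]},
    },
    "9": {  # VAE Decode: latents from the KSampler back into pixels
        "class_type": "VAEDecode",
        "inputs": {"samples": ["6", 0], "vae": ["4", 0]},   # VAE from the Load VAE node
    },
    "10": {  # Preview Image, handy for debugging
        "class_type": "PreviewImage",
        "inputs": {"images": ["9", 0]},
    },
}
```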
We're now missing a latent image. Since we want to work with pre-existing images sourced
from Photoshop, we'll be building an image-to-image workflow. Search for the “VAE Encode”
node, hook up the image output from the “Image Resize” node to its image input, and the same
VAE we're using to decode to its VAE input. Then hook up the latent output to the latent_image
input on the KSampler and we're done. Or, well, almost done, since we need a way to display
the resulting images somewhere other than the ComfyUI webpage. Search for the “PreviewPopUp”
node, and hook it up to the image output of the “VAE Decode” node.
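Here's the image-to-image part of the graph in the same sketch form. The “PreviewPopUp” entry is a placeholder: it's a custom node, so its class name and input name may differ in your install.

```python
# Image-to-image path: encode the resized Photoshop image into a latent,
# then pop the decoded result up in a floating window.
img2img_and_popup = {
    "11": {  # VAE Encode: turns the Photoshop image into the KSampler's starting latent
        "class_type": "VAEEncode",
        "inputs": {"pixels": ["2", 0], "vae": ["4", 0]},   # resized image + the same VAE
    },
    "12": {  # floating preview window beside Photoshop (custom node, names may differ)
        "class_type": "PreviewPopup",
        "inputs": {"image": ["9", 0]},                     # decoded image from VAE Decode
    },
}
```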
This node will create a floating window that we can place beside our Photoshop workspace
displaying our generated images in real time. Now, for the prompts. For this first
experiment, we’ll do stick figures, so we’ll start with a generic prompt like “photo
of a young woman standing on a beach in front of the ocean”, and some negative prompt terms
for things we don't want to see, like “illustration, 3d, render” and NSFW-related words.
Now that the node chain is complete, we want to head over to the right-hand side menu, click on
“Extra Options”, and enable “Auto Queue”. This will keep the node chain running once started.
We can also set it to queue only when it detects a change, and the “Photoshop to ComfyUI” node
can be set up the same way, but I'll keep both set to false, since I want it to generate images
continuously. Everything is ready, so let's head
over to Photoshop. Click on “Edit”, and navigate to “Remote Connections”.
Click on it, tick the “Enable Remote Connections” checkbox, and input the same password we set
in the “Photoshop to ComfyUI” node in ComfyUI, like 12341234. Set up your password of choice on
the node, copy it, and paste it into the corresponding box in Photoshop. Once that's done, click
OK, and the connection should be established. To check that it is, we can just hit the
“Queue Prompt” button inside ComfyUI
and the generation should start by sourcing from the blank image on Photoshop. Once the generation
is done, a popup will, well, pop up, and that’s our “PreviewPopUp” node working. We can place it
where we want it on our screen, then click on the pin button and it will stay there even if we click
somewhere else or switch programs and windows. Now to actually draw something. We’ll
start by drawing a crude human figure, and as you can see the popup is being refreshed
as soon as a generation process is done, which is quite fast for now. Even
if we keep adding details, though, the generated image stays very close to the
one we're drawing, and that's because our denoise value is quite low. So, let's try
to up it to something like 0.8. As you can see, now we are generating images that aren’t crude
drawings anymore! But since the generated pictures are quite random, we might want to tune
the KSampler settings a bit more, so let's tinker with the denoise value until we find the
right value for what we want. In this case, the outline is there, but the
picture is in black and white. We can then try to fill in the white spaces with color.
Like a blue for the sea, a brown for the sand, a light blue for the sky. We can also tinker a bit
more with the denoise value, like in this case, where we went down to 0.65, then 0.7. As you can
see, the generated pictures are now much closer to the original, and if we add more details, the
generated pictures will be more detailed too. The face will be in the correct place, the clothes
will be similar to what we drew, et cetera. But this is just the beginning. Let’s start
with a picture of a model. In this case, we want to dress it up. But first, as you can
see, the generated picture is struggling with the concept of depth. In order to fix that,
we can set up a ControlNet. Search for “Apply ControlNet (Advanced)”, then hook up the
conditioning nodes so that they will go through the “Apply ControlNet” node. Since
the generations are going on right now, we’ll finish linking them later. Then we need to
search for the “Load ControlNet Model” node, where we'll look for the depth model. You'll find it
in the description below, or you can search for it in the ComfyUI Manager's “Install Models” tab.
Hook up the CONTROL_NET output to the control_net input, and now we're only missing an image. First, we
need a way to extract a depth map from an image. We’ll search for the “MiDaS Depth Map” node.
But we don't want it to process a full 1024 image, so we'll duplicate the “Image Resize” node,
set it to 512 by 512, hook the resized image up to the “Depth Map” node, and feed the resulting
depth map into the “Apply ControlNet” node. We will then close the circuit by hooking up the
positive and negative outputs and inputs.
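Before resuming, here's a rough sketch of the depth ControlNet sub-graph we just wired in, again in API-export style. The depth preprocessor and the second resize node are custom nodes, so their class names are placeholders, and the ControlNet filename is only an example.

```python
# Depth ControlNet sub-graph: the conditioning now passes through Apply ControlNet (Advanced)
# before reaching the KSampler. Preprocessor/resize class names are placeholders.
controlnet_depth = {
    "13": {
        "class_type": "ControlNetLoader",
        "inputs": {"control_net_name": "control_v11f1p_sd15_depth.pth"},  # example filename
    },
    "14": {  # second resize, 512x512, fed straight from the Photoshop node
        "class_type": "ImageResize",                 # placeholder name
        "inputs": {"image": ["1", 0], "width": 512, "height": 512},
    },
    "15": {  # MiDaS depth map preprocessor (custom node, name may differ)
        "class_type": "MiDaSDepthMapPreprocessor",   # placeholder name
        "inputs": {"image": ["14", 0]},
    },
    "16": {
        "class_type": "ControlNetApplyAdvanced",
        "inputs": {
            "positive": ["7", 0], "negative": ["8", 0],
            "control_net": ["13", 0],
            "image": ["15", 0],                      # the depth map
            "strength": 1.0, "start_percent": 0.0, "end_percent": 1.0,
        },
    },
}
# The KSampler's positive/negative inputs are then re-pointed at ["16", 0] and ["16", 1].
```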
Aaand we get an error, because I forgot to hook up the new “Image Resize” node to the
“Photoshop to ComfyUI” node, so there's no source to resize. Let's fix that and hit “Queue Prompt”
again. As you can see, the generation is a bit slower now, but the depth of the generated picture
is much better. Going back to Photoshop, I will pull up some clothes. The popup will refresh with
an image of a model with a sweater on top, and will
try its best to make sense of it. The light is kind of coherent, and it even warped the sweater
to follow one of the arms. But since we would like a better approximation, we can try masking the
sweater and warping it so that it fits the body a bit more nicely. Keep in mind that while
we’re warping, the preview will not update, since the underlying image is still “unchanged”.
But as soon as we confirm the warp, the resulting image will be a lot better than what we have in
Photoshop. We can do the same for a skirt now, and as you can see it’s already quite good,
even adding some sort of pockets on the side. If we mask and warp the skirt, it’s going
to result in an even better generation. If we don’t want a pocket, we can just
mask in the corresponding hidden hand. So, the generated image is adding coherence to
the rough compositing we've done, all in real time. Quite useful for the previsualization
stages of a production, if you ask me. The clothes, though, are a bit different from the
reference still. This is because we’re using LCM. We can try and switch it off and use a normal
sampler, sacrificing some speed in the process, and see if that works too. In order to do
that, right-click on the “Load LORA” node and select “Bypass”. The chain will keep
working; it's just not going to be affected by the bypassed node at all.
Then, we can select a more usual sampler, like “dpmpp_2m”, and a more conventional scheduler,
like “karras”. If we resume the generations now, though, we can see that the resulting image
is quite bad. That's because, while LCM can work well with a low number of steps, normal
samplers just can’t. So let’s go up to 14 steps, and set the CFG to 2, and generate again. A
little better, but the denoise value is still too low to notice any huge difference. Let’s up
the steps to 18, CFG to 4, and denoise to 0.4, and try again. The results now are much better,
although the speed is not quite the same as before. It’s a tradeoff we need to be wary of.
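For quick reference, these are the two sets of KSampler values we've ended up with in this video, one with the LCM LoRA active and one with it bypassed. The LCM scheduler is an assumption on my part, since it isn't stated explicitly; everything else comes straight from the settings above.

```python
# The two sampler presets used in this video. Starting points, not gospel:
# tune steps, CFG, and denoise to your use case.
SAMPLER_PRESETS = {
    "lcm (LoRA active)": {
        "steps": 8, "cfg": 1.4, "sampler_name": "lcm",
        "scheduler": "sgm_uniform",   # assumption, not stated in the video
        "denoise": 0.4,               # raised to 0.65-0.8 for the stick-figure experiments
    },
    "regular (LoRA bypassed)": {
        "steps": 18, "cfg": 4.0, "sampler_name": "dpmpp_2m",
        "scheduler": "karras",
        "denoise": 0.4,
    },
}
```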
At this point, we can keep tinkering with the values until we find the parameters that work the
best for us depending on the situation at hand. But if we’re doing previsualization, why stop
at clothing? We can also insert a background, like a forest, or an interior. The generated image
will be a lot more coherent than the mashup we set up in Photoshop, and with minimal effort. These
are just some examples of what we can do in terms of previsualization, but it’s a technique that can
help in production environments. The only thing that’s not really working, though, is the shadow,
and that can be taken care of with some post work. Let’s check out one more possible application.
We’re going to generate some fashion illustrations, starting from this picture.
Let’s set everything back to the LCM settings, and we will change the prompt to reflect what we
now want. So we’ll write “Fashion illustration of a young woman” as positive prompt, and we’ll
remove “illustration” from the negative prompt, while adding “paper” to it, so the result doesn't
look like it's drawn on paper. We will then raise the denoise value to 0.8, since we want a very
different style from the original image, and since we no longer care about depth but do want the
edges to stay consistent, we'll switch the ControlNet model from depth to canny. The “Depth Map”
node is not needed anymore, so we'll delete it, and in its place we'll search for the “Canny
Edge” node, which will extract the outlines of the Photoshop image. Hook it up, and we'll add
“watercolor” as a style to the positive prompt, since we’d like to get some watercolor images.
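In graph terms, the only things that change from the depth setup are the preprocessor and the ControlNet model. A rough sketch follows; the canny ControlNet filename is just an example, and I'm using the class name of ComfyUI's built-in “Canny” node here, so adjust it if your “Canny Edge” node comes from a preprocessor pack.

```python
# Swap the depth branch for canny edges; everything else stays the same.
canny_branch = {
    "13": {
        "class_type": "ControlNetLoader",
        "inputs": {"control_net_name": "control_v11p_sd15_canny.pth"},  # example filename
    },
    "17": {  # edge preprocessor replacing the MiDaS depth map node
        "class_type": "Canny",          # built-in node; your "Canny Edge" node may differ
        "inputs": {
            "image": ["14", 0],         # the 512x512 resized Photoshop image
            "low_threshold": 0.4,       # example thresholds, tweak to taste
            "high_threshold": 0.8,
        },
    },
}
# "Apply ControlNet (Advanced)" now takes its image input from ["17", 0].
```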
Now let's hit “Queue Prompt”, and going back to Photoshop we can see how we can get some
fashion illustrations effortlessly. The colors and the overall style are different,
since we have a very high denoise value, but this lets us influence the overall design by
painting over the original image if we want to, and get some different results very quickly. So on
the one hand, it’s not precise, but on the other, it lets us brainstorm new ideas faster. It really
depends on what stage of preproduction you are at. If we want to have more precise results,
we can always tinker with the denoise and the other parameters, but this will result
in more or less realistic images. It's up to you to find the combination of parameters
that works for you, and to apply the right workflow to your - or your clients' - needs.
That's it for today. As always - you know the drill - please like and subscribe, as it helps
a lot since this is a new channel. I'm Andrea Baioni and you can find me on Instagram
at risunobushi or at andrebaioni.com. Also please comment if you have any questions
or suggestions for this or the next videos. See you next time with more workflows that can be
applied in production environments for creatives.