From sketch to 2D to 3D in real time! - Stable Diffusion Experimental

Video Statistics and Information

Captions
Eye of a newt, wing of a bat, and... a rabbit's foot. Oh! Hello there, and welcome to another Stable Diffusion Experimental video. Today we'll be looking at a fun new way to create 2D and 3D game-ready, "game-ready" assets, just by starting from raw sketches, all in real time, inside of ComfyUI. And all of that without any external tool needed.

Alright, so here's the workflow we're gonna use today. The core of it is Stable Diffusion XL (SDXL) Lightning, and by using SDXL Lightning we can run things in almost real time. Actually, if we wanna stick to 2D assets, we'll run basically in real time. But the two new nodes that I'm really excited about trying today, and that make all of this not really production-ready but more on the experimental side, are these: the Painter node on the left side here, which will allow us to roughly sketch whatever we need to then create and generate in 2D and 3D, and on the other side, on your right-hand side, TripoSR, which will take care of transforming a 2D image into a 3D mesh. I've also left here, disabled by default, in the bottom left corner, a Photoshop to ComfyUI group, so that in case you wanted to use Photoshop to actually draw your illustrations, you can do that as well by turning it on. There's a switch here that switches from the Painter node to Photoshop. Since I've already covered this twice already, I won't cover today how to set up Photoshop and Stable Diffusion, but you can find two videos in the description below, and over here as well. So if you don't know how to do that, and you still wanna use Photoshop instead of the Painter node, check those videos out before attempting to do this. As always, you'll find the JSON for the workflow in the description below, as well as any other models we might need today. And I will guide you through the creation of this workflow in just a second. On the other hand, if you don't wanna be bothered by how I construct the workflows, and why I use one node instead of another, you can just jump ahead and check out the results.

Alright, so we're gonna need two things in order to work with SDXL Lightning models. I already covered this last week, but I'll cover it again just in case. We're gonna use a checkpoint. Last time it was RealVis, but since today we're not going for realistic, we're going for an illustration kind of look, we're gonna go with Juggernaut XL, Lightning version. We're gonna download that and place it in our models/checkpoints folder inside of ComfyUI. Then we're gonna need an SDXL Lightning LoRA. There are different LoRAs, each trained on a different number of steps. Like last time, I'm gonna use the four-step LoRA, so I'm gonna download the SDXL Lightning four-step LoRA .safetensors file and place it into our models/loras folder inside of ComfyUI. As always, you will find a link to these models in the description below.

So now that we've got all the models that we need, the first thing we're gonna need is a checkpoint. So we're gonna load a checkpoint: search for Load Checkpoint, and then select our SDXL Lightning model, in this case Juggernaut XL Lightning. The next thing we need is a LoRA node, so search for Load LoRA, place it over here, and hook both the model and the CLIP through the Load LoRA node. After that, we're gonna select our SDXL Lightning four-step LoRA .safetensors file.
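(If you'd rather grab the Lightning LoRA from a script than from the browser, here's a minimal sketch using the huggingface_hub package. The repo and file name are the ByteDance SDXL-Lightning ones; the ComfyUI folder path is an assumption, so adjust it to wherever your install lives. The Juggernaut checkpoint is on Civitai, so that one stays a manual download.)

```python
# Minimal sketch: download the 4-step SDXL Lightning LoRA straight into
# ComfyUI's models/loras folder. COMFYUI_DIR is an assumed path.
from huggingface_hub import hf_hub_download

COMFYUI_DIR = "ComfyUI"  # assumption: ComfyUI lives in the current directory

hf_hub_download(
    repo_id="ByteDance/SDXL-Lightning",
    filename="sdxl_lightning_4step_lora.safetensors",
    local_dir=f"{COMFYUI_DIR}/models/loras",
)
# The Juggernaut XL Lightning checkpoint is distributed on Civitai, so download
# it manually and drop it into ComfyUI/models/checkpoints.
```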
I have already explained this in the previous video, but what the Lightning LoRA does is basically allow our checkpoint here to resolve an image in just four steps; in order to do so, it has to go through the four-step LoRA. The next thing we need is gonna be a KSampler. So drag and drop from model, select KSampler, and then we're gonna need two prompt fields, one positive and one negative. So drag from CLIP, then drop and select CLIP Text Encode. Ctrl+C, Ctrl+V, and then connect the CLIP. The top one is gonna be our positive prompt field, and the bottom one is gonna be our negative. So connect the conditioning from the top one into positive, and the one from the bottom one into negative inside of the KSampler. For now, don't worry about the prompt fields, we're gonna take care of that later.

For this workflow, we don't wanna do text to image, we wanna do image to image. So what we wanna do is encode an image. We'll search for a VAE Encode node, drag and drop the VAE from our checkpoint, and drag and drop the latent into the latent image input inside of the KSampler. The next thing we're gonna need is a VAE Decode node, so drag and drop from the latent output of the KSampler and select VAE Decode. Drag and drop the VAE from the checkpoint again into the VAE Decode node, then drag and drop from the image output of the VAE Decode node and select Preview Image. This image will be our 2D game asset.

Now let's take care of the KSampler before moving on to the other parts of this workflow. Since this workflow is using an SDXL Lightning model, we're kind of limited in what we can actually input in the KSampler. The steps must match the LoRA, so it's gonna be four steps. The CFG needs to be between one and two, so we're gonna go with, let's say, 1.3. The sampler is gonna be DPM++ 2M SDE, and the scheduler is gonna be SGM Uniform. It is very important that SGM Uniform is selected here, otherwise SDXL Lightning won't work the way it's supposed to. And by the way, if you're not sure about what you should input over here, you can just go over to the model page, and usually there are some suggested settings. As for denoise, since this is an image-to-image workflow, we can lower that a bit. Let's say 0.75 for now, and then we'll fix that later if needed.
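(For readers who like seeing the same settings outside the node graph, here's a rough equivalent of this image-to-image setup written against the diffusers library rather than ComfyUI. It is not the workflow from the video, just an approximate sketch: the checkpoint path and the "sketch.png" input are placeholders, and the Euler "trailing" scheduler follows the SDXL Lightning model card's recommendation, standing in for the DPM++ 2M SDE / SGM Uniform combo used in the KSampler.)

```python
# Approximate diffusers sketch of the ComfyUI setup above (not the video's workflow).
# Assumptions: a local Juggernaut XL Lightning .safetensors file and a "sketch.png" input.
import torch
from diffusers import StableDiffusionXLImg2ImgPipeline, EulerDiscreteScheduler
from huggingface_hub import hf_hub_download
from PIL import Image

pipe = StableDiffusionXLImg2ImgPipeline.from_single_file(
    "juggernautXL_lightning.safetensors",  # placeholder path to the checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# Merge the 4-step Lightning LoRA into the checkpoint.
pipe.load_lora_weights(
    hf_hub_download("ByteDance/SDXL-Lightning", "sdxl_lightning_4step_lora.safetensors")
)
pipe.fuse_lora()

# Lightning models want a "trailing" timestep schedule (model card recommendation).
pipe.scheduler = EulerDiscreteScheduler.from_config(
    pipe.scheduler.config, timestep_spacing="trailing"
)

sketch = Image.open("sketch.png").convert("RGB").resize((1024, 1024))
image = pipe(
    prompt="game illustration of a mana potion",
    negative_prompt="photography, 3d, render, nsfw, nude",
    image=sketch,
    num_inference_steps=4,   # steps = LoRA steps
    guidance_scale=1.3,      # CFG between 1 and 2
    strength=0.75,           # roughly the KSampler denoise value
).images[0]
image.save("asset_2d.png")
```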
Now you can see that here in the VAE Encode node we are missing something, and that's pixels — that means an image. But where does the image come from in this workflow? Well, it really depends, but in this workflow I'd rather use the Painter node. So we're gonna search for our Painter node and place it right over here. Now, in case you can't find the Painter node inside of ComfyUI, or you can't install it via the Manager, let me show you how you can install it. In order to install the Painter node, if you can't do that via the Manager, you wanna go to its GitHub page. Inside of this GitHub repository, you can find the Painter node. What we wanna do is scroll down to the install section and follow the procedure. In our case, we can either download the contents of the GitHub repository, or we can install using Git. I would suggest using Git if you already have it installed: you can just go to your custom_nodes folder inside of ComfyUI, open a terminal inside the folder, and then type git clone followed by the link to the repository.

So what we wanna do is copy this command here — git clone and then the link — then go over to our custom_nodes folder inside of ComfyUI, right click, open in terminal, Ctrl+V to paste the git clone command, and press Enter. In my case, the destination already exists because I already cloned it, so it won't do anything, but in your case it's gonna install the missing nodes. Then you're just gonna have to restart ComfyUI and refresh the webpage, and you'll see the nodes that were missing before.

Back to our workflow: we've got our Painter node here, and the next thing we wanna do is resize the image coming out of the Painter node to a size that works for SDXL Lightning, and that's 1024 by 1024. So we're just gonna drag and drop from image, search for Image Resize, input 1024 by 1024, and then hook up the image output to the pixels input in the VAE Encode node. In case you don't wanna use the Painter node but you'd rather use Photoshop, you can just download the workflow in the description and use that instead. Inside of that workflow you will find both the Painter node and the Photoshop to ComfyUI node. That workflow has a logic switch inside of the Photoshop to ComfyUI group: if you enable the Photoshop to ComfyUI group, the switch activates as well, the Painter node is no longer active, and the VAE Encode node works with Photoshop only. The opposite is true while the Photoshop to ComfyUI group is disabled: only the Painter node will be acting on the VAE Encode node.

Another thing we wanna add, which is gonna be useful later as well, is a remove background node, so that we can use the resulting image as an alpha. So we're gonna search for a RemBG Session node. We could use a variety of different nodes, like for example the Segment Anything node that we used last week, but I wanna show you a new node, because this node is very versatile: it can work using the CPU only, or CUDA, or DirectML, or OpenVINO, and a lot more options. In my case, since I'm using an NVIDIA GPU, I'm gonna select CUDA. Then we're gonna drag and drop from the RemBG Session node and select Image Remove Background+. This node needs an image, and the image we want is the image that we're generating right now. So we're gonna drag and drop from the VAE Decode image output and input that into the Image Remove Background node. Then, from its outputs, we're gonna drag and drop from image and select a Preview Image node — that's gonna be our segmented image without the background — and we're gonna drag and drop from mask as well, select a Mask To Image node, drag and drop from there too, and select another Preview Image node. By doing this, we can see both the resulting image without the background and the resulting mask, so in case we wanna troubleshoot something, we can do that at a quick glance.
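(As an aside, if you want the same background-removal step outside ComfyUI, here's a minimal sketch assuming the standalone rembg Python package — an assumption-based stand-in for the RemBG Session / Image Remove Background+ nodes, not their exact code. The file names are placeholders carried over from the earlier sketch.)

```python
# Rough standalone sketch of the background-removal step, using the rembg
# Python package (pip install rembg). Not the node's exact code.
from PIL import Image
from rembg import new_session, remove

session = new_session("u2net")           # model choice; GPU use depends on your onnxruntime build
asset = Image.open("asset_2d.png")       # placeholder: the generated 2D asset
cutout = remove(asset, session=session)  # RGBA image with a transparent background
mask = cutout.split()[-1]                # the alpha channel doubles as the mask

cutout.save("asset_2d_nobg.png")
mask.save("asset_2d_mask.png")
```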
And now it's time to think about what we actually need as a game asset. Let's say we want a game illustration of a mana potion, so we're gonna type just that into the positive prompt field. And for the negative, since we don't want any photography, for example, or any 3D or render, we're gonna input just that. And since we're on YouTube, and we'd rather err on the side of caution, I'm just gonna add "not safe for work" and "nude". Never seen a nude mana potion, but you never know.

And now we can finally draw. The Painter node works basically like Paint: here you have a brush tool, here you have an eraser. What I'm gonna do is just work freehand, but you can use different tools and shapes if you want to. So I'm gonna go over here, select a color — let's say white for starters — and then I'm gonna draw the basic shape of a potion, like a glass vial. Then I'm gonna select a brown and pop a cork over there, and then I'm gonna select a light blue, since everybody knows mana is blue — health is red, mana is blue — and then I'm gonna fill in the inside of the glass. Then I can hit Queue Prompt and generate the picture. As you can see, it's very fast.

But the thing we wanna do is keep generating while we're drawing inside of the Painter node. In order to do that, we wanna do two things. The first thing is to go over to our ComfyUI sidebar, enable Extra options, and then hit Auto Queue. What this does is basically auto-queue the Queue Prompt command. And then, since we're using a Preview Image node and not a Save Image node, we wanna keep the image around until we make any substantial changes, so we're gonna set the seed to fixed, instead of randomize, inside of the KSampler. Now, if you wanna swap the Preview Image node for a Save Image node, you can absolutely do that; I just don't wanna keep around a lot of images that I might not even need, so instead of a Save Image node I just prefer a Preview Image node. Also, if you made some changes to an image that was in the Preview Image node and you can't find it anymore, it's in the temp folder inside of your ComfyUI folder — at least up until the point you restart ComfyUI, at which point it gets deleted. But up until then, the temp folder inside of ComfyUI will hold everything you generate in a Preview Image node.

Okay, so now that we've got everything we need for the 2D image generation process, we're gonna hit Queue Prompt, and as you can see, the prompt will continue to auto-generate. And now that the prompt is in our queue, we can just keep adding new stuff to the Painter node. So I'm gonna draw a bit of a table here. Now it's thinking it's a ring, so what I'm gonna do is just add a bit more. It might interpret this as a table, or as sand, or rock — we might never know. Here it thought it was like a green splash, so I'm gonna go with, let's say, a darker brown, and let's see what it does. As you can see, it just keeps generating stuff, so now it's like a rock. And honestly, these are quite good illustrations that are kind of ready to use for 2D games or point and click games — for background assets or clickable assets, they're pretty good. And the generations are almost instant as well. Let's add, say, a bit of a violet sparkle around, and as you can see, it's adding, like, these fairy lights around the bottle. Let's say we want to add a little bit of a green splash inside of the bottle — let's add some green as well. And there you go, you get a slight green hue. Or let's do like a minor poison bottle, and it's got some green fumes coming out of it.
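(Side note for the scripting-inclined: the Auto Queue option is essentially re-submitting the graph over and over. A very rough sketch of the same idea from Python follows, assuming a default local ComfyUI instance on port 8188 and a workflow exported via "Save (API Format)" as workflow_api.json — both assumptions, not part of the video's setup.)

```python
# Minimal sketch: keep re-queuing an exported API-format workflow against a
# local ComfyUI instance, roughly mimicking the UI's Auto Queue option.
import json
import time
import urllib.request

with open("workflow_api.json") as f:   # assumed export via "Save (API Format)"
    workflow = json.load(f)

def queue_prompt(graph: dict) -> dict:
    data = json.dumps({"prompt": graph}).encode("utf-8")
    req = urllib.request.Request(
        "http://127.0.0.1:8188/prompt",  # ComfyUI's default local endpoint
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

while True:
    print(queue_prompt(workflow))  # returns the id of the queued prompt
    time.sleep(5)                  # crude pacing; the UI's Auto Queue is event-driven
```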
But what if we wanted to add 3D capabilities as well and generate meshes in almost real time? Well, what we're gonna need is a ComfyUI TripoSR node. You'll be able to find the link to the GitHub in the description below. In order to install this repository, we're gonna do the same thing we did before for the Painter node: copy the command, go over to our custom_nodes folder inside of ComfyUI, right click, open in terminal, paste the git clone command with the link to the repository, and press Enter. Same as before, I already installed it, so I get an error, but in your case it's gonna be installed inside of your custom_nodes folder. Now, in this case we also need to install some dependencies. So we're gonna go over to our TripoSR folder, right click, open in terminal, then type pip install -r requirements.txt and press Enter. What this does is basically install everything that's needed to make the node work. In my case, since I had already installed it, you can see that the requirements were already satisfied; in your case, it's gonna download everything you don't have that the node is gonna need. So now that we have the node, you're gonna need to restart ComfyUI and then refresh the webpage.

After you've done that — I'm gonna stop the Auto Queue, because I don't wanna keep queuing until I'm done building the workflow — we're gonna search for TripoSR. As you can see, there are three nodes here, and we're gonna need all of them. We're gonna start with the model loader. Now, the model loader is gonna need a model. You can find it linked in the description below or on the GitHub: this link on the GitHub points to the model we're gonna need. So we're gonna click on that, and then we're on Hugging Face. We're gonna go over to Files and versions, select model.ckpt, download that, and place it in the checkpoints folder of our ComfyUI folder. Once the model has been downloaded, we're gonna refresh ComfyUI and then find it over here. In my case, it's gonna be model.ckpt, but if I were you, I'd rather change its name, because if I don't, at the end of the day I'm gonna have a lot of model.ckpt files. So rename it something like TripoSR.ckpt — that's gonna do the trick, and you're gonna remember what it does when the need arises.

Then I'm gonna drag from the TripoSR model output and select the TripoSR sampler. Now, the sampler needs a few things: a reference image, a reference mask, a geometry resolution, and a threshold — and that's where our Image Remove Background node is gonna come in handy, because we already have both of those. The reference image is either the one without the background or the original one from the VAE Decode node; at the end of the day it's gonna be the same, because we're gonna feed it the mask from the remove background node as well. So I'm gonna drag and drop from the VAE Decode node into our reference image input, and then from the mask output of the Image Remove Background node into the reference mask input of the TripoSR sampler node. Then, for our geometry resolution: that's how complex the resulting mesh is. Now, we have to make a choice between good resolution and fast generations. Since I wanna generate fast, in real time, and then decide at a later stage whether I like the results or not, I'm gonna input a very low resolution like 128.
But once I find a result that I like, I will increase the resolution to, let's say, 512 or 640, which is the maximum resolution I can reach on my 3080 Ti, and generate the same mesh at a better resolution — the mesh is gonna stay the same because the seed is fixed in my KSampler. If you don't care about working in real time, you can just input 640 or 512, or whatever the maximum resolution you can output is, and then queue every single prompt once you've made changes that you like. As for threshold, it really depends on the generated images, so for now I'm just gonna keep 25. Then we're gonna drag and drop from mesh and select the TripoSR viewer, and since this is a lot easier to show than to explain, I'm just gonna hit Queue Prompt — and there we go. This is our glorious result in 120p. As you can see, it's a 3D approximation of the object we have in our illustration, based on one single image, and as of right now, both the mesh and the textures are not great quality. But what we can do is just go over to our geometry resolution field, increase it to the maximum we can actually manage, and queue the prompt again. And as we can see, we get a much better mesh now. I had to turn the geometry resolution down to 512 because, while recording this, it couldn't handle 640, so you could probably squeeze out a bit more resolution, both for the mesh and the texture.

Now, is this good enough to be considered game ready? Oh no. Oh no no no, absolutely not. But yeah, I've seen way worse assets in indie games out there, and at least you've got a nice mesh you can work on in, say, Blender or whatever software you might be using for 3D stuff. The topology is not gonna be great, so you're gonna have to retopologize it, and you can work on the texture as well. So as a starting point, or a very cheap game asset, that can work quite well.

But we might still not be done with generating assets, so let's turn the resolution down to 128, hit Auto Queue here, and start anew. Now you're gonna see that both the 2D assets and the 3D assets can be generated in close to real time. It's gonna be a bit slower than normal here, because I am recording my screen and I'm using my camera, but in my testing it's quite fast — it usually takes around four or five seconds with my 3080 Ti, so that's quite good. So we're gonna clear our canvas and change our positive prompt to, let's say, a poison vial. Then we're gonna go back to our Painter node, hit Queue Prompt, and I'm gonna draw a vial. So here we go: we've got the vial, we've got the green for poison — everybody knows that poison is green. Then we're gonna add like a cork over here, and then some toxic fumes around it, and as you can see, we're getting the asset in close to real time, although it's quite scuffed. We can keep improving it in our Painter node, or we can just change the subject. So let's say we wanna do a cauldron — we're gonna go for a sort of witch theme. We're gonna clear our canvas, select a grey of sorts, start drawing, then put like a skull here and add some liquid inside. It's not picking up the skull, so we're gonna add, in white, a skull in front of it. Yep, and as we can see, now it's picking up that this one is a skull. And there we go: we've got our asset both in 3D and in 2D, with and without a background.
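(For reference, here's what the TripoSR part looks like as a rough standalone Python sketch, following the pattern of the upstream TripoSR repository's example script rather than the ComfyUI nodes — treat the exact names and signatures as assumptions and check them against your installed version. It reuses the hypothetical file names from the earlier sketches, and the geometry resolution is the same knob discussed above: low for previews, higher for the final mesh.)

```python
# Rough standalone sketch of the image-to-mesh step, loosely following the
# TripoSR repository's example script (tsr.system.TSR and extract_mesh come from
# that repo; exact signatures may differ between versions).
import numpy as np
import torch
from PIL import Image
from tsr.system import TSR  # provided by the TripoSR repository

device = "cuda" if torch.cuda.is_available() else "cpu"

# Same model.ckpt the ComfyUI loader uses, pulled from Hugging Face here.
model = TSR.from_pretrained(
    "stabilityai/TripoSR", config_name="config.yaml", weight_name="model.ckpt"
)
model.to(device)

# Composite the background-removed RGBA cutout onto neutral gray, as the
# reference script does before feeding the image to the model.
rgba = np.array(Image.open("asset_2d_nobg.png").convert("RGBA")).astype(np.float32) / 255.0
rgb = rgba[:, :, :3] * rgba[:, :, 3:4] + 0.5 * (1.0 - rgba[:, :, 3:4])
image = Image.fromarray((rgb * 255.0).astype(np.uint8))

with torch.no_grad():
    scene_codes = model([image], device=device)

# Low marching-cubes resolution (128) for near-real-time previews; raise it
# (e.g. 512) for the final export, as discussed above.
# (Newer TripoSR versions add a vertex-color argument to extract_mesh.)
meshes = model.extract_mesh(scene_codes, resolution=128)
meshes[0].export("asset_3d.obj")
```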
So let's say I wanted to develop a point and click game, or I need some assets for my backgrounds, or for my D&D games, or any other game I might play — or anything else, really, that you might need simple 2D illustrations or 3D meshes for. Done. You can just instantly create and generate them with this real-time workflow.

Now, this is a very linear and simple workflow; if you wanted to add stuff to it, you could absolutely do that. Let's say you wanted to add an IPAdapter group in order to source from reference images that you have and like, so they can influence the final designs of your illustrations — you can do that as well. Just implement a group for IPAdapter over here: search for IPAdapter, and we're going to select the V2, so that's "Advanced". We're going to search for its Unified Loader, hook the model through the Unified Loader, then the IPAdapter, and then onto the KSampler, and finally load a reference image — search for a Load Image node and plug it into the IPAdapter Advanced node. Or let's say you wanted to generate different assets and merge them all together before creating a 3D mesh: you can absolutely do that, and then use an Image Blend by Mask node like we did last week. The possibilities are actually endless, same as with all the workflows I'm showing you.

All in all, though, I don't think this workflow is quite production-ready yet. TripoSR is not quite at a place where I would feel confident assuring clients that I could do something with it, at least not on a consistent level, and that's what I'm looking for in my production-ready tutorials. That's why I'm presenting this tutorial to you in the Stable Diffusion Experimental series instead of the professional creatives series.

So that's gonna be all for today, and I hope you like this workflow and can implement it in your own ways. As always, this is a new channel, so any like and subscribe I get helps a lot. I am Andrea Baioni; you can find me on Instagram @risunobushi or on the web at andreabaioni.com. If you want to know more about how you can experiment with Stable Diffusion, or you want to use Stable Diffusion for your professional creative needs, check out these videos. Thank you for watching, and I'll be seeing you next week.
Info
Channel: Andrea Baioni
Views: 4,183
Keywords: generative ai, stable diffusion, comfyui, civitai, text2image, txt2img, img2img, image2image, image generation, artificial intelligence, ai, generative artificial intelligence, sd, tutorial, risunobushi, risunobushi_ai, risunobushi ai, stable diffusion for professional creatives, professional creatives, install comfyui, installing comfyui, comfy-ui, andrea baioni, From sketch to 2D to 3D in real time! - Stable Diffusion Experimental, stable diffusion experimental, triposr, sdxl, sdxl lightning
Id: DO8CeE5c0Mc
Length: 24min 59sec (1499 seconds)
Published: Sat Apr 06 2024